Closed sgomes-catarina closed 1 year ago
The subdirectory bcftools.irknRH
is created by bcftools sort
to store temporary sorted files. The message Merging 54 temporary files
means that bcftools sort
completed the sort and was in the processing of merging the temporary output but it must have then crashed before completing for some reasons. Could it be that you don't have space left on the filesystem to write the $out_prefix.bcf
file? Try to remove the last three lines and use bcftools sort -Ob -o $out_prefix.bcf -T ./bcftools.
to see what happens. Also, please don't use GRCh37
. You can realign the GDA-8v1-0_D1.csv
manifest file as explained here and then use the output in gtc2vcf with the option --sam-flank
Thank you for your reply @freeseek !
I do have free disk space to write the $out_prefix.bcf
file. I'm trying to reprocess it using more RAM memory to see what happens. I've realigned the csv manifest file but when running gtc2vcf should I use the GRCh37
or GRCh38
as reference genome?
Thank you again for your support!
It is useless to add more RAM. The BCFtools/gtc2vcf plugin requires ~3GB for the reference to be cached in memory, and some more memory to load into memory the BPM/CSV/EGT files (also notice BCFtools/gtc2vcf is only multi-threaded for compression/decompression, not for the GTC to VCF conversion itself). BCFtools/sort, which had progressed to the final step when it output:
Merging 54 temporary files
Requires by default a maximum of 768MB. Something else is causing the last step to fail. Maybe BCFtools/norm is failing. You might need to try by iteratively removing the last steps to see what is going on. What happens if you try the same step with one or two GTC files only? Does it complete?
If you are converting to VCF using only the --csv
option you need to use the reference of the CSV manifest file. If you are using both the --csv
and the --sam-flank
options, then you need to use the reference used for the aligned SAM/BAM file. If you are wondering what you should realign your manifest file against, then my advice is to always prefer GRCh38 over GRCh37. There should be no reasons left to use GRCh37 as of today
Hello,
I'm running the gtc2vcf plugin on 541 gtc files as below:
The final lines of my log file are:
The final BCF file is empty, but 54 individual bcf files were created in a subdirectory "bcftools.irknRH". Is the job terminating before expected due to lack of memory? I'm running it using 24GB in one node and 20 tasks per node.
Thank you in advance for your support.