Closed (wcarre closed this issue 7 years ago)
So is it possible to run gridss on multiple samples at once and get 1 vcf output ?
Joint calling of multiple samples is a common use case for GRIDSS and is most definitely supported.
Your errors look like problems with the command-line arguments passed to GRIDSS. The error message indicates that GRIDSS is treating the file WGS_GM_Fam2381HPE as an INPUT file which is a bit strange as there is no .bam extension on that file. It is also using WGS_GM_Fam2381HPE as the output file which is definitely problematic as the output VCF should a) have a .vcf or .bcf extension, and b) not be the same file as the input bam.
I recommend looking at the first few lines of the GRIDSS log file and checking that the input/output/assembly file names make sense. Conveniently, the start of the log file echoes back the parameters passed into GRIDSS, so it is relatively straightforward to check. GRIDSS requires the input, output, and assembly files to all be different files^.
^ Edge case: if WORKING_DIR is explicitly set, GRIDSS does not support multiple input files that have the same filename but sit in different directories, because their input.bam.gridss.working directories will clash. Setting WORKING_DIR=null or leaving it unspecified is fine, as the default location is the same directory as the input file.
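The checks described above can be sketched as a small shell pre-flight function to run before launching GRIDSS. This is only an illustration and not part of GRIDSS: the function name `check_gridss_paths` and its argument order are hypothetical, it assumes the inputs are BAM files, and it reads a shell variable named `WORKING_DIR` to decide whether the same-filename edge case applies.

```shell
#!/bin/sh
# Hypothetical pre-flight check; NOT part of GRIDSS itself.
# Usage: check_gridss_paths OUTPUT ASSEMBLY INPUT [INPUT...]
# Returns 0 if the file names look sane, 1 otherwise.
check_gridss_paths() {
    output=$1
    assembly=$2
    shift 2
    status=0

    # The output VCF should have a .vcf or .bcf extension.
    case $output in
        *.vcf|*.bcf) ;;
        *) echo "OUTPUT '$output' should end in .vcf or .bcf" >&2; status=1 ;;
    esac

    # Output and assembly must be different files.
    if [ "$output" = "$assembly" ]; then
        echo "OUTPUT and ASSEMBLY are the same file: '$output'" >&2
        status=1
    fi

    seen="/"   # slash-delimited list of input base names seen so far
    for input in "$@"; do
        # Assumption for this sketch: inputs are BAM files.
        case $input in
            *.bam) ;;
            *) echo "INPUT '$input' has no .bam extension" >&2; status=1 ;;
        esac

        # An input must not double as the output or the assembly file.
        if [ "$input" = "$output" ] || [ "$input" = "$assembly" ]; then
            echo "INPUT '$input' is also used as OUTPUT or ASSEMBLY" >&2
            status=1
        fi

        # Edge case: identical file names in different directories clash
        # only when WORKING_DIR is set explicitly.
        name=$(basename "$input")
        if [ -n "${WORKING_DIR:-}" ]; then
            case $seen in
                *"/$name/"*)
                    echo "Duplicate input file name '$name' with WORKING_DIR set" >&2
                    status=1 ;;
            esac
        fi
        seen="$seen$name/"
    done

    return "$status"
}
```

For the trio command in this thread, the call would look like `check_gridss_paths "$OUTPUT" "$ASSEMBLY" "$INPUT1" "$INPUT2" "$INPUT3"`. A non-zero return points at the kind of argument mix-up discussed above, such as an OUTPUT value without a .vcf extension.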
Thanks for the return. I’ll check on these things.
Wilfrid
(Sent by email in reply to Daniel Cameron's comment of 12 Sep 2017, 18:22.)
Hi, thanks for GRIDSS. I may not have understood everything. I am trying to do a trio analysis (3 BAM files from father, mother, and child), so I ran GRIDSS with these parameters, based on your example:

    java -ea -Xmx31g \
        -Dsamjdk.create_index=true \
        -Dsamjdk.use_async_io_read_samtools=true \
        -Dsamjdk.use_async_io_write_samtools=true \
        -Dsamjdk.use_async_io_write_tribble=true \
        -cp $GRIDSS_JAR gridss.CallVariants \
        TMP_DIR=. \
        WORKING_DIR=. \
        REFERENCE_SEQUENCE="$REFERENCE" \
        INPUT="$INPUT1" \
        INPUT="$INPUT2" \
        INPUT="$INPUT3" \
        OUTPUT="$OUTPUT" \
        ASSEMBLY="$ASSEMBLY" \
        BLACKLIST="$BLACKLIST" \
        2>&1 | tee -a gridss.$HOSTNAME.$$.log
I got an error message at the end:

    Runtime.totalMemory()=16895705088
    INFO 2017-09-12 11:58:12 SAMFileUtil Not sorting as output already exists: ./WGS_GM_Fam2381HPE.gridss.working/WGS_GM_Fam2381HPE.sv.bam
    ERROR 2017-09-12 11:58:12 CallVariants Error writing variant calls to /ngs/datagen/genetics/tmp/gridss-master/example/WGS_GM_Fam2381HPE. File already exists. Please delete OUTPUT file.
    ERROR 2017-09-12 11:58:12 MultipleSamFileCommandLineProgram
    java.io.IOException: Error writing variant calls to /ngs/datagen/genetics/tmp/gridss-master/example/WGS_GM_Fam2381HPE. File already exists. Please delete OUTPUT file.
        at gridss.CallVariants.callVariants(CallVariants.java:96)
        at gridss.CallVariants.doWork(CallVariants.java:130)
        at gridss.cmdline.MultipleSamFileCommandLineProgram.doWork(MultipleSamFileCommandLineProgram.java:216)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205)
        at gridss.CallVariants.main(CallVariants.java:107)
    [Tue Sep 12 11:58:12 CEST 2017] gridss.CallVariants done. Elapsed time: 46.55 minutes.
    Runtime.totalMemory()=16895705088
    Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Error writing variant calls to /ngs/datagen/genetics/tmp/gridss-master/example/WGS_GM_Fam2381HPE. File already exists. Please delete OUTPUT file.
        at gridss.cmdline.MultipleSamFileCommandLineProgram.doWork(MultipleSamFileCommandLineProgram.java:219)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205)
        at gridss.CallVariants.main(CallVariants.java:107)
    Caused by: java.io.IOException: Error writing variant calls to /ngs/datagen/genetics/tmp/gridss-master/example/WGS_GM_Fam2381HPE. File already exists. Please delete OUTPUT file.
        at gridss.CallVariants.callVariants(CallVariants.java:96)
        at gridss.CallVariants.doWork(CallVariants.java:130)
        at gridss.cmdline.MultipleSamFileCommandLineProgram.doWork(MultipleSamFileCommandLineProgram.java:216)
        ... 2 more
So is it possible to run gridss on multiple samples at once and get 1 vcf output ?
Thanks