PapenfussLab / gridss

GRIDSS: the Genomic Rearrangement IDentification Software Suite

Trio analysis #92

Closed wcarre closed 7 years ago

wcarre commented 7 years ago

Hi, thanks for GRIDSS. I may not have understood everything. I am trying to run a trio analysis (three BAM files: father, mother, child), so I ran GRIDSS with these parameters, based on your example:

    java -ea -Xmx31g \
        -Dsamjdk.create_index=true \
        -Dsamjdk.use_async_io_read_samtools=true \
        -Dsamjdk.use_async_io_write_samtools=true \
        -Dsamjdk.use_async_io_write_tribble=true \
        -cp $GRIDSS_JAR gridss.CallVariants \
        TMP_DIR=. \
        WORKING_DIR=. \
        REFERENCE_SEQUENCE="$REFERENCE" \
        INPUT="$INPUT1" \
        INPUT="$INPUT2" \
        INPUT="$INPUT3" \
        OUTPUT="$OUTPUT" \
        ASSEMBLY="$ASSEMBLY" \
        BLACKLIST="$BLACKLIST" \
        2>&1 | tee -a gridss.$HOSTNAME.$$.log

I got an error message at the end:

    Runtime.totalMemory()=16895705088
    INFO 2017-09-12 11:58:12 SAMFileUtil Not sorting as output already exists: ./WGS_GM_Fam2381HPE.gridss.working/WGS_GM_Fam2381HPE.sv.bam
    ERROR 2017-09-12 11:58:12 CallVariants Error writing variant calls to /ngs/datagen/genetics/tmp/gridss-master/example/WGS_GM_Fam2381HPE. File already exists. Please delete OUTPUT file.
    ERROR 2017-09-12 11:58:12 MultipleSamFileCommandLineProgram
    java.io.IOException: Error writing variant calls to /ngs/datagen/genetics/tmp/gridss-master/example/WGS_GM_Fam2381HPE. File already exists. Please delete OUTPUT file.
        at gridss.CallVariants.callVariants(CallVariants.java:96)
        at gridss.CallVariants.doWork(CallVariants.java:130)
        at gridss.cmdline.MultipleSamFileCommandLineProgram.doWork(MultipleSamFileCommandLineProgram.java:216)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205)
        at gridss.CallVariants.main(CallVariants.java:107)
    [Tue Sep 12 11:58:12 CEST 2017] gridss.CallVariants done. Elapsed time: 46.55 minutes.
    Runtime.totalMemory()=16895705088
    Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Error writing variant calls to /ngs/datagen/genetics/tmp/gridss-master/example/WGS_GM_Fam2381HPE. File already exists. Please delete OUTPUT file.
        at gridss.cmdline.MultipleSamFileCommandLineProgram.doWork(MultipleSamFileCommandLineProgram.java:219)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205)
        at gridss.CallVariants.main(CallVariants.java:107)
    Caused by: java.io.IOException: Error writing variant calls to /ngs/datagen/genetics/tmp/gridss-master/example/WGS_GM_Fam2381HPE. File already exists. Please delete OUTPUT file.
        at gridss.CallVariants.callVariants(CallVariants.java:96)
        at gridss.CallVariants.doWork(CallVariants.java:130)
        at gridss.cmdline.MultipleSamFileCommandLineProgram.doWork(MultipleSamFileCommandLineProgram.java:216)
        ... 2 more

So is it possible to run GRIDSS on multiple samples at once and get a single VCF output?

Thanks

d-cameron commented 7 years ago

> So is it possible to run GRIDSS on multiple samples at once and get a single VCF output?

Joint calling of multiple samples is a common use case for GRIDSS and is most definitely supported.
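As a rough sketch of what the argument list for a trio run looks like, the helper below only assembles and prints the arguments (one INPUT= per BAM, as in the command in the original post). It does not run GRIDSS. The function names and all file names (father.bam, trio.sv.vcf, and so on) are placeholders made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: gridss_input_args and gridss_trio_args are hypothetical helpers,
# and every file name below is a placeholder.

# Print one INPUT=<bam> argument per BAM file given.
gridss_input_args() {
  local bam
  for bam in "$@"; do
    printf 'INPUT=%s\n' "$bam"
  done
}

# Print the file-related arguments, one per line, so they can be inspected
# before being passed to gridss.CallVariants.
# Usage: gridss_trio_args OUTPUT ASSEMBLY INPUT [INPUT...]
gridss_trio_args() {
  local output=$1 assembly=$2
  shift 2
  gridss_input_args "$@"
  printf 'OUTPUT=%s\n' "$output"
  printf 'ASSEMBLY=%s\n' "$assembly"
}

gridss_trio_args trio.sv.vcf trio.assembly.bam father.bam mother.bam child.bam
```

Printing the arguments first makes it easy to see that the three inputs, the output and the assembly are five distinct file names.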

Your errors look like problems with the command-line arguments passed to GRIDSS. The error message indicates that GRIDSS is treating the file WGS_GM_Fam2381HPE as an INPUT file, which is a bit strange as that file has no .bam extension. It is also using WGS_GM_Fam2381HPE as the output file, which is definitely problematic: the output VCF should a) have a .vcf or .bcf extension, and b) not be the same file as an input BAM.
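The two conditions above can be checked before launching a long run. This is a minimal sketch, not something GRIDSS provides: check_gridss_paths is a hypothetical helper, the file names are placeholders, and it compares path strings only (it does not resolve symlinks or relative paths).

```shell
#!/usr/bin/env bash
# Sketch of a pre-flight check; check_gridss_paths is a hypothetical helper.
# Usage: check_gridss_paths OUTPUT ASSEMBLY INPUT [INPUT...]
# Returns non-zero if the output extension or file distinctness looks wrong.
check_gridss_paths() {
  local output=$1 assembly=$2
  shift 2
  local status=0 input

  # The output VCF should end in .vcf or .bcf.
  case "$output" in
    *.vcf|*.bcf) ;;
    *) echo "error: OUTPUT '$output' has no .vcf or .bcf extension" >&2; status=1 ;;
  esac

  # Output and assembly must be different files (string comparison only).
  if [ "$output" = "$assembly" ]; then
    echo "error: OUTPUT and ASSEMBLY are the same file: '$output'" >&2
    status=1
  fi

  for input in "$@"; do
    if [ "$input" = "$output" ]; then
      echo "error: INPUT '$input' is also the OUTPUT" >&2
      status=1
    fi
    if [ "$input" = "$assembly" ]; then
      echo "error: INPUT '$input' is also the ASSEMBLY" >&2
      status=1
    fi
    # Only a warning: an input without .bam is suspicious, not necessarily wrong.
    case "$input" in
      *.bam) ;;
      *) echo "warning: INPUT '$input' has no .bam extension" >&2 ;;
    esac
  done
  return "$status"
}

# Placeholder names for a trio run.
check_gridss_paths trio.sv.vcf trio.assembly.bam father.bam mother.bam child.bam \
  && echo "paths look consistent"
```

Called with WGS_GM_Fam2381HPE as both OUTPUT and an INPUT, the function reports the missing extension and the input/output clash and returns a non-zero status.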

I recommend looking at the first few lines of the GRIDSS log file and checking that the input/output/assembly file names make sense. Conveniently, the start of the log file echoes back the parameters passed into GRIDSS, so it is relatively straightforward to check. GRIDSS requires the input, output, and assembly files to all be different files^.
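One way to do that check from the shell, as a sketch: show_gridss_params is a hypothetical helper, and it assumes the echoed parameters appear within the first lines of the log and contain the same NAME=value text that was passed on the command line. The exact log layout is not shown in this thread, so the helper simply prints any early line that mentions one of the file arguments.

```shell
#!/usr/bin/env bash
# Sketch: print lines near the top of a GRIDSS log that mention file arguments.
# show_gridss_params is a hypothetical helper; the log layout is an assumption.
# Usage: show_gridss_params LOGFILE [NUMBER_OF_LINES]
show_gridss_params() {
  local log=$1 lines=${2:-20}
  head -n "$lines" "$log" | grep -E '(INPUT|OUTPUT|ASSEMBLY|WORKING_DIR)='
}

# Example (log name follows the gridss.$HOSTNAME.$$.log pattern from the
# original command; the actual name on your system will differ):
# show_gridss_params gridss.myhost.12345.log
```

If OUTPUT shows a name without .vcf or .bcf, or the same name appears for both INPUT and OUTPUT, the shell variables used to build the command are the first thing to look at.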

^ Edge case: if WORKING_DIR is explicitly set, GRIDSS does not support multiple input files that have the same filename but sit in different directories, because their input.bam.gridss.working directories will clash. Setting WORKING_DIR=null, or leaving it unspecified, is fine, as the default location is the same directory as the input file.
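To spot that edge case before a run, a small sketch like the following lists any file name shared by more than one input path. duplicate_basenames is a hypothetical helper and the paths are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: print each file name that is shared by more than one input path.
# duplicate_basenames is a hypothetical helper, not part of GRIDSS.
duplicate_basenames() {
  local path
  for path in "$@"; do
    # Strip everything up to the last slash to get the file name.
    printf '%s\n' "${path##*/}"
  done | sort | uniq -d
}

# Placeholder paths: father and mother share a file name, so this prints
# "sample.bam".
duplicate_basenames /data/father/sample.bam /data/mother/sample.bam /data/child/child.bam
```

An empty result means no two inputs share a file name, so an explicit WORKING_DIR should not cause the clash described above.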

wcarre commented 7 years ago

Thanks for the reply. I'll check these things.

Wilfrid

On 12 Sep 2017, at 18:22, Daniel Cameron wrote:

> Joint calling of multiple samples is a common use case for GRIDSS and is most definitely supported. [...]