Open gremame opened 1 year ago
Hi David,
I think this is just HaplotypeCaller internally splitting up the interval list so each thread can run on separate targets, and then it combines the results afterwards. Maybe the issue is that it's not running as multithreaded here... I can look further into that if that's the case.
best,
~brian
On Sun, Nov 27, 2022 at 12:48 PM David @.***> wrote:
Hello Brian, I tried passing GATK HaplotypeCaller's --intervals parameter through the --HC_xtra_args parameter that ctat-mutations provides. I saw in the logs that this parameter is correctly passed to HaplotypeCaller when the program is called:
gatk --java-options "-Xmx6000m" \
HaplotypeCaller \
-R output/sample/cromwell-executions/ctat_mutations/55901d74-65b6-426f-9636-69930c907f08/call-HaplotypeCallerInterval/shard-1/inputs/-1676360762/ref_genome.fa \
-I /output/sample/cromwell-executions/ctat_mutations/55901d74-65b6-426f-9636-69930c907f08/call-HaplotypeCallerInterval/shard-1/inputs/-1234849600/sample.bqsr.bam \
-O sample.vcf.gz \
-dont-use-soft-clipped-bases --stand-call-conf 20 --recover-dangling-heads true --intervals /input/file3/chr2-208247000-208249000.bed --max-mnp-distance 0 \
-L /output/sample/cromwell-executions/ctat_mutations/55901d74-65b6-426f-9636-69930c907f08/call-HaplotypeCallerInterval/shard-1/inputs/826508261/0001-scattered.interval_list
There it is:
--intervals /input/file3/chr2-208247000-208249000.bed
However, it seems that my call to the interval parameter is being overridden by an additional (I guess internal) use of it:
-L /output/sample/cromwell-executions/ctat_mutations/55901d74-65b6-426f-9636-69930c907f08/call-HaplotypeCallerInterval/shard-1/inputs/826508261/0001-scattered.interval_list
I was wondering if this interval parameter could become a parameter in the ctat-mutations pipeline and be handled in such a way that it limits the scope of the analysis and reduces runtime. Best regards! David
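A possible reading of the command above, not confirmed in this thread: GATK documents that multiple -L/--intervals inputs are combined as a UNION by default (--interval-set-rule), so the BED region may have been merged with the scattered interval list rather than overridden by it. The dry-run sketch below only prints a command so the two rules can be compared; the function name hc_cmd and the short file names are placeholders, and how shards with no overlap behave under INTERSECTION has not been checked here.

```shell
# Dry run: prints a HaplotypeCaller command line instead of executing gatk.
# ref_genome.fa, sample.bqsr.bam and sample.vcf.gz are placeholder file names.
hc_cmd() {
  region="$1"          # user-supplied region, e.g. a BED file
  shard="$2"           # scattered interval list produced by the pipeline
  rule="${3:-UNION}"   # UNION is GATK's documented default
  printf '%s\n' "gatk HaplotypeCaller -R ref_genome.fa -I sample.bqsr.bam -O sample.vcf.gz --intervals $region -L $shard --interval-set-rule $rule"
}

# Default behaviour: both interval inputs are merged.
hc_cmd chr2-208247000-208249000.bed 0001-scattered.interval_list
# Hypothetical alternative: restrict the shard to the user-supplied region.
hc_cmd chr2-208247000-208249000.bed 0001-scattered.interval_list INTERSECTION
```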
Hello Brian, I went through the logs again and found that, as a step prior to HaplotypeCaller, the workflow calls another GATK tool, SplitIntervals:
gatk --java-options "-Xmx1500m" \
SplitIntervals \
-R /output/L19-5858/cromwell-executions/ctat_mutations/a29788ae-c656-45d2-859a-6c13c9b65ae1/call-SplitIntervals/inputs/-1676360762/ref_genome.fa \
-scatter 10 \
-O interval-files \
Fortunately, according to its documentation, this tool also accepts the --intervals parameter, so this problem could (hopefully) be resolved easily:
If intervals becomes an input for the ctat-mutations pipeline, it can be passed to SplitIntervals. No further modifications should be needed: the pipeline will take care of the rest, since the output of SplitIntervals is already passed to HaplotypeCaller.
I believe that's all that is needed to limit the analysis to the regions defined in the intervals file.
What do you think?
Best regards,
David
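The proposal above could be sketched roughly as follows. This is a dry run that only prints the SplitIntervals command, adding -L when an intervals file is supplied; the function name split_intervals_cmd and the short file names are placeholders, and the actual change made in ctat-mutations may look different.

```shell
# Dry run: prints the SplitIntervals command instead of executing gatk.
# Memory setting, -scatter and -O values mirror the logged command above.
split_intervals_cmd() {
  ref="$1"            # reference FASTA
  scatter="$2"        # number of shards
  intervals="${3:-}"  # optional user-supplied intervals file (e.g. BED)
  cmd="gatk --java-options -Xmx1500m SplitIntervals -R $ref -scatter $scatter -O interval-files"
  if [ -n "$intervals" ]; then
    # Restrict the scattered shards to the requested regions.
    cmd="$cmd -L $intervals"
  fi
  printf '%s\n' "$cmd"
}

split_intervals_cmd ref_genome.fa 10 chr2-208247000-208249000.bed
```

Because HaplotypeCaller already receives each scattered interval list through -L, restricting the intervals at this step would carry through to variant calling without touching the later steps.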
Hi David,
I'll look into this shortly and get back to you.
many thanks,
~b
Hi David,
I should have a version with the intervals option supported later today. Are you using ctat-mutations via docker or singularity, or through a native installation?
best,
~b
Hello Brian, sounds great, many thanks. I'm using ctat-mutations via docker. Best regards, David
sounds good. I'll have an updated docker for you to try shortly.
best,
~b
Hi David - can you give this Docker a try?
trinityctat/ctat_mutations:3.3.0-predev
There'll be an --intervals parameter now that you can use to give your interval list for passing on to gatk.
best,
~brian
It worked! 😄 I ran the new version using an interval file I created and the FASTQ files provided as an example in the documentation, and I was happy to see that the only variants in the results were those that overlapped the BED file. I then tried the same with one of the samples I'm analyzing: in my first test I used FASTQ files, and in the second I provided a BAM file. In both cases I obtained only variants overlapping the BED. Thanks a lot for adding this feature to the workflow!
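A check like the one described above could be scripted along these lines. This is a minimal sketch, not the method actually used in this thread: the function name variants_outside_bed is hypothetical, it assumes an uncompressed VCF and a BED file without header lines, and it tests only each record's POS rather than the full REF span.

```shell
# Print CHROM:POS for every VCF record that lies outside all BED intervals.
# BED is 0-based and half-open, VCF POS is 1-based, so a position p is
# inside an interval [start, end) when start < p <= end.
variants_outside_bed() {
  awk -F'\t' '
    # First file (BED): remember each interval.
    NR == FNR { chrom[FNR] = $1; start[FNR] = $2 + 0; stop[FNR] = $3 + 0; n = FNR; next }
    # Second file (VCF): skip header lines.
    /^#/ { next }
    {
      inside = 0
      for (i = 1; i <= n; i++)
        if ($1 == chrom[i] && $2 + 0 > start[i] && $2 + 0 <= stop[i]) { inside = 1; break }
      if (!inside) print $1 ":" $2
    }
  ' "$1" "$2"
}
```

Called as `variants_outside_bed region.bed sample.vcf`, it prints nothing when every variant overlaps the BED; a compressed sample.vcf.gz would need to be decompressed first (for example with gunzip -c).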
Great! Thx for the update. This will go into the next release.