magicDGS opened this issue 7 years ago
Can you provide me with some test data to include in the tools' integration tests, @vdauwera and/or @sooheelee? If not, I will try to use some BAM files already in the repository...
It's not clear to me that we want these tools in GATK4. We deliberately didn't port them because we felt they were unnecessary going forward.
I understand that there are some legitimate use cases that require them, e.g. low-coverage naive variant calling from high-ploidy pools, which HaplotypeCaller would do poorly on. (Also, do we know that HaplotypeCaller doesn't do well on those sorts of data? Maybe we should consider modifying it if it doesn't?) I'm not sure that supporting that use case is worth the added complexity of maintaining and supporting these tools, especially since we don't provide a pileup-based variant caller as part of GATK4...
@vdauwera Can you comment?
@sooheelee I'm not sure I agree with you that supporting this for Mutect 1 is useful.
A) We don't want to support the use of Mutect 1 anymore and would like to encourage people to switch to Mutect 2, which we now believe is a better variant caller for both SNPs and indels.
B) Mutect 1 users are already using GATK3, so they already have access to these tools. Mutect 1 also requires co-cleaning, which I believe is a different but related tool to indel realignment.
For the variant review issue, we are considering a much better solution: an assembly plugin for IGV.
After @vdauwera's last comment on the blog post about the removal, I thought it would be possible to port the tools here as a contribution (without too much effort from your dev team). If the final decision is that they will not be maintained in GATK4, I would port the code to my own software, with your permission; the community is definitely interested in it.
For example, I'm working with Pool-Seq data with hundreds of individuals pooled together, so HaplotypeCaller is not an option in our case. I'm currently evaluating other approaches for realignment, such as ABRA or SRMA. I'm even thinking of implementing a new realigner based on GATK's assembly engine and its PairHMM, but this requires more time for evaluation, and it would be nice to be able to compare it with the current indel realignment pipeline. Anyway, I can close the issues and PRs in the gatk repo and port them to my toolkit (ReadTools) to maintain the code for the community.
I asked around here and it seems like people think that it would be useful to have. If you're willing to do the work of porting we'll incorporate it. 👍
@magicDGS I provide some example data for a tutorial at https://gatkforums.broadinstitute.org/gatk/discussion/7156/howto-perform-local-realignment-around-indels. Search the page for tutorial_7156.tar.gz. I showcase illustrative sites within the tutorial and also in https://software.broadinstitute.org/gatk/blog?id=7847.
I'm new to test data, so what cases are you hoping to test with the data? The snippet in the tutorial data is much larger than you need, so it would be good to narrow down the test case.
As I understand it, there are two options: 1) update all guides that use tools not ported to GATK4, so that users can get the same results with GATK4 as they did before; or 2) add all tools from GATK 3.6 to GATK4.
Otherwise, unofficial forks will appear.
For now, could you please add all tools to GATK4?
That example data from the tutorial is good, @sooheelee, but could it be reduced in size to avoid adding it to the large-file directory? It would be nice to include that example in the RealignerTargetCreator PR (#3112)...
@sooheelee, I came back to the port this week and ran the tutorial you provided; the port of RealignerTargetCreator (#3112) produces the same result. However, I cannot add the test data to the resources because the reference used is huge (3 GB). The 7156_snippet.bam is a good size to include, but it requires the whole reference because some pairs are mapped to other chromosomes. Would it be possible to get another example that is limited to a couple of chromosomes, preferably 20 and 21, since a reference for those chromosomes is already provided? Thanks in advance!
In addition, I realized that the links to the data in the tutorial are broken. Fortunately I downloaded the data a while ago, but it would be nice if the links could be restored in case I lose my copy.
Any update on this issue? What is the recommended way to use the RealignerTargetCreator and IndelRealigner in other non-GATK pipelines?
No update, sorry. The PRs are pending review, and the data is still not available for proper testing...
@lbergelson I haven't seen it mentioned, but the biggest issue (for us at least) is licensing. GATK 4 is free for commercial use, while GATK 3 is not. Some of our non-commercial pipelines rely on these GATK 3 tools to process data for use cases beyond the GATK variant callers. Not having them available in GATK 4 makes these pipelines difficult to move to a commercial setting. If the goal is to move everyone to GATK 4, then dropping support for these tools is counterproductive. I am eagerly awaiting updates on their availability in GATK 4.
@stevekm I agree it would be beneficial to have the indel realignment tools in GATK 4. It helps with reproducing results from existing pipelines and resolves any licensing issues.
Having said that, you may want to have a look at the GATK 3 source code. RealignerTargetCreator and IndelRealigner are both in the public subfolder of the gatk-protected repo.
I'm not a legal expert, but the source files for RealignerTargetCreator and IndelRealigner both contain this comment, which looks to me like permission to use them in a commercial setting:
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
Hi @magicDGS, thanks for tackling this. I would also like to be able to use IndelRealigner with GATK4. How far along are you with the port? The last update to the PR was @sooheelee providing you with some test data in March.
Sorry to everyone interested; lately I had some deadlines unrelated to software development that took most of my time. Now I will have time to come back to other projects, and I will implement the port and tests with @sooheelee's data this week or next. I hope that works for you.
@magicDGS, thanks! We'll be waiting!
Sorry, I had several personal appointments and other things to do over the last few weeks. I will let you know as soon as I can get back to work on indel realignment.
Is there any update on this? From what I understand, most non-GATK variant callers (such as bcftools or Platypus) could still benefit from this.
Additionally, the documentation for htslib still references GATK's IndelRealigner. If there's no replacement forthcoming, I will open an issue on htslib to have this updated.
Any updates?
I'm not aware of any activity on this, unless @magicDGS is still pursuing it.
There haven't been any comments here for about 3 years. Have there been any updates in a separate thread or offline? Is there any hope there may be any eventually?
+1
Has there been any development on porting RealignerTargetCreator and IndelRealigner? These tools may be helpful for low-coverage variant calling.
After the discussion in #3084, I'm volunteering to port the indel realignment pipeline. After exploring the GATK3 implementation, I will split the port into the following independent tasks:
1) RealignerTargetCreator (requires test data after running it with GATK3)
2) ConstrainedMateFixingManager
3) NWaySAMFileWriter (requires some change in the engine to get the ID for the inputs)
The previous ports will be integrated into the IndelRealigner tool implementation.