CPTR-ReSeqTB / UVP

Mycobacterium tuberculosis next generation sequence analysis
MIT License

Switching to unencumbered GATK 4.x #28

Open tseemann opened 4 years ago

tseemann commented 4 years ago

Will this ever support GATK 4.x ?

mezewudo commented 4 years ago

At some point in a future revision possibly.

On Thu, Aug 29, 2019 at 4:48 AM Torsten Seemann notifications@github.com wrote:

Will this ever support GATK 4.x ?

dfornika commented 4 years ago

I've taken a quick look at this. It would simplify the installation process a bit. It seems like the main difference between GATK3 and GATK4 is the move from the UnifiedGenotyper to the HaplotypeCaller for SNP & indel calling. Some discussion here:

https://gatkforums.broadinstitute.org/gatk/discussion/3151/should-i-use-unifiedgenotyper-or-haplotypecaller-to-call-variants-on-my-data

I've had some difficulty finding detailed documentation or 'best practices' guidelines for using the HaplotypeCaller on monoploid organisms. There is a -ploidy switch, but it seems that the HaplotypeCaller is designed with human/cancer analysis in mind.

Do either of you (or anyone else reading this) have experience using the HaplotypeCaller on bacterial samples? Know of any good documentation?
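For reference, a minimal sketch of what a haploid HaplotypeCaller invocation might look like from Python. It only builds the GATK 4 command line and sets the ploidy option to 1. The file names are placeholders, the helper name `haplotypecaller_cmd` is made up for illustration, and this is not how UVP currently calls GATK.

```python
def haplotypecaller_cmd(reference, bam, out_vcf, ploidy=1):
    """Build a GATK 4 HaplotypeCaller command line (hypothetical helper).

    ploidy=1 is an assumption for a haploid bacterial sample.
    """
    return [
        "gatk", "HaplotypeCaller",
        "-R", reference,                  # reference FASTA (needs .fai and .dict)
        "-I", bam,                        # sorted, indexed BAM with read groups
        "-O", out_vcf,                    # output VCF
        "--sample-ploidy", str(ploidy),   # long form of the -ploidy switch
    ]


# Placeholder file names, for illustration only.
cmd = haplotypecaller_cmd("reference.fasta", "sample.sorted.bam", "sample.vcf.gz")
print(" ".join(cmd))
```

The list could be passed to `subprocess.run(cmd, check=True)` once the inputs exist. Whether the default HaplotypeCaller settings suit Mtb data is exactly the open question above.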

I suppose that another significant part of adopting a new tool or version for this pipeline is re-running a validation process to ensure that the results are consistent/compatible with older pipeline version results.
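As a rough idea of what such a validation step could start from, here is a sketch that compares the variant sites called by two pipeline versions. It assumes plain-text VCF lines, keys each call on (CHROM, POS, REF, ALT), and does not normalise indel representation, so differently left-aligned indels would be reported as discordant. The function names are hypothetical.

```python
def parse_variants(vcf_lines):
    """Return the set of (CHROM, POS, REF, ALT) keys from VCF text lines."""
    variants = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):  # one key per ALT allele
            variants.add((chrom, int(pos), ref, allele))
    return variants


def compare_calls(old_lines, new_lines):
    """Split calls into shared, old-only, and new-only sets."""
    old, new = parse_variants(old_lines), parse_variants(new_lines)
    return {"shared": old & new, "only_old": old - new, "only_new": new - old}


# Toy records (not real data): one shared call, one unique to each version.
old = ["##fileformat=VCFv4.2", "chr1\t100\t.\tA\tG", "chr1\t200\t.\tC\tT"]
new = ["##fileformat=VCFv4.2", "chr1\t100\t.\tA\tG", "chr1\t300\t.\tG\tA"]
result = compare_calls(old, new)
print(len(result["shared"]), len(result["only_old"]), len(result["only_new"]))
```

For real files, the lines could come from `open(path)`, or from `gzip.open(path, "rt")` for compressed VCFs.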

tseemann commented 4 years ago

I have no experience with GATK 4, but yes, it could be troublesome.

Does your design allow the variant caller to be a plugin that can be replaced with alternative callers, e.g. Snippy? That is: REF + R1 + R2 => [variant caller module] => VCF (in your format)
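To make the question concrete, here is a sketch of the kind of plugin interface meant here: every caller takes the same inputs (REF, R1, R2, output path) and returns a VCF path, so the rest of the pipeline does not care which engine produced it. All names (`CALLERS`, `register`, `call_variants`, `example_caller`) are hypothetical and say nothing about how UVP is actually structured.

```python
CALLERS = {}  # caller name -> function(reference, r1, r2, out_vcf) -> VCF path


def register(name):
    """Decorator that adds a caller function to the registry under `name`."""
    def decorator(func):
        CALLERS[name] = func
        return func
    return decorator


@register("example")
def example_caller(reference, r1, r2, out_vcf):
    """Stand-in caller: a real one would run its tool and write out_vcf."""
    return out_vcf


def call_variants(caller_name, reference, r1, r2, out_vcf):
    """Dispatch to the selected caller; downstream steps only see the VCF."""
    if caller_name not in CALLERS:
        raise ValueError(
            f"unknown caller {caller_name!r}; choose from {sorted(CALLERS)}"
        )
    return CALLERS[caller_name](reference, r1, r2, out_vcf)


# Placeholder file names, for illustration only.
vcf = call_variants("example", "ref.fasta", "R1.fastq", "R2.fastq", "out.vcf")
print(vcf)
```

With something like this, adding another engine would mean registering one more function, and annotation and reporting would stay unchanged.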

dfornika commented 4 years ago

You can choose between GATK and samtools:

usage: uvp -q STRING -r STRING -n STRING [-q2 STRING] [-o STRING]
           [--keepfiles] [--bwa] [--all] [--gatk] [--samtools] [-a]
           [-t THREADS] [-k STRING] [-c STRING] [-v] [-h] [--version]

UVP - Call SNPs and InDels

Input:

  -q STRING, --fastq STRING
                        Input FASTQ file
  -r STRING, --reference STRING
                        Reference genome in FASTA format.
  -n STRING, --name STRING
                        Sample name to be used as a prefix.
  -q2 STRING, --fastq2 STRING
                        Second paired-end FASTQ file.

Output:

  -o STRING, --outdir STRING
                        Output directory
  --keepfiles           Keep intermediate files.

Aligners:
  Select a specific aligner.

  --bwa                 Align Illumina reads using bwa. (Default)

Callers:
  Choose program(s) to call SNPs/InDels with.

  --all                 Run all SNP / InDel calling programs.
  --gatk                Run GATK SNP / InDel calling. (Default)
  --samtools            Run SamTools SNP / InDel calling.

Annotation:
  Use snpEff to annotate VCF file

  -a, --annotate        Run snpEff functional annotation.

Optional:

  -t THREADS, --threads THREADS
                        Num CPU threads for parallel execution
  -k STRING, --krakendb STRING
                        Path to kraken database
  -c STRING, --config STRING
                        Config file
  -v, --verbose         Produce status updates of the run.
  -h, --help            Show this help message and exit
  --version             Show program's version number and exit

tseemann commented 4 years ago

That doesn't answer my question though. Can one replace the SNP calling engine and still get the rest of the functionality? Would be awesome for benchmarking on such an important problem in Mtb genomics.