BD2KGenomics / protect

Apache License 2.0

Add vardict support #34

Open arkal opened 8 years ago

arkal commented 8 years ago

child of #3

arkal commented 8 years ago

Putting this on hold indefinitely since this tool takes forever to run.

schelhorn commented 6 years ago

VarDict is meant to be parallelized within a sample by giving each instance a subset of genomic regions, with each instance assigned just one core. Also, the vardict-java implementation should be used; it is the best-supported version. See bcbio-nextgen for a scalable implementation in Python. We are using vardict-java in production, and it is superior to the competition in many ways, especially on cancer exomes and panels. It also supports calling variants directly from RNA-Seq data, which may be relevant for your tool as well (if I'm not mistaken, Openvax Epidisco uses RNA-Seq variants as an additional filtering step to check whether somatic neo-epitopes are actually expressed in the tumor).
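The region-based scatter described above can be sketched roughly as follows. This is only an illustration of the idea, not how bcbio-nextgen or ProTECT implement it: `parse_bed`, `split_regions`, `call_chunk`, and `call_in_parallel` are hypothetical names, and `call_chunk` is a placeholder standing in for one single-core caller run on its share of the regions.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_bed(lines):
    """Parse BED lines into (chrom, start, end) tuples, skipping headers and blanks."""
    regions = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def split_regions(regions, n_chunks):
    """Greedily assign regions to bins so the total bases per bin stay balanced."""
    bins = [[] for _ in range(n_chunks)]
    sizes = [0] * n_chunks
    # Place the largest regions first, always into the currently smallest bin.
    for region in sorted(regions, key=lambda r: r[2] - r[1], reverse=True):
        i = sizes.index(min(sizes))
        bins[i].append(region)
        sizes[i] += region[2] - region[1]
    return [sorted(b) for b in bins if b]


def call_chunk(chunk):
    """Hypothetical placeholder for one single-core variant-caller run.

    A real worker would launch the caller as an external process restricted
    to these regions; here it just echoes the regions it was given.
    """
    return [f"{chrom}:{start}-{end}" for chrom, start, end in chunk]


def call_in_parallel(regions, n_workers):
    """Scatter regions over n_workers single-core workers and gather the results."""
    chunks = split_regions(regions, n_workers)
    # Threads suffice in this sketch because a real worker would mostly
    # wait on an external process rather than compute in Python.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(call_chunk, chunks))
    return sorted(item for result in results for item in result)
```

Balancing by total bases rather than by region count is a design choice of this sketch; it keeps one worker from being stuck with all the large intervals.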

arkal commented 6 years ago

Interesting, thanks for the tip. I'll revisit this as soon as I can.

schelhorn commented 6 years ago

Great. Since you're CWL-based, as is bcbio-nextgen, there may even be opportunities for you to integrate its scalable vardict functionality, including the optional panel and RNA-Seq modes (or, vice versa, to allow protect integration in bcbio), with limited effort.

We'd certainly be interested in having neo-antigen calling in bcbio, and so may be others, right @mjafin?

mjafin commented 6 years ago

Absolutely, and thanks for clarifying VarDict best-practice use, Sven-Eric.

As an aside, how does this tool differ from the likes of NetMHC and MHCflurry?

arkal commented 6 years ago

Hi @mjafin. Thanks for your interest in ProTECT.

ProTECT is a fully automated workflow to predict neoantigens from input FASTQs, or a combination of VCFs, BAMs, haplotypes, etc. It uses the IEDB suite of tools (which encompasses NetMHC) during the pMHC prediction step.

It differs from NetMHC/MHCflurry in that those tools are pMHC prediction tools that accept a haplotype and peptides and provide an estimate of binding energy for each combination. ProTECT accepts sequencing data from the patient and tries to provide an immunologically relevant ranked list of neoepitopes in the patient that can guide an ACT or peptide vaccine therapy.
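To make the input/output shape of a pMHC predictor concrete, here is a minimal sketch that scores every peptide/allele combination and sorts the pairs. It is not ProTECT's ranking and not the API of NetMHC or MHCflurry: `rank_candidates` and `toy_predictor` are hypothetical, the peptide and allele labels are placeholders, and the scores are made-up numbers used only for illustration.

```python
from itertools import product


def rank_candidates(peptides, alleles, predict_affinity):
    """Score every peptide/allele pair and return them strongest binder first.

    predict_affinity is a hypothetical callable (peptide, allele) -> score,
    where a lower score means tighter predicted binding.
    """
    scored = [(predict_affinity(p, a), p, a) for p, a in product(peptides, alleles)]
    return [(p, a, score) for score, p, a in sorted(scored)]


# Made-up scores for placeholder inputs; a real predictor would compute these.
TOY_SCORES = {
    ("PEPTIDE_1", "ALLELE_A"): 500.0,
    ("PEPTIDE_1", "ALLELE_B"): 50.0,
    ("PEPTIDE_2", "ALLELE_A"): 20.0,
    ("PEPTIDE_2", "ALLELE_B"): 5000.0,
}


def toy_predictor(peptide, allele):
    """Stand-in for a real binding predictor: looks up a fabricated score."""
    return TOY_SCORES[(peptide, allele)]


ranked = rank_candidates(["PEPTIDE_1", "PEPTIDE_2"], ["ALLELE_A", "ALLELE_B"], toy_predictor)
```

A workflow like the one described above would sit around a step of this kind, deriving the peptides and haplotype from the patient's sequencing data before prediction and applying further criteria afterwards.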

mjafin commented 6 years ago

Thanks for the detailed explanation, @arkal. Is there any chance you could support MHCflurry (or any other open source tool) in addition to NetMHC?

schelhorn commented 6 years ago

I'd be interested in this as well since MHCflurry seemed to perform well in recent validations, and is free for commercial use afaik.

The Hammer Lab (authors of MHCflurry) also have their own neo-antigen pipeline, epidisco, so they generally know what they are doing.

arkal commented 6 years ago

I thought I already had a ticket for that. Yes, I do want to allow an option for MHCflurry (now #249), and I also want to look into Deep MHC (#247), since the preprint shows promise.