Closed lbeltrame closed 10 years ago
I see where you are coming from, but I am a bit hesitant to go deeper down the road of supporting each variant caller's use of VCF. That said, the is_somatic column is intended to capture the variant caller's prediction and there is no standard (that I know of) for representing this in the INFO column. Would you be interested in adding this functionality and providing a pull request?
Likely yes, that's why I asked. ;) Where should I look to see how gemini does somatic parsing?
Absolutely. The logic is here in the infotag.py file.
https://github.com/arq5x/gemini/blob/master/gemini/infotag.py#L65-L69
Most somatic variant callers, including MuTect and VarScan, follow the TCGA/ICGC standard of inserting a tag named SOMATIC
in the VCF's INFO column. Gemini's infotag.py appears to be handling this correctly, so this issue can be closed.
VarScan has some issues in flagging the content correctly (I also get records with somatic status != 2); however, this can indeed be closed, as I'll handle it at the point of data generation (inserting SOMATIC flags where necessary).
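For anyone wanting to do the same, here is a rough sketch of what fixing the flag at data generation might look like. It is not part of gemini or VarScan: the function name fix_somatic_flag is made up for illustration, and it assumes a tab-separated VCF data line with INFO in the eighth column and that SOMATIC should be present exactly when SS=2. A complete fix would also need a matching ##INFO header line for SOMATIC, which this sketch does not add.

```python
def fix_somatic_flag(vcf_line):
    """Return the VCF line with SOMATIC set only when INFO contains SS=2.

    Hypothetical helper: assumes tab-separated columns with INFO at index 7.
    """
    # Header and meta lines are passed through untouched.
    if vcf_line.startswith("#"):
        return vcf_line

    newline = "\n" if vcf_line.endswith("\n") else ""
    fields = vcf_line.rstrip("\n").split("\t")

    info = fields[7]
    entries = [] if info == "." else info.split(";")

    # Drop any existing SOMATIC flag, then re-add it only for SS=2.
    entries = [e for e in entries if e != "SOMATIC"]
    if "SS=2" in entries:
        entries.append("SOMATIC")

    fields[7] = ";".join(entries) if entries else "."
    return "\t".join(fields) + newline
```

Run over every line of the VarScan output before loading, this would leave records with any other somatic status without the SOMATIC flag.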
Thanks for the update. Closing.
Unlike MuTect or other software, VarScan (when doing somatic mutation calling) adds an SS (Somatic Status) field to the VCFs it produces, using this code:
To make matters complicated, it's a number, but the type in the INFO field is "String".
However, in gemini all variants are flagged with is_somatic=1, potentially causing false positives. Ideally, gemini should tag with is_somatic only variants with SS=2. I understand this needs to be fixed at the VarScan level, but given the pace of development I'm doubtful it will happen any time soon.
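To illustrate the behaviour I have in mind, here is a minimal sketch. It is not gemini's actual code: parse_info and is_somatic are names I made up. It assumes that when an SS entry is present it takes priority over a bare SOMATIC flag, and it compares SS as a string because the INFO type is "String".

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; bare flags map to True."""
    tags = {}
    if info == ".":
        return tags
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        tags[key] = value if sep else True
    return tags


def is_somatic(info):
    """Hypothetical rule: trust SS when present, else fall back to SOMATIC."""
    tags = parse_info(info)
    if "SS" in tags:
        # SS is declared as a String in the header, so compare as text.
        return tags["SS"] == "2"
    return "SOMATIC" in tags
```

With this rule, a VarScan record such as DP=10;SS=1;SOMATIC would not be tagged, while files from callers that only emit SOMATIC keep working as before.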