andyrimmer / Platypus

Platypus Variant Caller
GNU General Public License v3.0

>1bp SNPs? #42

Open ulah opened 9 years ago

ulah commented 9 years ago

Hi, I have an issue with the output from Platypus that I think is problematic. When running Platypus with default settings on my data, I get quite a few lines of the following format:

chr3 71561603 . TATTACCTTA AATTACCTTG 665 PASS someInfo
chr3 156248949 . TT CC 1654 PASS someInfo

Looking at the alignment (and at the output of three other variant callers), there are well-supported variants at these positions, but the way Platypus is printing them is simply wrong. It should rather be:

chr3 71561603 . T A 665 PASS someInfo
chr3 71561612 . A G 665 PASS someInfo
chr3 156248949 . T C 1654 PASS someInfo
chr3 156248950 . T C 1654 PASS someInfo

Is there a way to fix this using different parameters? Best, Urs

dancooke commented 9 years ago

This is not incorrect behaviour. Notice that Platypus has called the same base changes as the other callers you mention; the difference is that Platypus reports them as MNPs (multi-nucleotide polymorphisms) rather than as separate SNPs. It is not easy to infer whether mutations occurred independently or not. In this example I would assume there is no evidence that they did (e.g. only one sample, or both present together in all samples). The mutations are spatially close, so calling MNPs here does not seem unreasonable.
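To make the equivalence concrete, here is a minimal post-processing sketch that splits an equal-length REF/ALT pair into the per-base SNPs it contains. It is not a Platypus option: the function name `split_mnp` and the tuple layout are hypothetical, and the quality, filter and INFO columns are ignored for brevity.

```python
def split_mnp(chrom, pos, ref, alt):
    """Split an equal-length substitution into per-base SNP records.

    Hypothetical helper for illustration only; not part of Platypus.
    Returns (chrom, pos, ref_base, alt_base) tuples for mismatching bases.
    """
    if len(ref) != len(alt):
        raise ValueError("REF and ALT must have the same length for an MNP")
    return [
        (chrom, pos + offset, ref_base, alt_base)
        for offset, (ref_base, alt_base) in enumerate(zip(ref, alt))
        if ref_base != alt_base  # skip positions where the bases agree
    ]


# The two records from the report above.
for mnp in [("chr3", 71561603, "TATTACCTTA", "AATTACCTTG"),
            ("chr3", 156248949, "TT", "CC")]:
    for snp in split_mnp(*mnp):
        print(*snp, sep="\t")
```

Applied to the two records in question, this yields the same four positions and base changes the other callers report. Note that splitting discards the fact that the changes were reported together as one allele, which can matter for annotation when two changes fall in the same codon.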

ulah commented 9 years ago

Oh, sorry, I had actually never heard of an MNP before... Thanks a lot for the clarification :) But is the first example I show really an MNP? Shouldn't all SNPs within a certain distance then be called MNPs? Although I see such MNPs in dbSNP, like rs71273311 or rs71301114, none of them seems to be validated. Publications on these things are also almost non-existent.

I agree that there is no evidence of independent occurrence, but there is also no evidence for the opposite. And what about annotation of such variants: are MNPs deposited in COSMIC or TCGA, or even in the 1000 Genomes Project? It will of course be possible to predict the structural influence, but one should get the same result if they are called as individual SNPs.