biosinodx / SCcaller

Single Cell Caller (SCcaller) - Identify single nucleotide variations (SNVs) from single cell sequencing data
GNU Affero General Public License v3.0

PL field in VCF output #10

Open amkozlov opened 5 years ago

amkozlov commented 5 years ago

Hi guys,

currently, PL field in the VCF output of SCcaller is defined as follows:

##FORMAT=<ID=PL,Number=G,Type=Integer,Description="sequencing noise, amplification artifact, heterozygous SNV and homozygous SNV respectively">

whereas according to the VCF 4.3 spec it has a different meaning:

PL (Integer): The phred-scaled genotype likelihoods rounded to the closest integer, and otherwise defined precisely as the GL field
GL (Float):  Genotype likelihoods comprised of comma separated floating point log10-scaled likelihoods for all possible genotypes given the set of alleles defined in the REF and ALT fields.
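Put together, the two definitions mean PL = round(-10 × GL). Below is a minimal sketch of that relationship. It is not SCcaller code, and the function names (`gl_to_pl`, `pl_to_gl`, `pl_to_likelihood`) are hypothetical.

```python
from typing import List

def gl_to_pl(gl: List[float]) -> List[int]:
    """Phred-scale log10 genotype likelihoods: PL = round(-10 * GL).

    Python's round() breaks exact .5 ties toward the even integer; the
    spec only asks for the closest integer.
    """
    return [int(round(-10 * x)) for x in gl]

def pl_to_gl(pl: List[int]) -> List[float]:
    """Approximate inverse, GL = -PL / 10 (rounding precision is lost)."""
    return [-p / 10 for p in pl]

def pl_to_likelihood(pl: List[int]) -> List[float]:
    """Linear-scale likelihoods, 10 ** (-PL / 10)."""
    return [10 ** (-p / 10) for p in pl]

print(gl_to_pl([0.0, -2.0, -9.1]))  # [0, 20, 91]
```

On the phred scale a smaller PL means a more likely genotype.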

Could you please consider printing the genotype likelihoods ('0/0', '0/1', '1/1') into the 'PL' field, as the spec requires? This would make it much easier to parse for downstream programs that rely on the VCF spec.
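`Number=G` means one value per possible genotype. For a diploid sample the VCF spec stores genotype j/k (j <= k) at index k*(k+1)/2 + j. The following sketch of that ordering uses a hypothetical function name and is not part of SCcaller. It shows why a biallelic site maps to exactly the three values listed above:

```python
from typing import List

def diploid_genotype_order(n_alleles: int) -> List[str]:
    """Diploid genotypes in the order used by Number=G fields (GL, PL).

    Genotype j/k with j <= k is stored at index k*(k+1)/2 + j, so the
    field holds n*(n+1)/2 values for n alleles (REF counts as allele 0).
    """
    order = [""] * (n_alleles * (n_alleles + 1) // 2)
    for k in range(n_alleles):
        for j in range(k + 1):
            order[k * (k + 1) // 2 + j] = f"{j}/{k}"
    return order

print(diploid_genotype_order(2))  # ['0/0', '0/1', '1/1']
print(diploid_genotype_order(3))  # ['0/0', '0/1', '1/1', '0/2', '1/2', '2/2']
```

A biallelic SNV (REF plus one ALT) therefore has three PL entries. The four-value PL that SCcaller currently writes does not fit the `Number=G` layout.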

You could then use another, custom field to print additional info (sequencing noise etc.).

Thanks!

biosinodx commented 5 years ago

Hi Alexey,

Thanks a lot for the kind suggestion. We will definitely consider it in our next update, which is coming soon.

Best wishes, Xiao


amkozlov commented 5 years ago

great, thank you very much, Xiao!

In the meantime, is there an easy way to compute the REF/REF, REF/ALT and ALT/ALT genotype likelihoods from the four values reported in the PL field? Then we could implement a temporary workaround for our analysis.

Best, Alexey

biosinodx commented 5 years ago

Yes, the first two values are for REF/REF (I suggest simply taking the bigger number as the PL combined), the third is for REF/ALT, and the last is for ALT/ALT.

Xiao
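The mapping described above can be written as a small conversion function. This sketch assumes the four values come in the order given in the header description (sequencing noise, amplification artifact, heterozygous SNV, homozygous SNV). The name `sccaller_pl_to_vcf_pl` is hypothetical and not part of SCcaller. "Take the bigger number" is implemented literally as `max`. The `merge` parameter lets you swap in another rule.

```python
from typing import Callable, List, Sequence

def sccaller_pl_to_vcf_pl(values: Sequence[int],
                          merge: Callable[[int, int], int] = max) -> List[int]:
    """Collapse SCcaller's four PL values into REF/REF, REF/ALT, ALT/ALT.

    Assumed input order: sequencing noise, amplification artifact,
    heterozygous SNV, homozygous SNV. The first two both stand for
    REF/REF and are merged with `merge` (max by default).
    """
    if len(values) != 4:
        raise ValueError(f"expected 4 PL values, got {len(values)}")
    noise, artifact, het, hom = values
    return [merge(noise, artifact), het, hom]

print(sccaller_pl_to_vcf_pl([20, 32, 91, 1546]))  # [32, 91, 1546]
```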


amkozlov commented 5 years ago

perfect, thanks for your fast reply!

jmfa commented 4 years ago

Hi everyone, a small follow-up question: when you suggest taking "the bigger number as the PL combined" of the first two values as the REF/REF likelihood, do you mean the larger of the two values, or the one with "the highest likelihood" (which in this case would be the smaller integer value)?

Thanks in advance J

amkozlov commented 4 years ago

So let's say we have

20,32,91,1546

should it be converted to a)

20,91,1546

or b)

32,91,1546

?

Thanks!
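The sketch below computes both readings for 20,32,91,1546. It also adds a third option that nobody in the thread proposed: treat noise and amplification artifact as two alternative REF/REF explanations, add their likelihoods, and convert the sum back to phred scale. All names here are hypothetical, and the "GT:PL" sample column is made up for illustration. Option c) assumes the two values are likelihoods on a common scale, which the thread does not confirm.

```python
import math
from typing import Dict, List

def sample_field(format_col: str, sample_col: str, key: str) -> str:
    """Return the value of one FORMAT key from a VCF sample column.

    Raises KeyError if the key is absent (the spec allows trailing
    sample fields to be dropped).
    """
    return dict(zip(format_col.split(":"), sample_col.split(":")))[key]

def combine_phred(a: int, b: int) -> int:
    """Phred-scale the sum of two likelihoods given as phred values.

    Computes -10*log10(10**(-a/10) + 10**(-b/10)) in a form that does
    not underflow for large phred values.
    """
    best, other = min(a, b), max(a, b)
    return int(round(best - 10 * math.log10(1 + 10 ** (-(other - best) / 10))))

def ref_ref_candidates(pl4: List[int]) -> Dict[str, List[int]]:
    """Three ways to collapse the first two SCcaller values into REF/REF."""
    noise, artifact, het, hom = pl4
    return {
        "a_min": [min(noise, artifact), het, hom],   # most likely (smallest phred)
        "b_max": [max(noise, artifact), het, hom],   # literal "bigger number"
        "c_sum": [combine_phred(noise, artifact), het, hom],
    }

# Hypothetical sample: FORMAT "GT:PL", sample "0/1:20,32,91,1546".
pl = [int(x) for x in sample_field("GT:PL", "0/1:20,32,91,1546", "PL").split(",")]
for name, values in ref_ref_candidates(pl).items():
    print(name, ",".join(str(v) for v in values))
```

This prints 20,91,1546 for a), 32,91,1546 for b) and 20,91,1546 for c). Here a) and c) agree because the two REF/REF values are 12 phred units apart, so the sum is dominated by the more likely term.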