genomic-medicine-sweden / nallo

An analysis pipeline for long-reads from both PacBio and Oxford Nanopore Technologies (ONT), written in Nextflow.
https://genomic-medicine-sweden.github.io/nallo/
MIT License

Replace GLNexus #398

Open fellen31 opened 1 month ago

fellen31 commented 1 month ago

Perhaps GLNexus should be replaced. No work has been done on it in the last two years, and it has open issues such as assigning 0/0 when the call should be ./. (https://github.com/dnanexus-rnd/GLnexus/issues/286), which might have implications for genmod.

fellen31 commented 1 month ago

For example, GLNexus falsely assigns 0/0 to a variant with DP=0 that should really be a ./. call. Genmod treats this call as genotyped since it is not ./.:

https://github.com/Clinical-Genomics/genmod/blob/d8090a2355884cf55d6974df4d6aa9ddc6ccc876/genmod/vcf_tools/genotype.py#L80-L81
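As a rough illustration of why this matters (a simplified sketch, not genmod's actual code; `is_genotyped` is a hypothetical helper), a check that only looks at the GT string cannot tell a zero-depth 0/0 from a real hom-ref call:

```python
def is_genotyped(gt: str) -> bool:
    """Return True if at least one allele in a GT string is called.

    Hypothetical helper for illustration; it only inspects the GT string,
    so depth (DP) plays no role in the decision.
    """
    alleles = gt.replace("|", "/").split("/")
    return any(allele != "." for allele in alleles)


print(is_genotyped("0/0"))  # True, even if the call was made with DP=0
print(is_genotyped("./."))  # False
```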

XR and XD patterns are then excluded, when in reality they should not be:

https://github.com/Clinical-Genomics/genmod/blob/d8090a2355884cf55d6974df4d6aa9ddc6ccc876/genmod/annotate_models/models/x_models.py#L34-L36

So a ./. call should result in

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  father  mother  proband
X       302253  .       CCCTCCTGCCCCT   C       100     PASS    MQ=1;Annotation=GTPBP6,PLCXD1;GeneticModels=1:XR|XD;ModelScore=1:55     GT:AD:GQ        0/0:10,10:60    0/1:10,10:60    ./.:10,10:60

but a 0/0 call results in

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  father  mother  proband
X       302253  .       CCCTCCTGCCCCT   C       100     PASS    MQ=1;Annotation=GTPBP6,PLCXD1   GT:AD:GQ        0/0:10,10:60    0/1:10,10:60    0/0:10,10:60
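The difference between the two records can be sketched roughly as follows. This is a simplified illustration and not genmod's implementation: `x_linked_possible` is a hypothetical function, and it assumes the proband is affected, since the pedigree is not shown here.

```python
def x_linked_possible(affected: bool, gt: str) -> bool:
    """Simplified check: can this individual's call fit an X-linked model?

    Assumption for illustration: an affected individual that is genotyped
    and carries no ALT allele contradicts the model, while an ungenotyped
    call (./.) is treated as unknown and does not.
    """
    alleles = gt.replace("|", "/").split("/")
    genotyped = any(a != "." for a in alleles)
    has_variant = any(a not in (".", "0") for a in alleles)
    if affected and genotyped and not has_variant:
        return False
    return True


# Proband assumed affected (the pedigree is not shown in this issue).
print(x_linked_possible(True, "./."))  # True: XR/XD stay possible
print(x_linked_possible(True, "0/0"))  # False: XR/XD are excluded
```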

This might also result in a variant being set as de novo, if the parents are falsely genotyped as 0/0:

https://github.com/Clinical-Genomics/genmod/blob/d8090a2355884cf55d6974df4d6aa9ddc6ccc876/genmod/annotate_models/genetic_models.py#L371-L375

With both parents called 0/0:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  father  mother  proband
X       302253  .       CCCTCCTGCCCCT   C       100     PASS    MQ=1;Annotation=GTPBP6,PLCXD1;GeneticModels=1:XR_dn|XD_dn;ModelScore=1:57       GT:AD:GQ:DP     0/0:10,10:60:60    0/0:10,10:60:60    0/1:0,0:0:0

With both parents called ./.:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  father  mother  proband
X       302253  .       CCCTCCTGCCCCT   C       100     PASS    MQ=1;Annotation=GTPBP6,PLCXD1;GeneticModels=1:XR|XR_dn|XD|XD_dn;ModelScore=1:57 GT:AD:GQ:DP     ./.:10,10:60:60    ./.:10,10:60:60    0/1:0,0:0:0
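A minimal sketch of the reasoning behind these two outcomes, assuming the proband carries the variant. `inheritance_suffixes` is a hypothetical name and the rules are a simplification for illustration, not genmod's actual logic:

```python
def inheritance_suffixes(father_gt: str, mother_gt: str) -> list:
    """Return which pattern variants remain: "" (inherited) and/or "_dn".

    Assumption for illustration: the proband carries the variant. Parents
    only rule out inheritance when both are genotyped without an ALT allele.
    """
    def state(gt: str) -> str:
        alleles = gt.replace("|", "/").split("/")
        if all(a == "." for a in alleles):
            return "unknown"
        if any(a not in (".", "0") for a in alleles):
            return "carrier"
        return "ref"

    states = {state(father_gt), state(mother_gt)}
    if states == {"ref"}:
        return ["_dn"]      # both parents called hom-ref: de novo only
    if "carrier" in states:
        return [""]         # a parent carries the variant: inherited
    return ["", "_dn"]      # at least one parent unknown: keep both


print(["XR" + s for s in inheritance_suffixes("0/0", "0/0")])  # ['XR_dn']
print(["XR" + s for s in inheritance_suffixes("./.", "./.")])  # ['XR', 'XR_dn']
```

If the 0/0 parental calls are really zero-depth calls, the first case drops the inherited patterns without evidence.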

I'm not sure how often this would happen or what the implications really are (if any). Ping @jemten @dnil
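One way to gauge how often this happens would be to count sample calls that are 0/0 with DP=0 in a merged VCF. The sketch below is plain-text parsing with a hypothetical function name, and the example record is made up for illustration; it assumes DP is present in FORMAT, which is not the case for every record shown above.

```python
def count_zero_depth_homref(vcf_lines):
    """Count sample calls with GT 0/0 (or 0|0) and DP equal to 0.

    Works on plain-text VCF lines; records without GT or DP in FORMAT
    are skipped.
    """
    count = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns
        keys = fields[8].split(":")
        if "GT" not in keys or "DP" not in keys:
            continue
        gt_idx, dp_idx = keys.index("GT"), keys.index("DP")
        for sample in fields[9:]:
            values = sample.split(":")
            if len(values) <= max(gt_idx, dp_idx):
                continue  # trailing sample fields may be omitted in VCF
            if values[gt_idx] in ("0/0", "0|0") and values[dp_idx] == "0":
                count += 1
    return count


# Made-up record for illustration: only the second sample is 0/0 with DP=0.
record = (
    "X\t302253\t.\tCCCTCCTGCCCCT\tC\t100\tPASS\tMQ=1\tGT:AD:GQ:DP\t"
    "0/0:10,10:60:60\t0/0:0,0:0:0\t0/1:10,10:60:60"
)
print(count_zero_depth_homref([record]))  # 1
```

The function accepts any iterable of lines, so an open file handle can be passed directly.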