CDCgov / datasets-sars-cov-2

Benchmark datasets for WGS analysis of SARS-CoV-2. (https://peerj.com/articles/13821/)
Apache License 2.0

dataset #6 lineage assignment validation #9

Closed chienchi closed 2 years ago

chienchi commented 2 years ago

Hi,

I downloaded the GISAID/NCBI genomes using the accession numbers provided in the dataset #6 table and ran pangolin v3.1.3 to get lineage assignments. All but one of the assignments match the expected results in dataset #6; the exception is the genome whose expected lineage is B.1.1.391.

Could you confirm the lineage of the hCoV-19_USA_CA-CZB-15265_2020 genome?

Here is the result I got with pangolin v3.1.3:

taxon,lineage,conflict,ambiguity_score,scorpio_call,scorpio_support,scorpio_conflict,version,pangolin_version,pangoLEARN_version,pango_version,status,note

MW564975.1,B.1.1.450,0.0,0.9413064438373775,,,,PLEARN-v1.2.13,3.1.3,2021-06-15,v1.2.13,passed_qc,

hCoV-19/USA/CA-CZB-15265/2020|EPI_ISL_738705|2020-11-24,B.1.1.450,0.0,0.9413064438373775,,,,PLEARN-v1.2.13,3.1.3,2021-06-15,v1.2.13,passed_qc,
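A check like this can be scripted. The following is a minimal sketch, not the method used by the dataset authors: it parses a pangolin report with the column layout shown above and lists taxa whose call differs from an expected lineage. The `EXPECTED` mapping and the `find_mismatches` helper are hypothetical names introduced for illustration; in practice the expected values would come from the dataset #6 table.

```python
import csv
import io

# Header and first row copied from the pangolin v3.1.3 report above.
REPORT = """taxon,lineage,conflict,ambiguity_score,scorpio_call,scorpio_support,scorpio_conflict,version,pangolin_version,pangoLEARN_version,pango_version,status,note
MW564975.1,B.1.1.450,0.0,0.9413064438373775,,,,PLEARN-v1.2.13,3.1.3,2021-06-15,v1.2.13,passed_qc,
"""

# Hypothetical stand-in for the dataset #6 table: taxon -> expected lineage.
EXPECTED = {"MW564975.1": "B.1.1.391"}


def find_mismatches(report_text, expected):
    """Return (taxon, expected_lineage, called_lineage) for every disagreement."""
    mismatches = []
    for row in csv.DictReader(io.StringIO(report_text)):
        want = expected.get(row["taxon"])
        # Taxa without an expected lineage are skipped rather than flagged.
        if want is not None and row["lineage"] != want:
            mismatches.append((row["taxon"], want, row["lineage"]))
    return mismatches


if __name__ == "__main__":
    for taxon, want, got in find_mismatches(REPORT, EXPECTED):
        print(f"{taxon}: expected {want}, got {got}")
```

Keying on the `taxon` column means the GISAID-style and NCBI-style names for the same genome need separate entries in the expected table.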

Here are the screenshots from the NCBI and GISAID metadata.

image

image

Thank you.

jvhagey commented 2 years ago

Hi @chienchi, thanks for bringing this to our attention.

I still get the call B.1.1.391 when I use the pangolin container described for this dataset. However, when I use the updated Pangolin container shown in the GISAID screenshot in your issue, I get the call B.1.1.450. Thus, I suspect that your pipeline might be calling the newer version of pangolin and using pangoLEARN to make its calls. Can you double-check and confirm?

Also, it is important to note that the pipeline we used (Titan) has UShER set as its default for lineage calling. We kept this default. I added this information to methods.md, along with the versions of all dependencies in the container we used, for a complete picture. I apologize that this wasn't clear before.

Here are the lineage calling results from Titan using either pangoLEARN or UShER: image

Let me know if this explains your results.

chienchi commented 2 years ago

Thanks for confirming this and for adding the note in methods.md about the --usher flag used in Titan. I have the same version of Pangolin installed, but I ran it with default parameters, without the --usher flag. After turning on the --usher flag, I get the same lineage assignment (B.1.1.391).
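For anyone reproducing the two runs, the difference comes down to one flag. Below is a minimal sketch that builds both command lines, assuming a pangolin v3.x executable on the PATH. The `--usher` flag is the one discussed in this thread; `--outfile` is assumed from the pangolin v3 command-line options. The helper names and file names are hypothetical.

```python
import subprocess


def pangolin_command(fasta, outfile, usher=False):
    """Build the argument list for one pangolin run (nothing is executed here)."""
    cmd = ["pangolin", fasta, "--outfile", outfile]
    if usher:
        # Switch from the default pangoLEARN mode to UShER placement.
        cmd.append("--usher")
    return cmd


def run_pangolin(fasta, outfile, usher=False):
    """Run pangolin and raise CalledProcessError if it exits with an error."""
    subprocess.run(pangolin_command(fasta, outfile, usher), check=True)


if __name__ == "__main__":
    # Hypothetical file names; both runs read the same input FASTA.
    print(" ".join(pangolin_command("dataset6.fasta", "pangolearn_report.csv")))
    print(" ".join(pangolin_command("dataset6.fasta", "usher_report.csv", usher=True)))
```

Writing each mode to its own report file keeps the two sets of calls side by side for comparison.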