cov-lineages / pangolin

Software package for assigning SARS-CoV-2 genome sequences to global lineages.
GNU General Public License v3.0

Samples Unassigned #451

Closed huipingis closed 2 years ago

huipingis commented 2 years ago

Dear Sir/Madam,

We recently ran lineage assignment on our COVID sequencing results and found that some samples came back Unassigned; see below:

NEG6_13052022 | Unassigned | 0,090909091 | PUSHER-v1.8v4.0.6 | pass | Ambiguous_content:0.15 | Omicron (Unassigned) | Usher placements: BA.2(10/11) BA.2.11(1/11); scorpio replaced lineage inference BA.2

POS1_13052022 | Unassigned | 0 | PUSHER-v1.8v4.0.6 | pass | Ambiguous_content:0.12 | Probable Omicron (Unassigned) | Usher placements: BA.2.3.3(2/2); scorpio replaced lineage inference BA.2.3.3

pOS2_13052022 | Unassigned | 0,333333333 | PUSHER-v1.8v4.0.6 | pass | Ambiguous_content:0.1 | Probable Omicron (Unassigned) | Usher placements: BA.2(1/3) BA.2.3.3(2/3); scorpio replaced lineage inference BA.2.3.3

Why were these samples left without a lineage assignment? Thanks.

Best regards, Chen Huiping University Hospital of Iceland
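A note on reading these rows: the third column appears to be the fraction of UShER placements that disagree with the most common placement (1/11 ≈ 0.0909, 0/2 = 0, 1/3 ≈ 0.333). The sketch below reproduces those values from the note column. It is only an illustration that matches the three rows above, not pangolin's own code; `placement_conflict` and the regular expression are hypothetical names introduced here.

```python
import re

# Matches entries such as "BA.2(10/11)" inside an "Usher placements:" note.
PLACEMENT = re.compile(r"([A-Za-z0-9.]+)\((\d+)/(\d+)\)")

def placement_conflict(note):
    """Return (majority lineage, fraction of placements that disagree with it).

    Assumption: the note looks like the rows in this issue, e.g.
    "Usher placements: BA.2(10/11) BA.2.11(1/11); scorpio replaced ...".
    """
    # Keep only the placements part, before any "; scorpio ..." suffix.
    placements_part = note.split(";")[0]
    counts = {
        lineage: (int(n), int(total))
        for lineage, n, total in PLACEMENT.findall(placements_part)
    }
    if not counts:
        raise ValueError("no placements found in note")
    # The majority lineage is the one with the most placements.
    majority, (n, total) = max(counts.items(), key=lambda kv: kv[1][0])
    return majority, 1 - n / total

lineage, conflict = placement_conflict(
    "Usher placements: BA.2(1/3) BA.2.3.3(2/3); "
    "scorpio replaced lineage inference BA.2.3.3"
)
print(lineage, round(conflict, 9))
```

Note that the rows above use a decimal comma (`0,333333333`), so the column needs a locale-aware conversion before it can be compared numerically.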

Hideaki615 commented 2 years ago

I ran into the same issue after updating my local pangolin from 3.1.20 to 4.0.6.

I could avoid it by downgrading constellations from 0.1.10 to 0.1.9 with commands like the ones below.

$ conda activate pangolin
$ pip install git+https://github.com/cov-lineages/constellations.git@v0.1.9

In addition, and I don't know how closely related this is, I found that the allele counts reported by scorpio call are lower in the latest version, as shown below.

sample-01
-- constellations-0.1.9
---- scorpio call: Alt alleles 37; Ref alleles 1; Amb alleles 20; Oth alleles 1
-- constellations-0.1.10
---- scorpio call: Alt alleles 21; Ref alleles 0; Amb alleles 12; Oth alleles 0

So my guess is that the latest version counts fewer alleles, and that this is what made the sample "Unassigned".
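To make the comparison concrete, here is a minimal sketch that parses the two scorpio call strings quoted in this comment and prints the per-category change. `parse_scorpio_call` is a hypothetical helper written for this note, not part of scorpio, and it assumes the "<Kind> alleles <n>" wording shown above.

```python
import re

def parse_scorpio_call(text):
    """Turn 'Alt alleles 37; Ref alleles 1; ...' into {'Alt': 37, 'Ref': 1, ...}."""
    return {kind: int(n) for kind, n in re.findall(r"(\w+) alleles (\d+)", text)}

# The two strings reported for sample-01 in this comment.
old = parse_scorpio_call(
    "scorpio call: Alt alleles 37; Ref alleles 1; Amb alleles 20; Oth alleles 1"
)  # constellations 0.1.9
new = parse_scorpio_call(
    "scorpio call: Alt alleles 21; Ref alleles 0; Amb alleles 12; Oth alleles 0"
)  # constellations 0.1.10

# Per-category change between the two versions, plus total sites reported.
delta = {kind: new[kind] - old[kind] for kind in old}
print("total sites:", sum(old.values()), "->", sum(new.values()))
print("change:", delta)
```

The total number of sites reported drops from 59 to 33, so every category shrinks, not only the Alt count. Whether that reflects a change in the constellation definitions is not something these numbers alone can show.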

huipingis commented 2 years ago

Thanks. The three samples I showed are one negative control and two positive controls. I suspect they were unassigned because of contamination during PCR and sequencing. The negative control clearly turned positive through contamination. The two positive controls (lineage B) were probably unassigned for the same reason, because aligning the reads showed some reads from other strains. Can contamination be one of the reasons a lineage is left unassigned? Thanks.

Best regards, Chen

aineniamh commented 2 years ago

Unassigned is often the result of conflicting signals or reference calls that scorpio thinks would make a lineage call unreliable. If you suspect your sequences come from contaminated samples, that is exactly why scorpio would mask the lineage call.

huipingis commented 2 years ago

Thanks, Áine.

Could contamination also cause pangolin to make a wrong call, e.g. calling a B sample as BA.3?

Regards, Chen

aineniamh commented 2 years ago

Hi @huipingis, this can absolutely happen as conflicting signals can confuse the inference. This is why we curate the assignments and check this, reporting Unassigned if the sequence doesn't meet the scorpio checks.

eddieimada commented 2 years ago

Hi @aineniamh, does this mean that if scorpio cannot confidently assign a lineage, the lineage call from UShER should not be trusted and should be discarded?

huipingis commented 2 years ago

Hi, Eddie,

Some of our samples were unassigned probably because of low-quality sequencing data that covers only part of the genome rather than the whole of it, so accurate genotyping is impossible.

Regards, Chen


aineniamh commented 2 years ago

Hi all, in the latest release of pangolin (4.1) we have introduced an updated process.

We are confident in the calls made by usher, so we now bypass scorpio in usher mode and do not overwrite the usher assignment (the default behaviour is now to take the usher assignment as is, but still report the scorpio call). In pangoLEARN/fast mode we still overwrite assignments that do not explicitly agree with scorpio, to eliminate false positive calls from the pangoLEARN method.

Hopefully this will resolve these Unassigned samples, but bear in mind that they would have been unassigned because they did not pass the scorpio checks.
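For readers skimming the thread, the behaviour described for 4.1 can be sketched as a small decision function. This is a simplified illustration under stated assumptions, not pangolin's implementation: `final_lineage`, the mode labels, and the `scorpio_agrees` flag are hypothetical, and using "Unassigned" as the overwritten value is assumed from the rows reported at the top of this issue.

```python
def final_lineage(mode, inferred_lineage, scorpio_agrees):
    """Pick the reported lineage following the 4.1 description in this thread.

    mode             -- "usher" or "pangolearn" (hypothetical labels)
    inferred_lineage -- lineage proposed by the inference step
    scorpio_agrees   -- hypothetical flag: True if the scorpio call
                        explicitly agrees with the inferred lineage
    """
    if mode == "usher":
        # usher mode: keep the usher assignment as is; the scorpio call
        # is still reported separately but no longer overrides it.
        return inferred_lineage
    if mode == "pangolearn":
        # pangoLEARN/fast mode: overwrite calls scorpio does not back.
        # Assumption: the overwritten value is "Unassigned".
        return inferred_lineage if scorpio_agrees else "Unassigned"
    raise ValueError(f"unknown mode: {mode!r}")

# A sample like POS1 above, where scorpio did not back the inference:
print(final_lineage("usher", "BA.2.3.3", scorpio_agrees=False))
print(final_lineage("pangolearn", "BA.2.3.3", scorpio_agrees=False))
```

Under this reading, the same sample keeps its usher lineage in usher mode but is still reported as Unassigned in pangoLEARN mode, which is why a failed scorpio check remains worth inspecting either way.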