Closed jdeligt closed 3 years ago
Hi @jdeligt, this is a highly requested feature! We haven't got that in place in our setup at the moment partly due to the nature of the assignment model (the decision tree rules are difficult to tease apart to give exact SNPs), but it's something we're working on for the next set of releases for pangolin as the next model will make it more feasible to provide this for each lineage.
Thank you for your reply. I know it's a hard problem, so it's exciting to hear this is being worked on. I'll leave this one open so that people looking for that data can find it and know the status.
Thanks for the update Áine. There are a couple of places where the documentation refers to this sort of file, but I can't find the file. Based on your description above, I take it that those references are outdated... or am I missing something?
https://github.com/cov-lineages/pangolin/releases/tag/v2.0 "pangoLEARN contains information about the top SNPs that are most positively and negatively associated with a given lineage. The lineage recall report is also available in this repository."
https://cov-lineages.org/pangolin_docs/pangolearn.html "We have pulled out informative sites and this information is included in the data release on pangoLEARN. The top SNPs that are most positively and negatively associated with a given lineage are detailed in those files. More details on this release and its practicalities can be found here."
Hi,
as a follow-up to the thread here, I am wondering if there is any plan soon to give the possibility to fetch the list of lineage-defining SNPs. Description of genomes (especially when it comes to non-lineage-defining SNPs) and VOC/VUI investigations would surely benefit from this feature.
Thanks for your impressive work!
Hi, the new Scorpio tool is designed for doing just that - https://github.com/cov-lineages/scorpio/
There's a new tool from our group by Rachel and Ben called scorpio that can fetch the defining set of mutations from a set of genomes (either relative to an outgroup or the early haplotype from Wuhan). There's also a small number of constellation files for the VOCs available at https://github.com/cov-lineages/constellations/tree/main/constellations/definitions. This has all just been developed very recently and more documentation will be written up shortly. If you're looking for a resource that can give the mutations for all lineages, outbreak.info is a really great website that has lists of SNPs per lineage.
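To illustrate what "the defining set of mutations from a set of genomes" relative to a reference can mean, here is a minimal toy sketch in Python. It is not how scorpio works internally. The function name `shared_snps`, the `min_fraction` threshold, and the assumption that all genomes are already aligned to the reference (same length, no indels) are all hypothetical choices made for illustration.

```python
from collections import Counter


def shared_snps(reference, genomes, min_fraction=0.9):
    """Return SNPs (e.g. 'C2T') carried by at least min_fraction of genomes.

    Assumes every genome is already aligned to the reference and has the
    same length. Positions are reported 1-based. Ambiguous bases and gaps
    are never reported as the alternate allele.
    """
    if not genomes:
        raise ValueError("need at least one genome")
    if any(len(g) != len(reference) for g in genomes):
        raise ValueError("genomes must be aligned to the reference")

    snps = []
    for i, ref_base in enumerate(reference):
        # Most common base observed at this alignment column.
        base, count = Counter(g[i] for g in genomes).most_common(1)[0]
        if base != ref_base and base in "ACGT" and count / len(genomes) >= min_fraction:
            snps.append(f"{ref_base}{i + 1}{base}")
    return snps


if __name__ == "__main__":
    ref = "ACGTACGT"
    seqs = ["ATGTACGA", "ATGTACGA", "ATGTACGT"]
    print(shared_snps(ref, seqs))
    print(shared_snps(ref, seqs, min_fraction=0.6))
```

With the default threshold only the substitution present in every genome is reported. Lowering `min_fraction` also lets in changes carried by most, but not all, of the genomes.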
Hi @rambaut and @aineniamh thanks for your replies and hints! I'll test scorpio as soon as possible, in the meantime outbreak.info will do the job. Are you planning to integrate scorpio in pangolin? :grin:
We are! It's currently integrated on this branch: https://github.com/cov-lineages/pangolin/tree/newscoring
We're planning to merge into the master next week after some more testing and updating of documentation!
Awesome! Thank you!
Are there any plans to have the nucleotide changes available, as asked in the first post? This is still mostly gene-based, with the AA changes. Also asked in #126.
Hi @cutpatel, that isn't really a pangolin issue: we're not hosting any coordinates on this repo now, and the config files for post hoc tests were never intended as a reference, just for internal assignment.
I don't think I can resolve your issue #126 as we're not hosting that information here. Apologies! If it's helpful, here's the link to gene coordinates on GenBank: https://www.ncbi.nlm.nih.gov/nuccore/1798174254. If you need nucleotide coordinates you should be able to convert them with a relatively simple function.
This issue is now stale, so I am closing it.
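A conversion along those lines could look roughly like the sketch below. It is not code from this repo. The function name `codon_to_nucleotide_range` is hypothetical, and the spike gene start of 21563 used in the example is an assumption that should be checked against the CDS coordinates in the GenBank record. The sketch also assumes a single contiguous coding region, so a gene with a ribosomal frameshift such as ORF1ab would need extra handling.

```python
def codon_to_nucleotide_range(gene_start, aa_position):
    """Map an amino-acid position in a gene to genome coordinates.

    gene_start  -- 1-based genome coordinate of the first base of the CDS
    aa_position -- 1-based codon (amino-acid) number within that gene

    Returns the 1-based, inclusive (start, end) coordinates of the codon.
    Only valid for a contiguous CDS without frameshifts or joins.
    """
    if gene_start < 1 or aa_position < 1:
        raise ValueError("coordinates are 1-based and must be positive")
    start = gene_start + (aa_position - 1) * 3
    return start, start + 2


if __name__ == "__main__":
    S_START = 21563  # assumed start of the spike CDS; verify on GenBank
    print(codon_to_nucleotide_range(S_START, 501))  # codon for S:N501Y
    print(codon_to_nucleotide_range(S_START, 614))  # codon for S:D614G
```

The result is the three-base window containing the change. Which of the three bases actually differs still has to be read from the sequence itself.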
I was wondering if you have something similar to this: https://github.com/nextstrain/ncov/blob/master/defaults/clades.tsv for pangolin?
I'm basically looking for an overview of the nucleotide changes that 'define' a certain lineage.