Open peterwc-cdc opened 8 months ago
@corneliusroemer Is this an issue that you have noticed?
Hi Peter, thanks for opening the issue (always welcome!) and sorry for my delay in replying (I am still at a workshop).
I currently only include lineages that have at least 3 genomes available in GenBank, which often means that newer lineages won't be included. I'll check whether there's an additional bug.
It would be great if you could share the full list so I can check if there's anything unexpected showing up!
Current Behavior
There appear to be lineages missing from the https://github.com/corneliusroemer/pango-sequences/blob/main/data/pango-consensus-sequences_genome-nuc.fasta.zst file that are present in the JSON.
A total of 1222 appear to be missing.
Expected behavior
Is there supposed to be one representative for each lineage?
How to reproduce
Steps to reproduce the current behavior:
Compare the lineage names in the JSON summary file with the sequence headers in pango-consensus-sequences_genome-nuc.fasta.zst.
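For reference, the comparison can be sketched roughly as follows. This is a minimal sketch, not the exact script used: it assumes the JSON summary is an object keyed by lineage name, that each FASTA header starts with the lineage name, and that the `.zst` file has already been decompressed (for example with `zstd -d`). The function names and the demo data are made up for illustration.

```python
import json


def fasta_ids(fasta_text):
    """Return the set of record IDs (first token after '>') in FASTA text."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def missing_lineages(summary_json_text, fasta_text):
    """List lineages named in the JSON summary that have no FASTA record.

    Assumption: the summary is a JSON object whose top-level keys are
    lineage names. Adjust the key extraction if the real layout differs.
    """
    summary = json.loads(summary_json_text)
    return sorted(set(summary) - fasta_ids(fasta_text))


if __name__ == "__main__":
    # Tiny made-up inputs, only to show the call shape.
    demo_json = '{"B.1.1.7": {}, "BA.2": {}, "XBB.1.5": {}}'
    demo_fasta = ">B.1.1.7\nACGT\n>BA.2\nACGT\n"
    print(missing_lineages(demo_json, demo_fasta))
```

With the real files, read the JSON and the decompressed FASTA as text and pass both to `missing_lineages`; the length of the returned list is the number of lineages without a sequence.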
Possible solution
Are these supposed to be missing? If so, we will accept that, but it would be nice if they could be added.
Your environment: if browsing Nextstrain online
Downloading and using the data file from GitHub
Let me know if you would like a complete list.