Closed: Jeltje closed this issue 2 years ago
The metadata file is matched to the aligned fasta. Are you comparing with the file of all sequences?
I count 1439105 sequences in the aligned fasta and 1439105 non-header rows in the metadata. There are 1601285 sequences in the full fasta. The difference is the sequences that fail our QC (but are made publicly available for completeness/openness). I think this explains your missing numbers.
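For anyone who wants to reproduce this kind of check on their own download, here is a minimal sketch of one way to compare the IDs. It is not the pipeline's own code. It assumes the fasta ID is the first whitespace-delimited token on each `>` header line and that the metadata is a delimited text file with a header row; the column name `sequence_name`, the function names, and the file names in the usage note are hypothetical and should be adjusted to your files.

```python
import csv


def fasta_ids(path):
    """Collect sequence IDs from a fasta file.

    Assumption: the ID is the first whitespace-delimited token after '>'.
    """
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def metadata_ids(path, id_column="sequence_name", delimiter=","):
    """Collect IDs from the metadata file.

    Assumption: the file has a header row, and `id_column` (a hypothetical
    name here) holds the same IDs that appear in the fasta headers.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        return {row[id_column] for row in reader}


def compare_ids(fasta_path, metadata_path, id_column="sequence_name", delimiter=","):
    """Return (IDs only in the fasta, IDs only in the metadata)."""
    in_fasta = fasta_ids(fasta_path)
    in_metadata = metadata_ids(metadata_path, id_column, delimiter)
    return in_fasta - in_metadata, in_metadata - in_fasta
```

Calling `compare_ids("aligned.fasta", "metadata.csv")` (placeholder file names) returns two sets. If both are empty, the two files match; a large first set usually means the metadata is being compared against a different fasta than the one it was built from.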
Judging by the :+1: here I think this has been solved!
In a download we did yesterday, I found 160,916 sample IDs in the fasta file that are not in the metadata file. I have seen discrepancies like this for the past week, so it doesn't look like a one-time glitch.
This happened in October as well; I don't know whether it's a similar issue.