I think this is occurring at multiple locations throughout anonymised.aln.fasta.treefile, but here is one example.
In the lineage B.2.1 there are 5 representative sequences according to the CSV:
EPI_ISL_419791, EPI_ISL_419792, EPI_ISL_419793, EPI_ISL_419794, EPI_ISL_419797
Excluding 5' and 3' UTRs, these 5 sequences share the following 4 SNPs relative to Wuhan-Hu-1 (EPI_ISL_402125): G26144T, G11083T, C2558T, C14805T
There are then 4 additional non-shared SNPs in these sequences:
A21137G - EPI_ISL_419792
G23984A - EPI_ISL_419794
A2480G - EPI_ISL_419797
C19763T - EPI_ISL_419797
So we would expect to see two sequences (EPI_ISL_419791, EPI_ISL_419793) on zero-length branches from the root of this lineage, two sequences (EPI_ISL_419792, EPI_ISL_419794) each on a branch representing a single SNP, and finally one sequence (EPI_ISL_419797) on a branch representing two SNPs.
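To make that expectation easy to re-check, here is a minimal Python sketch that counts the non-shared SNPs per sequence and groups the sequences by that count. The function names (`expected_terminal_snps`, `group_by_snp_count`) are my own, purely for illustration, and it assumes each non-shared SNP adds exactly one substitution to that sequence's terminal branch.

```python
from collections import Counter

# Lineage B.2.1 representatives, as listed in the CSV.
sequences = [
    "EPI_ISL_419791",
    "EPI_ISL_419792",
    "EPI_ISL_419793",
    "EPI_ISL_419794",
    "EPI_ISL_419797",
]

# Non-shared SNPs relative to Wuhan-Hu-1, as (SNP, sequence) pairs.
private_snps = [
    ("A21137G", "EPI_ISL_419792"),
    ("G23984A", "EPI_ISL_419794"),
    ("A2480G", "EPI_ISL_419797"),
    ("C19763T", "EPI_ISL_419797"),
]


def expected_terminal_snps(sequences, private_snps):
    """Return {sequence: number of non-shared SNPs}, including zeros."""
    counts = Counter(seq for _, seq in private_snps)
    return {seq: counts.get(seq, 0) for seq in sequences}


def group_by_snp_count(expected):
    """Group sequence IDs by their expected terminal-branch SNP count."""
    groups = {}
    for seq, n in expected.items():
        groups.setdefault(n, []).append(seq)
    return groups


expected = expected_terminal_snps(sequences, private_snps)
groups = group_by_snp_count(expected)

# Print the expected groups, from zero-length branches upwards.
for n in sorted(groups):
    print(f"{n} SNP(s): {', '.join(groups[n])}")
```

Comparing these groups against the terminal branch lengths in the treefile should show which sequences are not where the SNPs say they should be.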
But in anonymised.aln.fasta.treefile we actually see this:
i.e. three sequences on zero-length branches, and only one on a single-SNP branch.