Open vinitamehlawat opened 2 years ago
@vinitamehlawat
VENAS uses ePISs (effective parsimony-informative sites) to represent each sequence. Identical sequences are shown as a single node on the network, and sequences that differ by only one bp are shown as two nodes if each type occurs at least twice.
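To make the node rule above concrete, here is a minimal sketch that uses the one-bp example from the question. It applies the textbook definition of a parsimony-informative site (a column where at least two different bases each occur at least twice), which is an assumption on my part and not necessarily the exact ePIS calculation inside VENAS. The function names are hypothetical and are not part of the VENAS code.

```python
from collections import Counter

def parsimony_informative_sites(seqs):
    """Return 0-based columns where at least two distinct bases each occur
    at least twice (textbook definition; assumed, not VENAS's exact ePIS rule)."""
    sites = []
    for col in range(len(seqs[0])):
        # Count only unambiguous uppercase bases; N and gaps are ignored here.
        counts = Counter(s[col] for s in seqs if s[col] in "ACGT")
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            sites.append(col)
    return sites

def collapse_to_nodes(seqs):
    """Reduce each sequence to its informative sites and count identical
    patterns; each distinct pattern would be one node."""
    sites = parsimony_informative_sites(seqs)
    return Counter("".join(s[i] for i in sites) for s in seqs)

# Aligned sequences of equal length, taken from the example in the question.
seqs = ["ATGGCATGGC", "AAGGCATGGC", "AAGGCATGGC", "ATGGCATGGC"]
sites = parsimony_informative_sites(seqs)
nodes = collapse_to_nodes(seqs)
print(sites, dict(nodes))
```

In this example only the second column varies, and both variants occur twice, so the four sequences collapse into two nodes of two sequences each.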
VENAS uses the neighbor-joining method to construct the network, connecting the sequences with the smallest differences to form an undirected acyclic graph. The links between nodes represent differences or variations between viral genomes, and may also reveal transmission routes when enough samples have been sequenced.
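As a rough illustration of "connect the sequences with the smallest differences into an undirected acyclic graph", the sketch below builds a minimum spanning tree over Hamming distances with Kruskal's algorithm. This is a stand-in technique chosen for brevity, not the actual VENAS network construction, and the haplotype strings and function names are made up for the example.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def minimum_spanning_links(haplotypes):
    """Kruskal's algorithm: add the closest pairs first and skip any link
    that would close a cycle, giving an undirected acyclic graph."""
    parent = list(range(len(haplotypes)))

    def find(i):
        # Union-find root lookup with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = sorted(
        combinations(range(len(haplotypes)), 2),
        key=lambda p: hamming(haplotypes[p[0]], haplotypes[p[1]]),
    )
    links = []
    for i, j in pairs:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # joining two separate components: no cycle
            parent[root_i] = root_j
            links.append((haplotypes[i], haplotypes[j],
                          hamming(haplotypes[i], haplotypes[j])))
    return links

# Hypothetical haplotypes (already reduced to their informative sites).
haps = ["AAAA", "AAAT", "AATT", "GGGG"]
links = minimum_spanning_links(haps)
for a, b, d in links:
    print(a, b, d)
```

The link weights are the number of differing sites, so a chain of weight-1 links reads as a series of single-mutation steps, while a long link such as the one to GGGG marks a distant genome type.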
Louvain is a disjoint community detection method that we use to cluster the VENAS network into topologically linked subdomains, which represent different evolution clades containing many closely-connected genome types. Such segmentation enabled us to subjectively identify the topological clades with “tight” intraclade connectivity and “sparse” interclade connectivity, which reflect the relationship of different genome types among viral communities formed during natural transmission.
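Louvain works by greedily maximizing modularity, a score that is high when links fall mostly inside communities rather than between them. The sketch below does not run Louvain itself; it only computes the modularity of a given partition on a toy graph, to show what "tight" intraclade and "sparse" interclade connectivity mean numerically. The graph, the node names, and the function name are assumptions made for illustration.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph:
    Q = sum over communities of (L_c / m) - (d_c / (2m))**2,
    where m is the number of edges, L_c the number of edges inside
    community c, and d_c the summed degree of its nodes."""
    m = len(edges)
    inside = defaultdict(int)   # L_c
    degree = defaultdict(int)   # d_c
    for u, v in edges:
        degree[community_of[u]] += 1
        degree[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            inside[community_of[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Toy network: two triangles of genome types joined by a single link.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

two_clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_cluster = {n: 0 for n in "abcdef"}

q_split = modularity(edges, two_clusters)
q_single = modularity(edges, one_cluster)
print(q_split, q_single)
```

Splitting the toy graph into its two triangles scores higher than keeping everything in one cluster, which is the kind of split Louvain looks for. Note that the criterion is purely topological: a cluster is a densely linked group of nodes, not a set of identical sequences.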
Pangolin uses a decision tree to compute the PANGO lineage. For sequences with incomplete features, especially the leaf nodes on the VENAS network, Pangolin may not be able to assign an accurate lineage because the model has not been trained on the new features. So the 3 sequences from the B.1 lineage you mentioned may actually be close to the BA.1.1 lineage.
Hi @qianjiaqiang
I have some questions about the definition of a cluster:
You previously mentioned that clusters are generated by the Louvain algorithm,
BUT in a genetic sense, what exactly is a cluster? Is it a collection of exactly identical sequences, like: ATGCATGCATGC ATGCATGCATGC ATGCATGCATGC
OR is it
ATGGCATGGC AAGGCATGGC AAGGCATGGC ATGGCATGGC (similar sequences with a one-bp change), OR could it be
ATTTCCGGT AAAACCCCA (sequences with more variability)?
The thing is that I am getting a high number of clusters, but I am not able to interpret them in a genetic way.
EXAMPLE: There is one cluster that is dominated by the BA.1.1 lineage, but 3 sequences from the B.1 lineage are in this same cluster as well (I used -r=1 and -b=0 to retain all my sequences).
It would be great if you could explain this.
Thank you very much Vinita