fasterius / VarClust

A Python package for clustering of single nucleotide variants from high-throughput sequencing data.

[Heatmap and tSNE Clustering] Metadata file format #3

Closed saisomesh2594 closed 4 years ago

saisomesh2594 commented 4 years ago

Hi again,

So, following your previous advice, I have successfully managed to generate profiles for my VCF files and the distance matrix as well.

However, when it comes to plotting the heatmap and the tSNE clustering, I see that a metadata file is required. It specifies the ID column on which to merge the distance matrix and the metadata, as well as the columns used for colouring and shapes (in the case of tSNE plotting). I tried to work out what the metadata file should look like by browsing through the code, but without success.

Could you kindly share a snippet of what the metadata file should look like? And is there anything else I should be aware of before plotting the heatmaps and the clustering?

Thanks, Somesh

fasterius commented 4 years ago

While a metadata file is not strictly necessary to perform hierarchical clustering, it is required for tSNE and for clustering based on groups. All you need is one column that corresponds to the sample IDs you used when creating the distance matrix (e.g. SRR, which is the default), and then an arbitrary number of other columns containing whatever info is relevant for you. You might include information related to patient ID, for example, as was done in the VarClust publication.

The simplest metadata file is thus only two columns: one ID column corresponding to the IDs used for creating the distance matrix, and one column containing some kind of grouping information. I have now updated the documentation to better explain this.
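As a rough illustration of the two-column layout described above, here is a minimal sketch that writes such a file using only the Python standard library and checks that every sample ID from the distance matrix has a metadata row. The sample IDs, the `patient` column and its values, the tab delimiter, and the helper names `write_metadata` and `missing_ids` are all assumptions made for this example; they are not part of VarClust, so check the documentation for the delimiter and column names it actually expects.

```python
import csv
import io

# Hypothetical sample IDs, standing in for the IDs used when the
# distance matrix was created.
matrix_ids = ["SRR0000001", "SRR0000002", "SRR0000003"]

# Minimal metadata: one ID column (named "SRR" here) plus one grouping
# column. The "patient" column and its values are invented for illustration.
metadata_rows = [
    {"SRR": "SRR0000001", "patient": "P1"},
    {"SRR": "SRR0000002", "patient": "P1"},
    {"SRR": "SRR0000003", "patient": "P2"},
]


def write_metadata(rows, handle, id_col="SRR"):
    """Write rows as a delimited table with the ID column first.

    The tab delimiter is an assumption, not a documented requirement.
    """
    fieldnames = [id_col] + [key for key in rows[0] if key != id_col]
    writer = csv.DictWriter(
        handle, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


def missing_ids(handle, ids, id_col="SRR"):
    """Return the IDs from `ids` that have no row in the metadata table."""
    reader = csv.DictReader(handle, delimiter="\t")
    present = {row[id_col] for row in reader}
    return [sample_id for sample_id in ids if sample_id not in present]


# Write to an in-memory buffer; pass an open file handle to write to disk.
buffer = io.StringIO()
write_metadata(metadata_rows, buffer)
metadata_text = buffer.getvalue()
print(metadata_text)

# An empty list means every distance-matrix sample has a metadata row.
unmatched = missing_ids(io.StringIO(metadata_text), matrix_ids)
print(unmatched)
```

Checking for unmatched IDs before plotting is a cheap way to catch a mismatch between the ID column and the distance matrix, which would otherwise surface only when the two are merged.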

saisomesh2594 commented 4 years ago

Thanks for the clarification. Worked like a charm!