kerimsecener closed this issue 4 years ago.
You can indeed cluster single cells; that is the general idea of the software. In this case (and in the paper) a sample is a single cell, but it doesn't have to be: the software can also be used for bulk analyses, hence the term "sample" rather than only "single cell". The important thing is that every VCF file you want to analyse contains only one VCF-level sample, i.e. exactly one sample column after the FORMAT column. This is in contrast to multi-sample VCFs, which contain multiple sample columns.
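To check whether a file meets this requirement, you can count the columns that follow FORMAT in the `#CHROM` header line. The snippet below is a minimal sketch, not part of VarClust; it assumes an uncompressed, tab-delimited VCF, and the function names `sample_columns` and `is_single_sample` are hypothetical.

```python
def sample_columns(vcf_lines):
    """Return the VCF-level sample names, i.e. the columns after FORMAT."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            # Columns 0-7 are fixed (#CHROM ... INFO); FORMAT is column 8.
            if len(fields) < 10 or fields[8] != "FORMAT":
                return []  # no sample columns at all
            return fields[9:]
    raise ValueError("no #CHROM header line found")


def is_single_sample(vcf_lines):
    """True if the VCF has exactly one sample column."""
    return len(sample_columns(vcf_lines)) == 1


fixed = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT"
single = [fixed + "\tcell_01\n"]
multi = [fixed + "\tcell_01\tcell_02\n"]
print(is_single_sample(single))  # True
print(sample_columns(multi))     # ['cell_01', 'cell_02']
```

Only the `#CHROM` line is inspected, so the check stays cheap even for large files if you pass it an open file handle.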
To demonstrate, look at the example in the VCF format documentation, which is a multi-sample VCF file. It contains several sample columns, named `NA00001`, `NA00002` and `NA00003`. VarClust would only create an SNV profile for the first sample and ignore the rest (given that the VCF file was named `NA00001.vcf`, which is most likely not the case for multi-sample VCFs).
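The parenthetical above suggests that the file name decides which sample column gets profiled. The sketch below illustrates that reading only: it assumes the file stem (e.g. `NA00001` for `NA00001.vcf`) is compared against the sample column names. The function `profiled_sample` is hypothetical, and VarClust's actual lookup may differ.

```python
import os


def profiled_sample(vcf_path, header_line):
    """Return the sample column named like the VCF file, or None.

    Hypothetical illustration: the file stem is compared against the
    sample column names that follow FORMAT in the #CHROM line.
    """
    stem = os.path.basename(vcf_path)
    if stem.endswith(".vcf"):
        stem = stem[: -len(".vcf")]
    samples = header_line.rstrip("\n").split("\t")[9:]
    return stem if stem in samples else None


header = (
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT"
    "\tNA00001\tNA00002\tNA00003\n"
)
print(profiled_sample("NA00001.vcf", header))       # NA00001
print(profiled_sample("cells_merged.vcf", header))  # None
```

A merged file such as `cells_merged.vcf` matches no sample column, which is why multi-sample VCFs are not a good fit.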
So, as long as you have a single VCF file for each of your single cells, and each file is named after its sample column (e.g. `NA00001.vcf` for a sample column called `NA00001`), you're good to go!
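If per-cell genotypes already exist as columns of one multi-sample VCF, they can be split into one file per column, each named after its sample. This is a minimal sketch, assuming an uncompressed VCF; `split_by_sample` and `write_split` are hypothetical helpers. It does not recompute INFO fields such as AC/AN and keeps records where a given cell is missing or homozygous reference. A dedicated tool such as `bcftools view -s <sample>` is likely the more robust option.

```python
import os


def split_by_sample(vcf_lines):
    """Map each sample name to the lines of a single-sample VCF."""
    lines = [l.rstrip("\n") for l in vcf_lines]
    meta = [l for l in lines if l.startswith("##")]
    header = next(l for l in lines if l.startswith("#CHROM")).split("\t")
    records = [l.split("\t") for l in lines if l and not l.startswith("#")]

    per_sample = {}
    # Sample columns start at index 9, right after FORMAT (index 8).
    for col, sample in enumerate(header[9:], start=9):
        out = meta + ["\t".join(header[:9] + [sample])]
        out += ["\t".join(rec[:9] + [rec[col]]) for rec in records]
        per_sample[sample] = out
    return per_sample


def write_split(vcf_path, out_dir):
    """Write <sample>.vcf for every sample column of vcf_path into out_dir."""
    with open(vcf_path) as fh:
        per_sample = split_by_sample(fh)
    os.makedirs(out_dir, exist_ok=True)
    for sample, out_lines in per_sample.items():
        with open(os.path.join(out_dir, f"{sample}.vcf"), "w") as out:
            out.write("\n".join(out_lines) + "\n")
    return sorted(per_sample)
```

For example, `write_split("multi.vcf", "per_cell")` would produce `per_cell/NA00001.vcf`, `per_cell/NA00002.vcf` and so on, one per sample column.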
I have now included more documentation explaining this, which will hopefully make it clearer. Do ask again if it is still unclear!
Hey,
I have a directory containing 10 VCF files, where each VCF file corresponds to one sample (a collection of cells; 10 samples in total). The documentation on GitHub mentions one VCF file per sample. But as far as I understand your paper, you apply this method to VCFs corresponding to individual cells rather than samples? And the t-SNE clustering shown in the paper is indeed a clustering of cells based on their individual SNP profiles? Is this correct?
If so, how can I generate VCF files for the individual cells in my samples?
Thanks!