cbg-ethz / BnpC

Bayesian non-parametric clustering (BnpC) of binary data with missing values and uneven error rates
MIT License

Understanding the output of BnpC #15

Closed rituparna-13 closed 2 years ago

rituparna-13 commented 2 years ago

Hi, I am using BnpC to cluster a single-cell dataset. After running it, I would like to know the cluster number of each cell, which I believe is given in the "assignment.txt" file, and the mutations of each cluster. Could you please tell me which file contains the mutations of each cluster? Also, please confirm that "assignment.txt" is the file that gives the cluster number of each cell. I ran BnpC with the following command:

python ../BnpC/run_BnpC.py filename.tsv -pp 0.75 0.75 -o ./bnpc_results/

Thank you, Ritu

NBMueller commented 2 years ago

Dear Ritu,

Yes, the assignment.txt file contains the MAP/ML assignments for each MCMC chain you ran. The tab-separated file has three columns, 'chain', 'estimator', and 'Assignment', and each row corresponds to one MCMC chain. The Assignment column lists the cluster assignments per cell, i.e. the 1st value is the cluster number of cell 1, the 2nd value is the cluster number of cell 2, ...
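As a rough illustration of reading that file, here is a minimal sketch using only the Python standard library. It assumes the cluster numbers inside the Assignment field are separated by whitespace and/or commas (possibly wrapped in brackets), which is not stated above, so check it against your own output. The helper name `read_assignments`, the example values, and the estimator name "posterior" are made up for the demonstration.

```python
import csv
import io
import re


def read_assignments(handle):
    """Map (chain, estimator) -> list of cluster numbers, one per cell.

    Hypothetical helper; expects a tab-separated file with the columns
    'chain', 'estimator' and 'Assignment' described above.
    """
    result = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        # Assumption: cluster numbers are separated by whitespace and/or
        # commas, optionally wrapped in square brackets.
        tokens = re.split(r"[,\s\[\]]+", row["Assignment"].strip())
        result[(row["chain"], row["estimator"])] = [int(t) for t in tokens if t]
    return result


# Made-up example content; with real output use open("assignment.txt") instead.
example = "chain\testimator\tAssignment\n0\tposterior\t0 0 1 2 1\n"
assignments = read_assignments(io.StringIO(example))

for (chain, estimator), clusters in assignments.items():
    # Position in the list corresponds to the cell order of the input data.
    for cell_number, cluster in enumerate(clusters, start=1):
        print(f"chain {chain} ({estimator}): cell {cell_number} -> cluster {cluster}")
```

The position of each value follows the order of the cells in the input file, so the cell names can be attached by zipping the list with the input's cell names.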

For the mutations, there should be two files: genotypes_<estimator>_<chain>.tsv and genotypes_cont_<estimator>_<chain>.tsv (with <estimator> and <chain> being the chosen estimator and the MCMC chain). The first contains binary mutations (i.e. probability > 0.5 = mutated, <= 0.5 = wildtype), the second the probability of each mutation being present in a cell. The rows correspond to the mutations (the first column is the index) and the columns to the cells (the first row is the header), but the header row holds each cell's cluster number instead of its name. To get the genotypes of the inferred clusters (instead of the genotypes of all cells), keep only the unique columns.
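The "unique columns" step could be sketched as follows, again with the standard library only. It assumes that all cells sharing a cluster number carry identical columns, so keeping the first column seen per cluster number gives the cluster genotypes. The helper name `cluster_genotypes`, the mutation names, and the values are invented for the example.

```python
import csv
import io


def cluster_genotypes(handle):
    """Return (mutation names, {cluster number: genotype tuple}).

    Hypothetical helper for a genotypes_*.tsv-like file: rows are mutations,
    the first column is the index, and the header row holds one cluster
    number per cell.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    cluster_ids = header[1:]            # skip the index column label
    mutations = []
    columns = [[] for _ in cluster_ids]
    for row in reader:
        if not row:
            continue                    # ignore blank lines
        mutations.append(row[0])
        for column, value in zip(columns, row[1:]):
            column.append(value)
    profiles = {}
    for cluster, column in zip(cluster_ids, columns):
        # Assumption: cells in the same cluster have identical columns,
        # so the first column per cluster number is representative.
        profiles.setdefault(cluster, tuple(column))
    return mutations, profiles


# Made-up example: 5 cells in clusters 0, 0, 1, 2, 1 and 3 mutations.
example = (
    "\t0\t0\t1\t2\t1\n"
    "mutA\t1\t1\t0\t0\t0\n"
    "mutB\t0\t0\t1\t1\t1\n"
    "mutC\t0\t0\t0\t1\t0\n"
)
mutations, profiles = cluster_genotypes(io.StringIO(example))

for cluster, genotype in profiles.items():
    mutated = [m for m, g in zip(mutations, genotype) if g == "1"]
    print(f"cluster {cluster}: {mutated}")
```

Plain `csv` is used here because the header repeats the same cluster number for several cells; pandas' `read_csv` renames duplicate column names by default (e.g. `0`, `0.1`), which would hide the cluster numbers unless handled explicitly.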

Hope that helps and best! Nico

rituparna-13 commented 2 years ago

Hi Nico,

Thank you for explaining it to me. I have a clear understanding now.

Thanks, Ritu