cortes-ciriano-lab / SComatic

A tool for detecting somatic variants in single cell data

Output of GetAllCallableSites.py for mutational burdens #23

Closed YiqunCao closed 1 year ago

YiqunCao commented 1 year ago

Hi,

I would like to calculate the mutational burden for each cell type, but I am not sure which number in the output of GetAllCallableSites.py represents the callable sites. I attached a few lines from the output file sample_coverage_cell_count.report.tsv here (~2,700 cells): [image] What does each column mean? Also, could you please describe in slightly more detail how to calculate the mutational burden from the output numbers? Thank you!

Francesc-Muyas commented 1 year ago

Dear user, Thanks for using SComatic and thanks for bringing up this topic.

Regarding the "callable sites" question, it is important to take into account that there are two types of values when we speak about coverage (Cov column):

  1. one based on the number of cells with at least one read at a given position (NC column),
  2. another based on the depth or number of reads at each site (DP column).

To clarify this concept and the file format, it is much easier to understand it with one example (using the values shown in your screenshot):

For the B_memory cell type shown in your attached figure, when Cov == 5:

The file reports these site counts for each coverage value (up to 150 by default). From these values, you can get the number of callable sites for whatever minimum coverage (Cov) you want. For instance, to get the number of callable sites covered by at least 10 cells, sum the NC column over all rows of that cell type with Cov >= 10.
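The summation described above can be sketched in a few lines of Python. This is only an illustration: the column names `Cell_types`, `Cov`, `DP` and `NC` are assumed from the discussion here and should be checked against the header of your own report file, the helper `callable_sites` is hypothetical (not part of SComatic), and the numbers are toy values invented for the example.

```python
import csv
import io
from collections import defaultdict


def callable_sites(report, min_cov=5, value_col="NC",
                   celltype_col="Cell_types", cov_col="Cov"):
    """Sum value_col over all rows with Cov >= min_cov, per cell type.

    `report` is any open text file (or file-like object) holding the
    tab-separated report. Column names are assumptions; adjust as needed.
    """
    totals = defaultdict(int)
    for row in csv.DictReader(report, delimiter="\t"):
        if int(row[cov_col]) >= min_cov:
            totals[row[celltype_col]] += int(row[value_col])
    return dict(totals)


# Toy rows, invented for illustration only (not real SComatic output).
toy_rows = [
    ("Cell_types", "Cov", "DP", "NC"),
    ("B_memory", 4, 100, 90),
    ("B_memory", 5, 80, 70),
    ("B_memory", 10, 30, 20),
    ("B_memory", 11, 10, 5),
    ("T_cell", 5, 50, 40),
    ("T_cell", 10, 7, 3),
]
toy_report = "\n".join("\t".join(str(v) for v in r) for r in toy_rows) + "\n"

# Callable sites covered by at least 10 cells, per cell type.
print(callable_sites(io.StringIO(toy_report), min_cov=10))
# {'B_memory': 25, 'T_cell': 3}
```

With a real file, you would pass `open("your_report.tsv")` instead of the `io.StringIO` object. Setting `value_col="DP"` gives the read-depth-based count instead of the cell-based one.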

In our manuscript, we computed the mutation load at cell-type resolution using a minimum coverage of Cov >= 5 and the following formula:

(# somatic mutations in the cell type Z) / (# callable sites in the cell type Z)
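As a minimal sketch of this ratio in Python, assuming you already have the per-cell-type mutation counts and callable-site counts as dictionaries. The function name `mutation_burden` and all numbers are hypothetical, chosen only to illustrate the division.

```python
def mutation_burden(n_mutations, n_callable):
    """Return mutations per callable site for each cell type.

    Cell types with zero callable sites are skipped, since the ratio
    is undefined for them. Cell types with no recorded mutations get 0.
    """
    burdens = {}
    for cell_type, sites in n_callable.items():
        if sites > 0:
            burdens[cell_type] = n_mutations.get(cell_type, 0) / sites
    return burdens


# Toy counts, invented for illustration only.
toy_mutations = {"B_memory": 19}
toy_callable = {"B_memory": 95, "T_cell": 43, "NK": 0}

burdens = mutation_burden(toy_mutations, toy_callable)
for cell_type, burden in burdens.items():
    # Also report the burden scaled to mutations per million callable sites.
    print(cell_type, burden, burden * 1e6)
```

Both counts should be computed with the same minimum coverage threshold (Cov >= 5 in the manuscript), so that the numerator and denominator refer to the same set of sites.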

I hope it helps, Fran

YiqunCao commented 1 year ago

Hi Fran,

Thank you for the detailed and very helpful explanation! Regarding the "# somatic mutations in the cell type Z", may I confirm that I can just count the number of rows in the file sample.calling.step2.pass.tsv for each cell type without any more filtering?

thanks, Elaine

Francesc-Muyas commented 1 year ago

Yes, as long as you only count the PASS mutations.

alextidd commented 1 year ago

Hi Fran, Just wanted to clarify what this means:

count the number of rows in the file sample.calling.step2.pass.tsv for each cell type without any more filtering

Does it mean to count the number of non-NA rows per celltype column in the sample.calling.step2.pass.tsv file? Thanks! Alex