cortes-ciriano-lab / SComatic

A tool for detecting somatic variants in single cell data

How to use on data with single cell type #19

Closed cutleraging closed 1 year ago

cutleraging commented 1 year ago

Hello,

I have read your paper with great interest (very well written) and would like to apply this to some of my own data. I have 10x scRNA and 10x scATAC data of a single cell type from cell culture. Since it appears that your method relies on having at least 2 cell types (to consider bases with >= 5 reads across >= 2 cell types and to filter out germline mutations found in all cell types), how would you suggest I adjust your tool to be used with my data?

Should I randomly subdivide the cells? Although this would seem to interfere with the requirement that mutations be supported by >= 3 reads within >= 2 cells of the same type. As an aside, does this also mean that the method only detects subclonal mutations that are found in more than one cell? Or is it possible to detect mutations that are specific to a single cell?

Thanks a lot!

Ronnie

Francesc-Muyas commented 1 year ago

Dear user,

SComatic performs best when comparing different cell types. It is possible to run the algorithm with only one cell type, but bear in mind that the tool's performance will drop significantly, especially its capacity to detect systematic errors and germline variants.

However, there are different parameters that you can play with to minimise the number of false positives:

As you no longer compare different cell types, you will have an enrichment of germline mutations in the final call set. To remove them, I strongly suggest excluding from downstream analysis any mutations found in gnomAD or ExAC at a high population frequency (e.g. ignore mutations with a frequency > 1% in these databases). Also check the different types of panels of normals (PoNs) suitable for SComatic; it might be interesting to use some of the PoNs generated by GATK using the 1000G samples.
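A minimal sketch of the population-frequency filter suggested above, assuming the call set has already been annotated with a population allele frequency. The column name `gnomAD_AF`, the helper `drop_common_variants`, and the toy rows are hypothetical and do not reflect SComatic's actual output format; only the 1% threshold comes from the comment.

```python
import csv
import io

MAX_POP_FREQ = 0.01  # 1%, the example threshold mentioned in the comment


def drop_common_variants(rows, freq_column="gnomAD_AF", max_freq=MAX_POP_FREQ):
    """Keep rows whose population frequency is missing or <= max_freq.

    `freq_column` is a hypothetical annotation column, not a SComatic field.
    """
    kept = []
    for row in rows:
        raw = (row.get(freq_column) or "").strip()
        try:
            freq = float(raw)
        except ValueError:
            # Not annotated ("" or "."): treat as absent from the database.
            freq = 0.0
        if freq <= max_freq:
            kept.append(row)
    return kept


# Toy annotated call set (made-up values, not real SComatic output).
example = io.StringIO(
    "#CHROM\tStart\tREF\tALT\tgnomAD_AF\n"
    "chr1\t1000\tA\tG\t0.25\n"
    "chr1\t2000\tC\tT\t.\n"
    "chr2\t3000\tG\tA\t0.0004\n"
)
calls = list(csv.DictReader(example, delimiter="\t"))
rare_calls = drop_common_variants(calls)
print([c["Start"] for c in rare_calls])
```

Treating unannotated sites as frequency 0 keeps them in the call set, which errs on the side of retaining candidates; a stricter analysis might handle them differently.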

In addition, one could also check the Variant Allele Frequency (VAF) values provided in the final output of SComatic. Germline mutations should have VAFs of around 0.5 or greater (~50% or more) in diploid regions, while somatic mutations should have lower values. However, this criterion is potentially biased, as clonal mutations will look like heterozygous germline variants from the VAF point of view (around 0.5).
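The VAF check could be sketched roughly as follows. The key name `VAF`, the label names, and the 0.4 cut-off are illustrative assumptions, not values taken from SComatic; the cut-off in particular would need tuning per dataset. In line with the caveat above, high-VAF calls are only flagged, not removed, because clonal somatic mutations fall in the same range.

```python
def flag_by_vaf(calls, vaf_key="VAF", threshold=0.4):
    """Label each call by VAF; the threshold is an arbitrary illustrative value."""
    flagged = []
    for call in calls:
        vaf = float(call[vaf_key])
        # VAF near or above 0.5 is compatible with germline OR clonal somatic.
        label = "germline_or_clonal" if vaf >= threshold else "candidate_subclonal"
        flagged.append({**call, "vaf_label": label})
    return flagged


# Toy calls with made-up VAF values.
toy_calls = [
    {"site": "chr1:1000", "VAF": "0.52"},
    {"site": "chr1:2000", "VAF": "0.08"},
    {"site": "chr2:3000", "VAF": "0.97"},
]
labelled = flag_by_vaf(toy_calls)
for call in labelled:
    print(call["site"], call["vaf_label"])
```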

In any case, all these filters and suggestions will not remove the complete set of germline mutations from your final mutation call set.

Thanks for your interest, Fran

freddie090 commented 1 year ago

Hi @Francesc-Muyas,

Also a fan of the tool - a clever way of leveraging scRNA data. I have a question related to the one above:

Similar to Ronnie, I also have cells from a single cell line. However, I have multiple samples that represent different replicates, which were isolated for a prolonged experiment and subjected to different treatments.

In this case, would it make sense to replace the cell type annotations (as described in SComatic's documentation) in the pipeline with the sample replicate identities? Would SComatic then work by identifying the mutations that are unique to specific replicates and, by extension, associated with different treatment conditions?

I also have a 'pre-' sample that is the cell line at the beginning of the experiment - would it make sense to simply include this as an additional cell type in the metafile?
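For concreteness, the relabelling being asked about might look like the sketch below, which writes the sample identity (including the 'pre-' sample) into the cell type column of a tab-separated metadata table. The column names `Index` and `Cell_type` are an assumption about the metadata layout, and the barcodes and sample names are made up; whether SComatic gives meaningful results with such labels is exactly the open question here.

```python
import csv
import io


def build_metadata(barcodes_by_sample):
    """Return a TSV string that uses the sample name as the cell type label."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # Assumed header; check it against the SComatic documentation.
    writer.writerow(["Index", "Cell_type"])
    for sample, barcodes in barcodes_by_sample.items():
        for barcode in barcodes:
            writer.writerow([barcode, sample])
    return buffer.getvalue()


# Hypothetical barcodes grouped by sample of origin.
toy_samples = {
    "pre": ["AAAC-1", "AAAG-1"],
    "treatment_A": ["CCCA-1"],
    "treatment_B": ["GGGT-1"],
}
metadata_tsv = build_metadata(toy_samples)
print(metadata_tsv)
```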

As per your comment above, would I have to be mindful about choosing specific parameters during the analysis to minimise the identification of false positives?

Thanks for any help -

Freddie

cutleraging commented 1 year ago

Hi Fran,

Thanks for your detailed reply. I will try this out and let you know how it goes!

Something else came to my mind. In my experiment, I have the same single cell type, but two different conditions. Would it be useful for your algorithm if I were to run these separately? Or would it help if I combine them into a single BAM file but mark them as two different cell types?

Best, Ronnie