cortes-ciriano-lab / SComatic

A tool for detecting somatic variants in single cell data

How does SComatic distinguish between somatic and germline variants? #24

Closed vladimirkovacevic closed 10 months ago

vladimirkovacevic commented 11 months ago

I managed to run SComatic on Stereo-seq data from mouse colorectal cancer. In the 4th step, data for all sites is obtained, and I assume the variants could be among the locations marked with Multiple_cell_type in the FILTER column. But there are too many of them, and the ALT alleles are the same across all cell types. Also, all variants are marked with PASS in the Cell_type_Filter column. Does SComatic provide a list of variants marked strictly as somatic? One more question: is creating a PoN necessary to mark a variant as somatic?

Francesc-Muyas commented 11 months ago

Dear user, Thanks for using SComatic.

To get the high-quality somatic variants, you should take ONLY the variants marked as PASS in the FILTER column. The cell types harbouring these variants are specified in the Cell_types column.

Ignore the Cell_type_Filter column: it only shows the PASS or non-PASS labels of the Beta-binomial test at single cell type resolution (the first step of the variant calling), and does not reflect the cell type comparison or the other hard filters.
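As a rough illustration of the advice above, the sketch below keeps only the rows whose FILTER column is exactly PASS. It assumes the step 4 output is a tab-separated file with optional `##` metadata lines, followed by a header line starting with `#CHROM` that contains FILTER and Cell_types columns. The function name `select_pass_variants`, the example rows, and the cell type names are hypothetical and not part of SComatic.

```python
import csv


def select_pass_variants(lines):
    """Return rows (as dicts) whose FILTER column is exactly 'PASS'.

    Assumption: tab-separated input, '##' metadata lines, then one header
    line (e.g. starting with '#CHROM') naming the columns.
    """
    header = None
    passed = []
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields or fields[0].startswith("##"):
            continue  # skip blank lines and '##' metadata lines
        if header is None:
            # First non-metadata line is the column header; drop the leading '#'.
            header = [name.lstrip("#") for name in fields]
            continue
        row = dict(zip(header, fields))
        # The per-cell-type Cell_type_Filter column is deliberately not used.
        if row.get("FILTER") == "PASS":
            passed.append(row)
    return passed


# Made-up example rows (hypothetical positions and cell types).
example = [
    "##example metadata line",
    "#CHROM\tStart\tEnd\tREF\tALT\tFILTER\tCell_types\tCell_type_Filter",
    "chr1\t100\t100\tA\tG\tPASS\tEpithelial\tPASS",
    "chr1\t200\t200\tC\tT\tMultiple_cell_type\tEpithelial,T_cell\tPASS,PASS",
]

for row in select_pass_variants(example):
    print(row["CHROM"], row["Start"], row["ALT"], row["Cell_types"])
```

To run it on a real file, the same function can be given an open file handle, since it only needs an iterable of lines.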

The PoN is not strictly mandatory to run SComatic, but it is recommended for removing artefacts and germline contamination. If you don't have one available for your data, you could use as the PoN file an uncompressed VCF with the germline variants found in the mouse population (for example, variants found in your mouse strain). Alternatively, you can find a PoN for the mm10 genome (10X scRNA-seq data only). Unfortunately, we do not have a PoN for Stereo-seq data.

Thanks, Fran

vladimirkovacevic commented 11 months ago

Yes, thank you @Francesc-Muyas! This is what I was looking for. Judging by their VAF values, the PASS variants now look like somatic variants, although some of them have strange VAFs such as 0.5 or 0.67. I saw the hardcoded threshold of 0.15 in the 4th step (variant calling). What do you think?

Francesc-Muyas commented 11 months ago

Hi,

Variants with VAFs around 0.5 are not necessarily germline variants, especially if you find them in ONLY one of the cell types with enough expression at a given position (that is, the position is expressed in at least two cell types). For example, they can be clonal mutations or the result of allelic dropout.
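A small worked example of why such VAFs are plausible for somatic variants. It assumes a diploid, copy-neutral site with balanced expression of both alleles, so a heterozygous mutation contributes half of the reads in the cells that carry it. The function names are hypothetical, and this is only arithmetic, not SComatic's model.

```python
def observed_vaf(alt_reads, depth):
    """VAF as the fraction of reads supporting the ALT allele."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return alt_reads / depth


def expected_clonal_vaf(mutant_cell_fraction, mutant_allele_share=0.5):
    """Expected VAF in one cell type.

    Assumption: heterozygous mutation, diploid copy-neutral site, and
    balanced allelic expression (mutant_allele_share = 0.5).
    """
    return mutant_cell_fraction * mutant_allele_share


# A mutation carried by every cell of a cell type (fully clonal).
print(expected_clonal_vaf(1.0))
# Two ALT reads out of three reads at a lowly covered position.
print(round(observed_vaf(2, 3), 2))
```

Under these assumptions a fully clonal heterozygous mutation is expected at a VAF of 0.5, and losing reads from the reference allele (as in allelic dropout) pushes the observed value higher.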

Please consider using the PoN as suggested in the previous comment, or simply remove the variant sites that overlap known germline variants in the mouse strain you are working with.
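The second option, removing sites with known germline variants, could be sketched as a post-hoc filter like the one below. This is not SComatic's PoN mechanism. It assumes a plain-text (uncompressed) VCF, that the Start column of the variant table uses the same 1-based coordinate as the VCF POS field, and that chromosome names match between the two files (for example `chr1` in both). The function names and example records are hypothetical.

```python
def load_germline_sites(vcf_lines):
    """Collect (chromosome, position) pairs from uncompressed VCF lines."""
    sites = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        chrom, pos = line.split("\t")[:2]  # VCF columns 1 and 2: CHROM, POS
        sites.add((chrom, int(pos)))
    return sites


def drop_germline(variants, sites):
    """Keep variants whose (CHROM, Start) is not a known germline site.

    Assumption: each variant is a dict with 'CHROM' and 'Start' keys, and
    'Start' is 1-based like VCF POS.
    """
    return [v for v in variants if (v["CHROM"], int(v["Start"])) not in sites]


# Made-up germline VCF records for a hypothetical mouse strain.
germline_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.",
]

# Made-up PASS variants from the variant calling step.
pass_variants = [
    {"CHROM": "chr1", "Start": "100", "REF": "A", "ALT": "G"},
    {"CHROM": "chr2", "Start": "500", "REF": "C", "ALT": "T"},
]

sites = load_germline_sites(germline_vcf)
for variant in drop_germline(pass_variants, sites):
    print(variant["CHROM"], variant["Start"], variant["ALT"])
```

Matching on position alone is the conservative choice here: any site with a known germline variant is dropped, regardless of the alternative allele.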

Cheers, Fran