Closed dy-lin closed 1 year ago
The tricky thing is that there isn't an obvious way of calling variants with maegatk since there is only 1 strand. In the MAESTER paper, there were a variety of filtering strategies to identify high-confidence variants. An mgatk versus maegatk comparison will show that most of the per cell, per variant allele counts will be similar.
The logic behind the ReadMGATK() --> variant calling in Seurat fails again because there is no strand concordance to call. @petervangalen do you have any advice on the best scripts to refer to call variants from maegatk output?
To call informative variants from maegatk output, I generated a heteroplasmy/allele frequency matrix for all possible variants (af.dm
) and applied thresholds depending on the experimental question, such as "variants with 20-fold coverage, VAF of <10% in the bottom 10% of cells and VAF of >90% in top 10% of cells" (TenX_CellLineMix) or "variants present in at least 10 cells with a VAF of >50%" (CH_Sample) or "variants with differing presence between samples (or cell subsets) from the same patient" (GBM). This would yield on the order of 10 to 30 informative variants of which >80% transitions. Then I would consider cells to be part of a clone based on detecting these informative variants (e.g. heteroplasmy > 1%). As I said, this downstream analysis depends on the biological question and generally required a lot of iteration to find the best thresholds for each experiment. I am still looking for a more standardized approach.
We are also planning to compare scATAC vs. scRNA-derived variants, so it might be interesting to compare notes. Hope that helps!
I'm looking to compare the calls between mgatk and maegatk. I've read the FAQ and know that they should produce very similar results. I'm currently working with scRNA-seq data from 10X Multiome kit, processed with Cellranger-ARC.
How is a comparison made? For mgatk, I can read the results into a Seurat object using
ReadMGATK()
. However this function fails for maegatk output as it has more columns. Is there a way to output a Seurat object RDS like there is a Signac object RDS for mgatk?Are there any suggestions to make a scRNA via mgatk vs a scRNA via maegatk comparison? Ultimately I will compare scATAC via mgatk against scRNA via maegatk.