colomemaria / epiAneufinder

R package to detect breakpoints and assign somies to scATAC-seq data
GNU General Public License v3.0

Parameters for multiome data #24

Open harrsha4 opened 3 weeks ago

harrsha4 commented 3 weeks ago

Dear epiAneufinder team, thank you for such a useful tool for the community. I am working with a 10x multiome sample that has a median of 3,500 high-quality ATAC fragments per cell (as reported by CellRanger). I am running epiAneufinder with the following code block.

epiAneufinder(input= ***,
              outdir="epiAneufinder_results",
              blacklist="hg38-blacklist.v2.bed",
              windowSize=1e6,
              genome="BSgenome.Hsapiens.UCSC.hg38",
              exclude=c('chrX','chrY','chrM'),
              reuse.existing=TRUE,
              title_karyo="Karyogram of sample data",
              ncores= ,
              minFrags=3000,
              minsizeCNV=5,
              k=3,
              plotKaryo=TRUE)

The resultant karyogram is pasted below: image

What's neat about the results is that it's able to capture changes on chromosomes 1 and 12 that we have confirmed with bulk methylation. What's concerning is how sparse the chr22 loss is, since that loss is a defining feature of the tumor we are working with. Looking at some of the individual cell plots:

image

I was wondering whether, given the sparsity of our data (windowSize = 1e5 produced no CNVs) along with the small size of chromosome 22, we might be missing actual losses. If so, are there parameters we could adjust to account for the sparse data? For example, if we were only concerned about arm-level changes (p or q), would it be reasonable to set k = 1 for only 2 segments?

thek71 commented 3 weeks ago

Hi,

first of all, it's great that you find the tool useful and that the results agree with additional evidence.

Regarding the issue at hand: in general, as we also pointed out in the publication, epiAneufinder struggles somewhat to identify losses because of data sparsity, as do all other tools I have seen so far. Given that you have multiome data, where the sequencing is shallower, the plots that you show speak for themselves.

That being said, a couple of things that I would try are: (1) As you mentioned, use k=1. That will not ensure that you get chromosome arms, only that you have two segments. (2) Use an even larger bin size. I would try to see what happens at 5Mb resolution. When we were testing the algorithm, we tested different bin sizes. For our SNU601 dataset, which has quite good coverage and depth, we found that 100kb bins gave higher correlations with the ground truth, but 1Mb did not perform much worse. We didn't go above 1Mb, but that doesn't mean it won't work for your dataset. In that case, though, I would probably try a different minsizeCNV as well, probably 3 or lower. Maybe you can try a couple of different combinations. Since you also know the locations of the chr1 and chr12 CNVs, you can use them as a guide: test the different parameters and see whether the known CNVs change.

From the karyogram that you posted, it seems that many cells have lost almost the whole of chr22. Is that your expectation for the tumor cells? And one last question: since you have multiome data, are the cells that you are showing here all tumor cells, or might there be non-tumor cells as well? I am asking just to get a better idea of the data.
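To make the interplay between bin size, minsizeCNV and k more concrete, here is a rough back-of-the-envelope sketch in R. It uses the hg38 chromosome lengths and assumes that minsizeCNV counts consecutive bins and that k allows at most 2^k segments per chromosome; both are assumptions worth checking against the package documentation. The helper `min_cnv_fraction` is hypothetical and not part of epiAneufinder. The bin counts are upper bounds, since blacklisted and empty bins (for example on the chr22 p arm) are not subtracted.

```r
# hg38 chromosome lengths in bp (UCSC assembly values).
chrom_len <- c(chr1 = 248956422, chr12 = 133275309, chr22 = 50818468)
window_sizes <- c("100kb" = 1e5, "1Mb" = 1e6, "5Mb" = 5e6)

# Upper bound on the number of bins per chromosome at each window size
# (rows: chromosomes, columns: window sizes).
n_bins <- sapply(window_sizes, function(w) ceiling(chrom_len / w))
print(n_bins)

# Assumption: minsizeCNV counts consecutive bins, so the smallest callable
# CNV spans minsizeCNV * windowSize base pairs.
# Hypothetical helper: fraction of a chromosome covered by that minimum CNV.
min_cnv_fraction <- function(minsizeCNV, windowSize, chrom = "chr22") {
  unname(minsizeCNV * windowSize / chrom_len[chrom])
}

print(min_cnv_fraction(5, 5e6))  # minsizeCNV = 5 with 5Mb bins
print(min_cnv_fraction(3, 5e6))  # minsizeCNV = 3 with 5Mb bins

# Assumption: k caps the number of segments per chromosome at 2^k.
max_segments <- 2^c(k1 = 1, k3 = 3)
print(max_segments)
```

Under these assumptions, minsizeCNV = 5 with 5Mb bins would require a CNV to span roughly half of chr22, which is one reason to lower minsizeCNV when the bin size goes up.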

I hope this helps. Please let me know how it works out.

Best regards, Katia

harrsha4 commented 3 weeks ago

Hi Katia, Thanks for the prompt reply. To address one of your points: yes, we have tried increasing the window size to 5e6. The resulting karyogram is pasted below.

Karyogram

We had the same thought that maybe a larger window size would compensate for low coverage, but the results did not seem too different from the previous run (1e6). We are planning to try different parameters (i.e. k = 1,2 and minsizeCNV = 0:4), but our computing center is currently shut down, and we don't have a local device that could complete the analysis.
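The planned sweep over k and minsizeCNV could be laid out roughly as follows. This is only a sketch: `run_one`, `input_path` and the output directory naming are hypothetical, `ncores = 1` is a placeholder, and the remaining arguments are copied from the call earlier in this thread. The actual runs are left commented out so that the grid can be inspected first.

```r
# All combinations of k = 1,2 and minsizeCNV = 0:4 (10 runs).
param_grid <- expand.grid(k = 1:2, minsizeCNV = 0:4)

# Hypothetical naming scheme: one output directory per combination.
param_grid$outdir <- sprintf("epiAneufinder_k%d_minsize%d",
                             param_grid$k, param_grid$minsizeCNV)
print(param_grid)

# Hypothetical wrapper around the call used earlier in the thread.
# input_path stands in for the real input, which is not shown here.
run_one <- function(k, minsizeCNV, outdir, input_path, ncores = 1) {
  epiAneufinder::epiAneufinder(
    input = input_path,
    outdir = outdir,
    blacklist = "hg38-blacklist.v2.bed",
    windowSize = 1e6,
    genome = "BSgenome.Hsapiens.UCSC.hg38",
    exclude = c("chrX", "chrY", "chrM"),
    reuse.existing = TRUE,
    title_karyo = sprintf("k = %d, minsizeCNV = %d", k, minsizeCNV),
    ncores = ncores,
    minFrags = 3000,
    minsizeCNV = minsizeCNV,
    k = k,
    plotKaryo = TRUE
  )
}

# Uncomment once input_path is defined and compute is available:
# for (i in seq_len(nrow(param_grid))) {
#   run_one(param_grid$k[i], param_grid$minsizeCNV[i],
#           param_grid$outdir[i], input_path)
# }
```

Writing each combination to its own directory keeps the karyograms side by side, so the known chr1 and chr12 events can be compared across settings.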

As for your point about the chr22 loss showing up on the karyogram, we did note that, and we were pleased by that at first. There are immune cells in the data, but when we determined CNVs by grouping bins, most of the CNVs (chr22, chr1, chr12) were located in our tumor cluster. A previous issue with other programs was that CNVs were being called in the immune cell population.

The issue with the results was that tumor cells with chr1 loss or chr12 gain did not consistently have chr22 loss. A running theory for this tumor type is that chr22 loss is the initial genomic insult, followed by chr1 loss and other alterations. Having tumor cells with chr1 loss but without chr22 loss was therefore very surprising. Given that chr22 loss was detected in some cells, but not consistently in cells with chr1 loss, we felt there might be a thresholding parameter we could change that would catch chr22q loss in the cells that currently show only the chr1 loss.
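One way to quantify how often chr1 loss and chr22 loss co-occur is a per-cell cross-tabulation. The sketch below runs on a small made-up table, not real data. It assumes a layout with one row per bin, a `seq` column for the chromosome, and one column per cell coded 0 = loss, 1 = normal, 2 = gain; whether this matches the epiAneufinder results table should be checked against the actual output. The 50% cutoff for calling a chromosome "lost" in a cell is an arbitrary illustrative choice.

```r
# Toy stand-in for a per-bin, per-cell call table (values are made up).
# Coding assumed here: 0 = loss, 1 = normal, 2 = gain.
calls <- data.frame(
  seq   = c("chr1", "chr1", "chr1", "chr22", "chr22", "chr22"),
  start = c(0, 1e6, 2e6, 0, 1e6, 2e6),
  cellA = c(0, 0, 0, 0, 0, 1),
  cellB = c(0, 0, 1, 1, 1, 1),
  cellC = c(1, 1, 1, 1, 1, 1),
  cellD = c(0, 0, 0, 1, 0, 0)
)
cell_cols <- setdiff(names(calls), c("seq", "start"))

# Fraction of a chromosome's bins called as loss, per cell.
loss_fraction <- function(chrom) {
  colMeans(calls[calls$seq == chrom, cell_cols, drop = FALSE] == 0)
}

# A cell "has" the loss if at least min_fraction of the bins are lost
# (0.5 is an arbitrary cutoff for illustration).
has_loss <- function(chrom, min_fraction = 0.5) {
  loss_fraction(chrom) >= min_fraction
}

# Cross-tabulate chr1 loss against chr22 loss across cells.
co_occurrence <- table(chr1_loss = has_loss("chr1"),
                       chr22_loss = has_loss("chr22"))
print(co_occurrence)
```

Tracking the count of cells with chr1 loss but no chr22 loss across parameter settings gives a single number to compare between runs.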

Please let me know if that clarifies some of your questions, and thank you for your help.

Best,

Harrsha

thek71 commented 3 weeks ago

Hi Harrsha,

thank you for your reply. My questions are basically covered. I guess that your expectation is that all cancer cells have the chr22 loss and either the chr1 loss or chr12 gain. What I am thinking is that the data are probably too sparse for the CNV to be identified in all tumor cells. I hope that the different parameter settings that you are trying will help, but I would not expect to find all the tumor cells having the chr22 loss.

Best regards, Katia