SUwonglab / scABC


Start from matrix #4

Closed r3fang closed 5 years ago

r3fang commented 5 years ago

In the example, everything starts from BAM and peak files. Is there an easy way to use scATAC directly from a count matrix if we have already pre-processed the data?

timydaley commented 5 years ago

Hi Rongxin, thank you for your interest in scABC. Yes; the only issue is how to weight the cells in clustering. We use the background to weight the cells, which requires the original BAM files. Without the BAM files, you have to find another way to weight the cells. One option would be the median of the count matrix (which we call the foreground matrix in the vignette). This is not something we have fully tested: we don't know, e.g., how much the background and foreground differ, or how much information is lost by not using the background to weight cells.

Actually, now that I look at the vignette I'm gonna have to shift some of the code around to accomplish this. I'll work on this a bit today while I have time.

timydaley commented 5 years ago

I've added an option in the landmarks to input user-defined weights. I suggest you use the per-cell mean of the count matrix. I've tested it on the six-cell-line in silico mixture, and the results are very good. The vignette is available at https://github.com/timydaley/scABC/blob/master/vignettes/ClusteringWithCountsMatrix.html. When computing the gap statistic you can use the cell-level means as BackGroundMedian. Otherwise, everything proceeds as in the other vignettes.

I hope this helps. Feel free to ask us any more questions that you may have, and please let us know if this is successful.
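For readers following along, here is a minimal sketch of what "cell-level means" of the foreground matrix could look like. It is plain Python rather than the scABC R code, and the function name `cell_weights` and the toy matrix are hypothetical; it only assumes that the matrix is peak-by-cell (rows are peaks, columns are cells), as discussed later in this thread.

```python
from statistics import mean, median


def cell_weights(counts, stat="mean"):
    """Return one summary value per cell (column) of a peak-by-cell matrix.

    `counts` is a list of rows, one row per peak, each row holding the
    read counts for every cell. Hypothetical helper, not part of scABC.
    """
    if not counts:
        return []
    n_cells = len(counts[0])
    if any(len(row) != n_cells for row in counts):
        raise ValueError("all rows must have the same number of cells")
    summarize = {"mean": mean, "median": median}[stat]
    # zip(*counts) transposes the matrix, so each item is one cell's column.
    return [summarize(column) for column in zip(*counts)]


# Toy foreground matrix: 4 peaks (rows) by 3 cells (columns).
foreground = [
    [0, 2, 1],
    [1, 4, 0],
    [0, 0, 2],
    [3, 2, 1],
]

weights_mean = cell_weights(foreground, "mean")
weights_median = cell_weights(foreground, "median")
```

Both the median (suggested first) and the mean (suggested after testing) are shown so the two can be compared on your own data. On very sparse count matrices the per-cell median can be at or near zero, which is worth checking before using it as a weight.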

timydaley commented 5 years ago

@r3fang We noticed an issue in computing cluster specific p-values when starting from a counts matrix (Thank you @MahdiZ11). We believe that we have solved the issue. We have updated the vignette https://github.com/SUwonglab/scABC/blob/master/vignettes/ClusteringWithCountsMatrix.Rmd to reflect this. Let us know how this works for you. We're curious if you have success. Thank you.

r3fang commented 5 years ago

Sorry for the late response! Let me try it out and get back to you today. Thank you for your help, I appreciate it!

r3fang commented 5 years ago

Last question: this is for finding clusters. Which matrix should I use for PCA or t-SNE, just for visualization?

timydaley commented 5 years ago

The foreground matrix. The background is ±500 kb from each peak, and we use it as a measure of local read depth for normalization.

Timothy Daley

Stanford University, Departments of Statistics and Bioengineering.
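To make the ±500 kb background concrete, here is a rough Python sketch of a window around a peak and a read count inside it. It rests on assumptions not confirmed in this thread: that the 500 kb is measured outward from the peak's edges, that coordinates are 0-based and half-open, and that each read is represented by a single position. The function names are hypothetical, and how scABC turns these counts into a normalization is not shown here.

```python
FLANK_BP = 500_000  # +/- 500 kb, the figure quoted in the comment above


def background_window(peak_start, peak_end, chrom_length, flank=FLANK_BP):
    """Return (start, end) of the window `flank` bp either side of a peak.

    Assumes 0-based, half-open coordinates and clips the window to the
    chromosome. Hypothetical helper, not the scABC implementation.
    """
    if not 0 <= peak_start < peak_end <= chrom_length:
        raise ValueError("peak must lie within the chromosome")
    return max(0, peak_start - flank), min(chrom_length, peak_end + flank)


def count_reads_in_window(read_positions, window):
    """Count read positions that fall inside a half-open window."""
    start, end = window
    return sum(start <= pos < end for pos in read_positions)


# A peak in the middle of a 50 Mb chromosome, and one near its start.
window = background_window(1_200_000, 1_200_500, 50_000_000)
clipped = background_window(100_000, 100_400, 50_000_000)

# Toy read positions; the last one sits exactly on the excluded end.
reads = [650_000, 700_000, 1_250_000, 1_700_500]
background_count = count_reads_in_window(reads, window)
```

A count like this is the kind of local read depth that is unavailable when starting from a count matrix alone, which is why a per-cell summary of the foreground is suggested as a stand-in.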



timydaley commented 5 years ago

Also, I hope this isn't the last question. We are happy to help and answer any questions you may have. I will close this issue, but if you have any more questions, comments, or even critiques, you can open a new issue or email us directly. Thank you, Rongxin.

MahdiZ11 commented 5 years ago

To add to Tim's answer, please also take a look at https://github.com/SUwonglab/scABC/blob/master/vignettes/BatchEffect_NumberOfCluster.Rmd. At the end of this vignette, we explain how to plot the t-SNE of clustering results using differential peaks. You can apply this procedure to the foreground matrix as well. Thank you, Rongxin, for using scABC.

r3fang commented 5 years ago

Sorry for coming back again. I have tried the ClusteringWithCountsMatrix vignette. However, the performance was not good on my own data. My understanding is that InSilicoSCABCForeGroundMatrix is the peak-by-cell count matrix, am I right? I wonder if it is possible to share the count matrix you used in the script, so I can replicate your result on your data and make sure I did not make any mistakes.

timydaley commented 5 years ago

Hi Rongxin, can you give us your email so that we can send you the files directly?

r3fang commented 5 years ago

It's r4fang@gmail.com. Thank you for your help!

timydaley commented 5 years ago

We should note that we are more uncertain about clustering without background. The background contains a lot of information in the expected counts and allows us to much better quantify the uncertainty of the observed counts, therefore allowing for better clustering.

MahdiZ11 commented 5 years ago

Hi Rongxin,

Just wanted to make sure we answered your questions before closing this issue?

Thanks, Mahdi

r3fang commented 5 years ago

Hi,

Please feel free to close it. Thank you very much!


MahdiZ11 commented 5 years ago

No problem. Happy to answer any question/issue in the future.

mudappathir commented 4 years ago

Hi,

I am trying to replicate the vignette for ClusteringWithCountsMatrix.Rmd. I could not figure out the InSilicoSCABCForeGroundMatrix used. I saw this closed issue requesting the same. I kindly request you to send the matrix used. My email is rekha.m.mec@gmail.com. Thank you for your help.

MahdiZ11 commented 4 years ago

Hi Rekha,

I'll email the matrix to you soon.

Thanks, Mahdi