JEFworks-Lab / HoneyBADGER

HMM-integrated Bayesian approach for detecting CNV and LOH events from single-cell RNA-seq data
http://jef.works/HoneyBADGER/
GNU General Public License v3.0

How to prepare reference expression? #5

Closed saeedsaberi closed 6 years ago

saeedsaberi commented 6 years ago

For a new dataset, how do you create the reference that data(ref) provides?

saeedsaberi commented 6 years ago

Hi again, could you clarify this for me, please?

JEFworks commented 6 years ago

You will need to obtain the gene expression of a comparable normal tissue. For the built-in dataset, the test data are glioblastoma cells, so the reference is an average of all normal brain samples from GTEx: https://www.gtexportal.org/home/tissueSummaryPage The built-in datasets were quantified to counts, normalized to TPMs, and log-transformed.

GTEx is generally a good expression reference for most tissues, but for finer immune populations (which are all grouped into Whole Blood in GTEx), I would recommend using the sorted cell types from Zheng et al.: https://support.10xgenomics.com/single-cell-gene-expression/datasets
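
The preprocessing described above (counts, then TPM, then log transform, then an average over normal samples) could be sketched roughly as follows. This is only an illustration in Python, not the code used to build the built-in reference: the function names, the gene lengths input, the log2 base, the pseudocount of 1, and the choice to average after the log transform are all assumptions, and the toy counts are made up.

```python
import math

def counts_to_tpm(counts, lengths_kb):
    """Convert one sample's raw counts to TPM (hypothetical helper).

    counts: dict of gene -> raw read count
    lengths_kb: dict of gene -> gene length in kilobases (assumed to be available)
    """
    # Length-normalize first, then scale so the sample sums to one million.
    rates = {g: counts[g] / lengths_kb[g] for g in counts if g in lengths_kb}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("sample has no counts for genes with known length")
    return {g: r / total * 1e6 for g, r in rates.items()}

def build_reference(samples, lengths_kb, pseudocount=1.0):
    """Average log2(TPM + pseudocount) per gene across normal samples.

    The pseudocount and the log-then-average order are assumptions.
    """
    if not samples:
        raise ValueError("need at least one normal sample")
    logged = []
    for counts in samples:
        tpm = counts_to_tpm(counts, lengths_kb)
        logged.append({g: math.log2(v + pseudocount) for g, v in tpm.items()})
    # Keep only genes quantified in every sample.
    genes = sorted(set.intersection(*(set(s) for s in logged)))
    return {g: sum(s[g] for s in logged) / len(logged) for g in genes}

# Toy data standing in for normal-tissue samples (values are invented).
normal_samples = [
    {"GENE_A": 100, "GENE_B": 300},
    {"GENE_A": 200, "GENE_B": 200},
]
gene_lengths_kb = {"GENE_A": 1.0, "GENE_B": 2.0}

ref = build_reference(normal_samples, gene_lengths_kb)
```

The resulting gene-to-value mapping would then need to be exported and matched to the gene identifiers used in the single-cell expression matrix before being used as a reference.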

saeedsaberi commented 6 years ago

Thanks for the comment, it was indeed very useful.

I'm still not sure how to make the reference in the case of lymphomas, since the samples vary widely in their immune cell composition. Have you ever benchmarked your method against different reference expressions?

JEFworks commented 6 years ago

Yes, we are currently dealing with the same types of challenges; I have some leukemia datasets that comprise many immune cell types. My solution has been to identify cell types first using dimensionality reduction + clustering approaches such as pagoda2 (https://github.com/hms-dbmi/pagoda2), and then perform HoneyBADGER analysis for each cell type using the appropriate sorted immune cell reference from 10X. You will definitely need to check that any subclonal alteration you identify is not simply contamination from another cell type.
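
The bookkeeping for that per-cell-type workflow could look something like the sketch below. It assumes cluster labels have already been assigned by a separate tool and that one reference profile is available per cell type; the function names, labels, and values are hypothetical, and the HoneyBADGER run itself (done in R) is not shown.

```python
from collections import defaultdict

def group_cells_by_type(cell_labels):
    """Group cell IDs by their assigned cell-type label."""
    groups = defaultdict(list)
    for cell, label in cell_labels.items():
        groups[label].append(cell)
    return dict(groups)

def pair_groups_with_references(groups, references):
    """Match each cell-type group to its reference profile.

    Returns the matched pairs and a list of labels with no reference,
    so that unmatched cell types are flagged instead of silently
    analyzed against the wrong reference.
    """
    paired, missing = {}, []
    for label, cells in groups.items():
        if label in references:
            paired[label] = (cells, references[label])
        else:
            missing.append(label)
    return paired, missing

# Hypothetical labels from clustering and hypothetical sorted-cell references.
cell_labels = {"cell1": "B", "cell2": "T", "cell3": "B", "cell4": "NK"}
references = {"B": {"GENE_A": 5.0}, "T": {"GENE_A": 3.0}}

groups = group_cells_by_type(cell_labels)
paired, missing = pair_groups_with_references(groups, references)
```

Each matched group would then be analyzed separately against its own reference, and any label left in `missing` would need a suitable reference found before analysis.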

saeedsaberi commented 6 years ago

Thanks, I'm calling cell types using Seurat and then calling CNVs using HoneyBADGER. I was not able to find any CNVs, though I'm not surprised, since the cancer I am working on is either silent in terms of CNVs or has only very focal CNVs. Thanks again for taking the time to respond to the issues here. Good luck with the paper!

dpcook commented 5 years ago

Hi Jean--sorry to dig this up. Just wondering if you had any insight into how closely the reference expression should match the sample? E.g., for carcinomas that arise from epithelial cells that represent a small percentage of the bulk tissue they reside in, is the bulk tissue appropriate? I'm working on ovarian cancer, and healthy epithelium makes up a small amount of the organ. I could try bulk RNA-seq from cultured epithelial cells. Any thoughts about this?

JEFworks commented 5 years ago

Hi David,

If you take a look at Supplemental Figure S6 (https://genome.cshlp.org/content/suppl/2018/06/26/gr.228080.117.DC1/Supplemental_Material.pdf), there are a number of different normal cell types in panel C that all clearly show a normal karyotype. So in this particular case, the reference expression for normalization didn’t seem to matter too much. But we have also seen other cases, such as in Supplemental Figure 4C, where the choice of expression reference did impact results. That data is much older, though, and many of the expression differences may also be due to differences in sequencing platform, sample preparation, data quality, and other technical artifacts that would lead to systematic deviations in expression in addition to tissue-specific effects.

I would definitely recommend trying a publicly available expression reference such as GTEx (they have 133 ovarian samples, for example: https://www.gtexportal.org/home/tissueSummaryPage) before doing bulk RNA-seq from cultured epithelial cells.

Best, Jean
