Using inferCNV on bulk RNA-seq?

biobenkj commented 5 years ago

Without looking too deeply through the code, is there anything specifically inhibiting the use of bulk RNA-seq with inferCNV (e.g. thinking regarding any model assumptions being implemented)? Thanks for your time and having an awesome R package for CNV calling!

brianjohnhaas commented 5 years ago

Hi,

This is something that we need to test. It won't work for certain things, e.g. the i6 HMM will fail miserably, but the visualizations might still work and the i3 HMM might possibly be useful. A number of people have asked about it, and it's something we plan to spend some time on and eventually incorporate, but I suspect it will need some additional tweaks to be effective. Ultimately, though, with bulk data, I think most people will just do exome sequencing for copy number analysis.

best,

~b

--

Brian J. Haas The Broad Institute http://broadinstitute.org/~bhaas http://broad.mit.edu/~bhaas

shamdata commented 5 years ago

I was wondering if you could explain further which steps/why the model would fail on bulk tissue, if a matched reference was provided? And if there are any modifications that could be made to the examples in lieu of exome sequencing data.

Thanks for the help, and for the great package/examples!

DarioS commented 4 years ago

Inferring CNV is much more complicated when using bulk RNA-seq. You have a mixture of different cell types, some cancerous and others not, so the CNV will only be present in a subset of cells. But you don't know the tumour purity of the cancer sample, so you don't know which scaling factor to use to calculate the true integer copy number. I use PURPLE for joint inference of copy number and purity for DNA sequencing data, but I haven't found a solution for RNA sequencing data.
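To make the purity problem concrete, here is a minimal sketch in R. It assumes an idealised model in which expression scales linearly with copy number and the non-tumour cells are diploid; the function names `expected_ratio` and `infer_cn` are made up for this illustration and are not part of inferCNV or PURPLE.

```r
# Expected bulk expression ratio (tumour sample / diploid reference) for a
# region with copy number `cn` in the tumour cells, when only a fraction
# `purity` of the cells in the sample are tumour cells.
# Assumption: expression is proportional to copy number (an idealisation).
expected_ratio <- function(cn, purity) {
  (purity * cn + (1 - purity) * 2) / 2
}

# Invert the relationship: recover the tumour copy number from an observed
# ratio. This only works if `purity` is known.
infer_cn <- function(ratio, purity) {
  (2 * ratio - 2 * (1 - purity)) / purity
}

# A single-copy loss (cn = 1) seen at three different purities
purities <- c(0.3, 0.6, 0.9)
ratios <- expected_ratio(cn = 1, purity = purities)
print(ratios)

# The same observed ratio implies different copy numbers at different purities
print(infer_cn(ratio = 0.85, purity = c(0.3, 0.6)))
```

Under this model a one-copy loss at 30% purity gives a ratio of 0.85, and the same ratio read at an assumed purity of 60% would imply a copy number of 1.5. Without an independent purity estimate the observed ratio cannot be mapped to an integer state.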

You could possibly decompose your gene abundance matrix into multiple matrices, one for each cell type, using CIBERSORTx. Its statistical model is trained on poly-A RNA-seq data. I have Ribo-Zero depletion RNA-seq, so I am hesitant to use it on my own data set, because I have no benchmarking data to check whether it works accurately in that case. Having some samples with 5% ribosomal RNA left and others with 30% left would badly break CIBERSORTx's assumptions, I think.
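As a rough illustration of what "decomposing" a bulk profile means, the toy R sketch below uses a two-component linear mixture with a known purity and a known normal profile. This is not how CIBERSORTx works internally (it estimates the fractions and profiles itself); the gene names and numbers are invented purely for the example.

```r
# Toy two-component mixture: bulk = purity * tumour + (1 - purity) * normal.
# All values are made-up illustrative numbers, not real data.
normal <- c(geneA = 10, geneB = 20, geneC = 5)
tumour <- c(geneA = 10, geneB = 40, geneC = 5)  # geneB doubled in the tumour
purity <- 0.5

# What a bulk experiment would measure under this model
bulk <- purity * tumour + (1 - purity) * normal

# With purity and the normal profile known, the tumour component is recoverable
tumour_est <- (bulk - (1 - purity) * normal) / purity
print(tumour_est)
```

In real data neither the purity nor the normal profile is known exactly, which is why a dedicated deconvolution tool, and some way of benchmarking it on your library type, would be needed before passing the result to a CNV caller.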

shamdata commented 4 years ago

That's a great idea: deconvolve first, then attempt CNV inference on the bulk RNA-seq data. Thanks!

DarioS commented 4 years ago

The new method SuperFreq (https://www.biorxiv.org/content/10.1101/2020.05.31.126888v1) can do it accurately by using the B-allele frequencies.

brianjohnhaas commented 4 years ago

Thanks, Dario!

mmfalco commented 2 years ago

@brianjohnhaas is there any news on this? Some of the methods proposed by @DarioS seem interesting, but they require BAM files, which are not always accessible. I still think it would be an additional benefit to be able to use inferCNV on bulk RNA-seq counts. There are some deconvolution methods that can extract the expression matrix corresponding to the cancer component from bulk samples. This might address the concerns about tumor purity, and the stromal or immune components could even be used as the normal reference. I'm not sure if what I said makes sense, but I wanted to ask for your thoughts on it and whether you are working on this. If you were to do it with the current version, would you use the following parameters?

infercnv_obj <- infercnv::run(infercnv_obj,
                              cutoff = 0.1,
                              out_dir = "results/inferCNV/bulk_deconv/",
                              cluster_by_groups = FALSE,
                              analysis_mode = "cells",
                              denoise = TRUE,
                              HMM_report_by = "cell",
                              HMM_type = "i3",
                              leiden_resolution = 2,
                              HMM = TRUE,
                              num_threads = 20,
                              plot_steps = FALSE,
                              k_nn = 1,
                              tumor_subcluster_partition_method = "leiden"
)

broadinstitute / infercnv

Using inferCNV on bulk RNA-seq? #141
