JEFworks-Lab / STdeconvolve

Reference-free cell-type deconvolution of multi-cellular spatially resolved transcriptomics data
http://jef.works/STdeconvolve/

InferCNV from STdeconvolve data #53

Open rstagnit opened 3 months ago

rstagnit commented 3 months ago

Hi @bmill3r ,

Thank you for developing this tool. Deconvolution and annotation have worked well for me thus far. I would now like to run CNV analysis on the deconvolved cell types (I have already run it on the non-deconvolved samples), both as a comparison and to achieve more granularity. I am struggling to create the initial infercnv object. I'm not sure how familiar you are with the package, so I'll briefly detail what is needed.

In order to create an infercnv object, I need a raw_counts_matrix which is the matrix of genes (rows) vs. cells (columns) containing the raw counts. If I am correct, I can use the corpus for my raw counts matrix so that isn't an issue.
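A minimal sketch of the orientation step, assuming the corpus is stored as spots (rows) by genes (columns); if yours is already genes by spots, skip the transpose. The toy matrix and its names are made up for illustration.

```r
# Toy stand-in for the corpus: 3 spots (rows) x 4 genes (columns).
# Assumption: your corpus has this orientation; check dim() and dimnames() first.
corpus <- matrix(
  c(5, 0, 2, 1,
    0, 3, 1, 0,
    2, 2, 0, 4),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("spot_A", "spot_B", "spot_C"),
                  c("gene1", "gene2", "gene3", "gene4"))
)

# infercnv expects genes in rows and cells (here, spots) in columns.
raw_counts_matrix <- t(corpus)

# Raw counts should be non-negative whole numbers, not normalized values.
stopifnot(all(raw_counts_matrix >= 0),
          all(raw_counts_matrix == round(raw_counts_matrix)))

dim(raw_counts_matrix)
```

The column names of the resulting matrix are the barcodes, so they are what the annotations need to match.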

You then need an annotations_file, which for me is typically a matrix of the barcodes (rows) vs. Seurat clusters (columns). The goal would be for the annotations file to assign each barcode to a cell type. I know results$theta holds the proportions of each cell type for each spot, so I am unsure how to get a single label per barcode: either by assigning each barcode the cell type with the highest proportion, or by another approach if you have one.
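The highest-proportion idea could be sketched as follows in base R. This assumes theta is a spots (rows) by cell types (columns) matrix of proportions, and that infercnv accepts the annotations as a two-column, tab-delimited file without a header (barcode, then group); please check that format against the infercnv version you use. The toy values, the `celltype_` prefix, and the output path are all hypothetical.

```r
# Toy stand-in for results$theta: spots (rows) x deconvolved cell types (columns),
# with each row summing to 1.
theta <- matrix(
  c(0.70, 0.20, 0.10,
    0.15, 0.25, 0.60,
    0.40, 0.45, 0.15),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("spot_A", "spot_B", "spot_C"), c("1", "2", "3"))
)

# Column index of the largest proportion in each row (ties go to the first column).
top_idx <- max.col(theta, ties.method = "first")

# One label per barcode; the "celltype_" prefix is only for readability.
annotations <- data.frame(
  barcode   = rownames(theta),
  cell_type = paste0("celltype_", colnames(theta)[top_idx]),
  stringsAsFactors = FALSE
)

# Write two tab-separated columns without a header.
annot_path <- tempfile(fileext = ".txt")  # hypothetical output path
write.table(annotations, annot_path, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)
```

The barcodes in the first column need to match the column names of the raw counts matrix exactly.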

I greatly appreciate any assistance you can give.

Thanks, Rob

bmill3r commented 2 months ago

Hi @rstagnit,

Thanks for your question and your interest in using STdeconvolve. I am not familiar with infercnv, but if I understand correctly, you are interested in performing CNV analysis on the multi-cellular spots. If so, then in terms of the inputs you are asking about, using the raw counts from the corpus seems reasonable. For the cell type annotations, infercnv only allows one cell type label per capture location, correct? If so, then labeling each spot with the deconvolved cell type at the highest proportion is probably the simplest strategy. What is the distribution of deconvolved cell type proportions for your spots? Do you have a sense of the multi-cellular resolution of your data? For example, are your spots small enough to contain only a few cells, or are they big enough to contain many cells? Thinking about these characteristics of your data might help determine whether using the highest proportion is reasonable.
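One rough way to look at that distribution, as a sketch rather than a recommendation: take the largest proportion in each spot and see how many spots are dominated by a single cell type. The toy theta matrix and the 0.5 cutoff below are hypothetical; a sensible cutoff depends on your data.

```r
# Toy stand-in for results$theta: spots (rows) x deconvolved cell types (columns).
theta <- matrix(
  c(0.70, 0.20, 0.10,
    0.15, 0.25, 0.60,
    0.40, 0.45, 0.15,
    0.34, 0.33, 0.33),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("spot_A", "spot_B", "spot_C", "spot_D"), c("1", "2", "3"))
)

# Largest proportion per spot: values near 1 suggest a nearly pure spot,
# values near 1 / ncol(theta) suggest an even mixture.
top_prop <- apply(theta, 1, max)
summary(top_prop)

# Flag spots where one cell type makes up at least half of the spot.
min_prop <- 0.5  # hypothetical cutoff
confident <- top_prop >= min_prop
table(confident)

# Barcodes that could reasonably carry a single label under this cutoff.
names(top_prop)[confident]
```

If most spots fall below the cutoff, a single label per spot may hide a lot of mixing, and the CNV results for those spots should be read with that in mind.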

Sorry I can't be more helpful, Brendan