chongjin / CARseq


NAs in results #3

Open RandallJEllis opened 2 years ago

RandallJEllis commented 2 years ago

For some reason, a very large portion of the values in the p, padj, and lfc arrays are NAs, and I'm not sure why. I am using the raw counts from the bulk RNA-seq as input to CARseq.

Any help is appreciated.

chongjin commented 2 years ago

Thank you for using CARseq. This sounds like a convergence issue. One possible reason is that the raw counts are low for the genes reporting NA p-values. Another thing that can improve the convergence of CARseq is using fewer cell types by collapsing similar cell types together. Hope this helps.
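One way to check the low-count explanation is to compare count levels between genes with and without NA p-values. The sketch below is only an illustration in base R: the `counts` and `p` matrices are made-up stand-ins (genes by samples, and genes by cell types), not the actual CARseq output structure.

```r
# Hypothetical inputs: raw counts (genes x samples) and p-values (genes x cell types).
counts <- matrix(
  c(  1,   0,   2,   1,
      3,   2,   0,   1,
    150, 210, 180, 190,
    300, 280, 310, 295),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:4))
)
p <- matrix(
  c(  NA,   NA,
    0.20,   NA,
    0.01, 0.50,
    0.03, 0.70),
  nrow = 4, byrow = TRUE,
  dimnames = list(rownames(counts), c("celltypeA", "celltypeB"))
)

# Overall share of NA entries in the p-value matrix.
na_fraction <- mean(is.na(p))

# Flag genes with at least one NA p-value and compute each gene's median count.
has_na <- rowSums(is.na(p)) > 0
median_count <- apply(counts, 1, median)

# Median of the per-gene median counts, split by NA status.
median_by_group <- tapply(median_count, has_na, median)
print(na_fraction)
print(median_by_group)
```

If the two groups have similar count levels, low counts are probably not the main cause and the number of cell types is the next thing to look at.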

RandallJEllis commented 2 years ago

Thank you. The genes showing NAs do not seem to have low counts. I tried removing cell types with small cell fractions (below 1-2%) and this seems to reduce the number of genes with NAs. Do you have any recommended methods for collapsing similar cell types together so I don't have to remove whole cell types?

Also, I am getting a new issue with the run_CARseq function when I remove a certain number of cell types: Error: $ operator is invalid for atomic vectors. Any help is appreciated.

chongjin commented 2 years ago

I would recommend collapsing similar cell types either based on biological knowledge or how single-cell gene expression data are clustered.

For the new issue, could you provide more details about how removing a certain number of cell types triggers the error message? Thank you.

RandallJEllis commented 2 years ago

Thank you. So by collapse, you mean add their fractions, yes?

Related to the new issue - I have 27 cell types in my data. When I remove the bottom 10 (in terms of max fraction value across all samples), CARseq runs fine. When I remove the bottom 15, I get that error.
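For reference, the filtering step described above could look roughly like the following in base R. This is a sketch under assumptions: `rho` is a simulated samples-by-cell-types fraction matrix, and `drop_bottom_k` is a hypothetical helper, not a CARseq function.

```r
# Hypothetical cell fraction matrix: rows are samples, columns are 27 cell types.
set.seed(1)
rho <- matrix(runif(5 * 27), nrow = 5,
              dimnames = list(paste0("sample", 1:5), paste0("celltype", 1:27)))
rho <- rho / rowSums(rho)  # make each sample's fractions sum to 1

# Drop the k cell types with the smallest maximum fraction across samples.
drop_bottom_k <- function(rho, k) {
  stopifnot(is.matrix(rho), k >= 0, k < ncol(rho))
  max_frac <- apply(rho, 2, max)
  keep <- names(sort(max_frac, decreasing = TRUE))[seq_len(ncol(rho) - k)]
  # Keep the original column order; drop = FALSE guarantees a matrix
  # is returned even if only one column remains.
  rho[, colnames(rho) %in% keep, drop = FALSE]
}

rho_12 <- drop_bottom_k(rho, 15)
print(dim(rho_12))
```

Note that after dropping columns, each sample's remaining fractions no longer sum to 1; whether to renormalize is a separate decision.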

chongjin commented 2 years ago

Thank you. Assuming that the cell fractions come from a deconvolution method such as CIBERSORT, you could add up the fraction of closely related cell types. For example, when analyzing brain tissue, the fractions of different types of interneurons can be added together.
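A minimal sketch of adding up fractions in base R follows. The matrix `rho`, the cell type names, the `groups` mapping, and the `collapse_fractions` helper are all hypothetical and only illustrate the idea; they are not part of CARseq.

```r
# Hypothetical cell fraction matrix: rows are samples, columns are cell types.
rho <- matrix(
  c(0.50, 0.20, 0.10, 0.20,
    0.40, 0.30, 0.05, 0.25),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("sample1", "sample2"),
                  c("Astro", "Inh_PVALB", "Inh_SST", "Exc"))
)

# Hypothetical mapping from each original cell type to its collapsed group.
groups <- c(Astro = "Astro", Inh_PVALB = "Inh", Inh_SST = "Inh", Exc = "Exc")

# Sum the fractions of all cell types that share a group, sample by sample.
collapse_fractions <- function(rho, groups) {
  stopifnot(all(colnames(rho) %in% names(groups)))
  g <- groups[colnames(rho)]
  # rowsum() sums rows by group, so transpose to put cell types in rows.
  t(rowsum(t(rho), group = g))
}

rho_collapsed <- collapse_fractions(rho, groups)
print(rho_collapsed)
```

Because fractions are only added, each sample's total is unchanged, so no renormalization is needed after collapsing.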

For the new issue, I would not expect removing the bottom 15 cell types to cause problems. I will be able to look further into the problem if you can share a minimal reproducible example, or if you can print out the traceback() result in R.
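In an interactive session, calling traceback() right after the error is usually enough. When running a script, the call stack can be recorded at the moment the error is raised. The sketch below uses only base R; `failing_step` and `wrapper` are hypothetical stand-ins that trigger the same kind of error by applying `$` to an atomic vector, and they say nothing about where the error actually originates inside run_CARseq.

```r
# Hypothetical stand-ins for the failing call: `$` is applied to an atomic vector.
failing_step <- function(x) x$value
wrapper <- function(x) failing_step(x)

captured_calls <- NULL
result <- tryCatch(
  withCallingHandlers(
    wrapper(c(value = 1)),
    error = function(e) {
      # Record the call stack at the point where the error was raised.
      captured_calls <<- sys.calls()
    }
  ),
  # After the stack is recorded, return the error message instead of stopping.
  error = function(e) conditionMessage(e)
)

print(result)          # the error message
print(captured_calls)  # the calls leading to the error, outermost first
```

Replacing `wrapper(c(value = 1))` with the real failing call would give the same information that traceback() prints interactively.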