Confused about the train/test steps

Bennylikescoding commented 1 year ago

Dear Professors,

Thanks for creating this great tool. However, the tutorials leave me confused about how to use it. The PDAC deconvolution tutorial uses only the train_vae function, whereas the demo1 tutorial, for example, additionally calls load_vae_and_generate on the .pth VAE model produced by train_vae.

So here is my question. Suppose I focus only on the first step, transforming bulk RNA to single-cell RNA (i.e., without the further scRNA-to-spatial step):

Say I have two bulk RNA-seq samples, from lung cancer tissue of 2-month-old and 7-month-old mice, called bulkA and bulkB. I also have one single-cell RNA reference, scRNAref. When I deconvolute bulkA using scRNAref into a new, bulk2space-generated scRNA dataset (call it "generated-scRNA from bulkA"), I get a .pth VAE model (call it "A.pth"). When I then want to deconvolute bulkB, which should I use? Should I 1) use the load_vae_and_generate function with the previous A.pth model, or 2) use the train_vae function, which will generate a new B.pth model?

I believe this is crucial because it directly determines how we should use this tool. In CIBERSORT, we provide only two inputs: the bulk RNA-seq data and the reference immune cell expression profile. The reference rarely changes, so we simply feed CIBERSORT many bulk RNA-seq datasets and it returns a generated immune cell expression dataframe for each. Simple and easy. But in Bulk2Space, we get a new .pth model every time if we follow option 2, and to be honest, I don't know what this .pth model is for if not for loading it and generating a new scRNA dataset as in option 1.

Beyond the issues above, option 1 also has problems. What if bulkA and bulkB come from tissues in different states, as in the example above? I see that in the article you mention that "the state of each cell type still fluctuates within a relatively stable high-dimensional space". But if bulkA came from pre-cancerous tissue and bulkB from cancerous tissue, would bulk2space still work well? This matters for longitudinal datasets, for example a series of bulk RNA-seq samples from 10 timepoints along cancer progression (normal, pre-cancerous, transitional, and finally cancerous tissue), or a series from different developmental stages of the liver. What is the correct way to use bulk2space to obtain single-cell RNA datasets from such bulk RNA data? Would bulk2space still work in this scenario?

Also, does bulk2space require that the scRNA reference and the bulk RNA come from tissue in a similar state? For example, can bulk2space deconvolute bulk RNA derived from cancerous lung using a scRNA reference derived from normal lung?

Actually, I've tried option 1 (i.e., reusing the same model) on my longitudinal dataset, but the cell-type distributions that bulk2space returned looked nearly identical across samples. I would expect some difference, at least in immune cell types, since I'm deconvoluting bulk RNA from normal and cancer tissues using the same scRNA reference.

Also, another key issue is that I don't know whether the generated sc_cell_type and sc_data dataframes can be treated as a standard Seurat object to which we can apply the standard analysis pipeline (filtering on nFeature and nCount, scaling, centering, PCA, UMAP, reassigning cell types based on the FindMarkers function, etc.). I've actually tried this, but PCA, tSNE, and UMAP don't separate the cell types well. And can different scRNA datasets generated by bulk2space be integrated into a single Seurat object, as normal single-cell datasets can?
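For anyone wanting to put a number on "nearly identical" cell-type distributions, here is a minimal sketch. It is not part of bulk2space; it assumes only that each generated dataset provides one cell-type label per cell, and the label lists below are hypothetical stand-ins for that column. It computes per-sample proportions and the total variation distance between them (0 means identical composition, 1 means no overlap).

```python
from collections import Counter


def cell_type_proportions(labels):
    """Return {cell_type: fraction of cells} for a list of per-cell labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cell_type: n / total for cell_type, n in counts.items()}


def total_variation_distance(props_a, props_b):
    """Half the L1 distance between two proportion dicts."""
    cell_types = set(props_a) | set(props_b)
    return 0.5 * sum(
        abs(props_a.get(ct, 0.0) - props_b.get(ct, 0.0)) for ct in cell_types
    )


# Hypothetical labels standing in for the cell-type column of two
# generated datasets (e.g., one from normal and one from tumor bulk RNA).
labels_normal = ["T cell"] * 2 + ["Epithelial"] * 8
labels_tumor = ["T cell"] * 5 + ["Epithelial"] * 5

props_normal = cell_type_proportions(labels_normal)
props_tumor = cell_type_proportions(labels_tumor)
distance = total_variation_distance(props_normal, props_tumor)
print(props_normal, props_tumor, distance)
```

A distance close to 0 across all timepoints would support the impression that the reused model returns essentially the same composition for every sample.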

Thank you so much. It would be a great help if the experts on your team who developed this nice tool could answer the questions above.

AlexUOM commented 1 year ago

Hi,

I am also facing these concerns, as I have different bulk RNA-seq datasets to deconvolute, some of which are from longitudinal work. I would be interested in hearing from the authors about the best practice for handling these scenarios.

> Also, another key issue is, I don't know whether the generated sc_cell_type and sc_data dataframe can be treated as a standard Seurat object that we can use standard analysis pipeline

I am also interested in this: I loaded the generated data into Seurat, but I have not yet managed to separate the cell types via UMAP. Judging from what the numbers look like, I suspect the generated scRNA data is already normalized and scaled. It would be great if the authors could confirm this or clarify a bit more.
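A quick heuristic check of that suspicion can be sketched as follows. This is a generic sanity check, not a bulk2space function: raw counts are non-negative integers, log-normalized values are non-negative but mostly non-integer, and centered or scaled values include negatives. The function name and the example values are hypothetical.

```python
def summarize_expression(values):
    """Summarize a flat iterable of expression values from a matrix."""
    values = [float(v) for v in values]
    if not values:
        raise ValueError("no expression values given")
    has_negative = any(v < 0 for v in values)
    all_integer = all(v.is_integer() for v in values)
    return {
        "min": min(values),
        "max": max(values),
        "has_negative": has_negative,  # typical of centered/scaled data
        "all_integer": all_integer,    # typical of raw counts
        "looks_like_raw_counts": all_integer and not has_negative,
    }


# Hypothetical values, flattened from an expression matrix.
raw_like = summarize_expression([0, 3, 12, 1])
scaled_like = summarize_expression([-0.8, 1.2, 0.0, -0.4])
print(raw_like)
print(scaled_like)
```

If the generated matrix contains negative or non-integer values, running it through normalization and scaling again in Seurat would transform already-transformed data, which may be one reason the embeddings separate cell types poorly.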

xiang-cy commented 1 year ago

@Bennylikescoding Hello there. I am also focusing only on the first step, transforming bulk RNA to single-cell RNA (i.e., without the further scRNA-to-spatial step). But I ran into a problem: input_st_data_path and input_st_meta_path are still required when calling generate_sc_meta, generate_sc_data = model.train_vae_and_generate(), and I don't actually have any ST data. Can I skip these arguments, or is there another way to solve this?
Looking forward to your reply; your help would be much appreciated.

SpaTrek commented 1 year ago

@Bennylikescoding Thanks, you raised a very specific question. Bulk2Space can be divided into two separate algorithms, and I have recently seen some successful attempts that use either step of bulk2space on its own. We are considering an update of the algorithm in a new project. @AlexUOM I have received an email from a student at the University of Manchester; I suppose that's you. Should you have any questions, please leave a message here so everybody can see them. We handle issues only on GitHub. @xiang-cy Same answer as the first one.

Bennylikescoding commented 11 months ago

> @Bennylikescoding hello, there. I only focus on the first step to transform bulkRNA to single-cell RNA (i.e., no consideration of further scRNA to spatial RNA) too. But I met a problem that input_st_data_path and input_st_meta_path were still required when using generate_sc_meta, generate_sc_data = model.train_vae_and_generate(), but I don't have st data in fact. How can I just skip this or if there is another way to solve it? Looking forward to your reply, your help would be much appreciated.

@xiang-cy Hello xiang,

I found some spatial data in published papers that is close to what I'm working on. I believe you could also find a dataset whose tissue type, disease, and age status are close to yours. It seems the input spatial data and meta info can't be omitted.

ZJUFanLab / bulk2space, issue #14