bioinfo-biols / SEVtras

sEV-containing droplet identification in scRNA-seq data (SEVtras)
GNU Affero General Public License v3.0
17 stars, 5 forks

Error reported in SEVtras.ESAI_calculator #8

Open YangXinyan opened 8 months ago

YangXinyan commented 8 months ago

Hello, I am trying to reproduce the SEVtras results using data from 15 normal tissues. When I ran the SEVtras.ESAI_calculator function, the following error occurred. I spent a long time on it but could not solve it, so I am asking for your help.

/home/dell/anaconda3/envs/SEVtras_env/lib/python3.7/site-packages/anndata/_core/raw.py:146: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass `AnnData(X, dtype=X.dtype, ...)` to get the future behavour.
  uns=self._adata.uns.copy(),
/home/dell/anaconda3/envs/SEVtras_env/lib/python3.7/site-packages/anndata/_core/anndata.py:1785: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass `AnnData(X, dtype=X.dtype, ...)` to get the future behavour.
  [AnnData(sparse.csr_matrix(a.shape), obs=a.obs) for a in all_adatas],
/home/dell/anaconda3/envs/SEVtras_env/lib/python3.7/site-packages/anndata/_core/anndata.py:798: UserWarning: AnnData expects .var.index to contain strings, but got values like:
    []

    Inferred to be: empty

  value_idx = self._prep_dim_index(value.index, attr)

AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_19366/2460094773.py in <module>
      1 import SEVtras
----> 2 SEVtras.ESAI_calculator(adata_ev_path='./sEV_SEVtras.h5ad', adata_cell_path='../06.Seurat.15Tissue/pbmc.combined.v2.h5ad', out_path='./', Xraw=True, OBSsample='sample', OBScelltype='Stage2')

~/anaconda3/envs/SEVtras_env/lib/python3.7/site-packages/SEVtras/main.py in ESAI_calculator(adata_ev_path, adata_cell_path, out_path, OBSsample, OBScelltype, OBSev, OBSMpca, cellN, Xraw, normalW, plot_cmp, save_plot_prefix, OBSMumap, size)
    185     adata_cell = read_adata(adata_cell_path, get_only=False)
    186     from .functional import deconvolver, ESAI_celltype, plot_SEVumap, plot_ESAIumap
--> 187     celltype_e_number, adata_evS, adata_com = deconvolver(adata_ev, adata_cell, OBSsample, OBScelltype, OBSev, OBSMpca, cellN, Xraw, normalW)
    188     ##ESAI for sample
    189     sample_ESAI = (adata_com[adata_com.obs[OBScelltype]==OBSev,].obs[OBSsample].value_counts() / adata_com[adata_com.obs[OBScelltype]!=OBSev,].obs[OBSsample].value_counts()).fillna(0)

~/anaconda3/envs/SEVtras_env/lib/python3.7/site-packages/SEVtras/functional.py in deconvolver(adata_ev, adata_cell, OBSsample, OBScelltype, OBSev, OBSMpca, cellN, Xraw, normalW)
    112 def deconvolver(adata_ev, adata_cell, OBSsample='batch', OBScelltype='celltype', OBSev='sEV', OBSMpca='X_pca', cellN=10, Xraw = True, normalW=True):
    113 
--> 114     adata_combined = preprocess_source(adata_ev, adata_cell, OBScelltype=OBScelltype, OBSev=OBSev, Xraw = Xraw)
    115     gsea_pval_dat = source_biogenesis(adata_cell, OBScelltype=OBScelltype, Xraw = Xraw, normalW=normalW)
    116     near_neighbor_dat = near_neighbor(adata_combined, OBSsample=OBSsample, OBSev=OBSev, OBScelltype=OBScelltype, OBSMpca=OBSMpca, cellN=cellN)

~/anaconda3/envs/SEVtras_env/lib/python3.7/site-packages/SEVtras/functional.py in preprocess_source(adata_ev, adata_cell, OBScelltype, OBSev, Xraw)
     79 
     80     adata_combined.obs[OBScelltype] = pd.Categorical(adata_combined.obs[OBScelltype], \
---> 81         categories = np.append(adata_cell_raw.obs[OBScelltype].cat.categories.values, OBSev), ordered = False)
     82 
     83     adata_combined.raw = adata_combined

~/anaconda3/envs/SEVtras_env/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
   5485         ):
   5486             return self[name]
-> 5487         return object.__getattribute__(self, name)
   5488 
   5489     def __setattr__(self, name: str, value) -> None:

~/anaconda3/envs/SEVtras_env/lib/python3.7/site-packages/pandas/core/accessor.py in __get__(self, obj, cls)
    179             # we're accessing the attribute of the class, i.e., Dataset.geo
    180             return self._accessor
--> 181         accessor_obj = self._accessor(obj)
    182         # Replace the property with the accessor object. Inspired by:
    183         # https://www.pydanny.com/cached-property.html

~/anaconda3/envs/SEVtras_env/lib/python3.7/site-packages/pandas/core/arrays/categorical.py in __init__(self, data)
   2599 
   2600     def __init__(self, data):
-> 2601         self._validate(data)
   2602         self._parent = data.values
   2603         self._index = data.index

~/anaconda3/envs/SEVtras_env/lib/python3.7/site-packages/pandas/core/arrays/categorical.py in _validate(data)
   2608     def _validate(data):
   2609         if not is_categorical_dtype(data.dtype):
-> 2610             raise AttributeError("Can only use .cat accessor with a 'category' dtype")
   2611 
   2612     def _delegate_property_get(self, name):

AttributeError: Can only use .cat accessor with a 'category' dtype

Below are my UMAP plots:

[Two UMAP plots (images not reproduced)]

Looking forward to your reply.

Best wishes

RuiqiaoHe commented 8 months ago

This error probably arises because the 'Stage2' column in the obs of your cell matrix is not of 'category' dtype. You can first convert it with adata_cell.obs['Stage2'] = pd.Series(adata_cell.obs['Stage2'], dtype="category") and then pass the cell matrix to SEVtras.ESAI_calculator.
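A minimal, pandas-only sketch of why this conversion helps. It assumes a toy obs table standing in for adata_cell.obs (the cell-type labels are made up) and repeats the expression from line 81 of preprocess_source in the traceback, which reads .cat.categories and therefore only works on a 'category' column. The helper name categories_plus_ev is illustrative, not part of SEVtras.

```python
import numpy as np
import pandas as pd

# Toy stand-in for adata_cell.obs; the labels are made up for illustration.
obs = pd.DataFrame(
    {"Stage2": ["T cell", "B cell", "T cell", "Monocyte"]},
    index=["cell1", "cell2", "cell3", "cell4"],
)


def categories_plus_ev(obs, obs_celltype="Stage2", obs_ev="sEV"):
    # Same expression as line 81 of preprocess_source in the traceback:
    # the .cat accessor only exists when the column has 'category' dtype.
    return np.append(obs[obs_celltype].cat.categories.values, obs_ev)


# A plain string column (e.g. after a lossy Seurat -> h5ad conversion)
# reproduces the AttributeError from the issue.
try:
    categories_plus_ev(obs)
    failed_before = False
except AttributeError as err:
    failed_before = True
    print("before:", err)

# The conversion suggested above; the index is preserved, so assignment aligns.
obs["Stage2"] = pd.Series(obs["Stage2"], dtype="category")
print("after:", [str(c) for c in categories_plus_ev(obs)])
```

With a real AnnData object, apply the same conversion to adata_cell.obs and write the object back to disk (for example with adata_cell.write_h5ad(...)), because ESAI_calculator reads the cell matrix from adata_cell_path rather than taking an in-memory object.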

Oahux99 commented 8 months ago

I am also trying to reproduce this part of the results, but I could not find the raw_feature_bc_matrix. Could anyone tell me where I can download these data? Thanks!

RuiqiaoHe commented 8 months ago

The raw_feature_bc_matrix folder is located in the outs directory of the CellRanger output for each scRNA-seq sample. Here is one example: [screenshot]
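A small sketch, assuming one CellRanger run per sample under a common root folder (a hypothetical layout), that collects every <root>/<sample>/outs/raw_feature_bc_matrix folder containing the three files CellRanger v3 and later write there. The helper name find_raw_matrices is illustrative, not part of SEVtras.

```python
from pathlib import Path

# Files CellRanger (v3 and later) writes into each *_feature_bc_matrix folder.
EXPECTED_FILES = ("barcodes.tsv.gz", "features.tsv.gz", "matrix.mtx.gz")


def find_raw_matrices(cellranger_root):
    """Map sample name -> raw_feature_bc_matrix folder for every
    <root>/<sample>/outs/raw_feature_bc_matrix that holds all three files."""
    found = {}
    pattern = "*/outs/raw_feature_bc_matrix"
    for matrix_dir in sorted(Path(cellranger_root).glob(pattern)):
        if all((matrix_dir / name).is_file() for name in EXPECTED_FILES):
            # <root>/<sample>/outs/raw_feature_bc_matrix -> <sample>
            sample = matrix_dir.parent.parent.name
            found[sample] = matrix_dir
    return found


if __name__ == "__main__":
    for sample, folder in find_raw_matrices(".").items():
        print(sample, folder)
```

Each folder found this way can then be read, for example with scanpy.read_10x_mtx, before running SEVtras.sEV_recognizer. Older CellRanger versions (v2) use a different layout (raw_gene_bc_matrices with genes.tsv), so this check would skip them.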

YangXinyan commented 8 months ago

Hi, I downloaded the raw data from the article [Single-cell transcriptome profiling of an adult human cell atlas of 15 major organs], and then ran CellRanger to get the raw_feature_bc_matrix.

Oahux99 commented 8 months ago

So you did it yourself using the sequencing data here (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE159929)? I thought there would be processed data that could be downloaded directly from the Internet. Thank you for your reply.

Oahux99 commented 8 months ago

Thanks so much

YangXinyan commented 8 months ago

Thank you very much for your prompt reply. However, I suspect that something went wrong when I converted the Seurat object to an h5ad file, which caused various problems. I will try again and post the results here. Thanks again

YangXinyan commented 8 months ago

This website provides some raw_feature_bc_matrix files; I hope it helps. https://dna-discovery.stanford.edu/research/datasets/

YangXinyan commented 8 months ago

Hello developer, I have solved the problem and successfully ran SEVtras. The main issue was that converting from Seurat to h5ad (with SeuratDisk) introduced some unknown errors. When I followed the tutorial in this video to convert the format, everything ran normally.

Thanks again for your help!
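Since the Seurat-to-h5ad conversion was the root cause here, a quick check of the converted object before calling ESAI_calculator may save time. Below is a sketch with a hypothetical helper, check_converted (not part of SEVtras), that looks for the two symptoms seen in the first traceback: an empty or non-string var index and a non-categorical cell-type column. It assumes you pass adata.obs and adata.var_names from the loaded h5ad; the default column names follow the call in this issue.

```python
import pandas as pd


def check_converted(obs, var_names, obs_sample="sample", obs_celltype="Stage2"):
    """Return a list of problems found in a converted cell matrix.

    obs is adata.obs; var_names is adata.var_names (any sequence of gene names).
    An empty list means none of these checks failed.
    """
    problems = []
    # Both obs columns passed to ESAI_calculator must exist.
    for col in (obs_sample, obs_celltype):
        if col not in obs.columns:
            problems.append(f"obs is missing column {col!r}")
    # The cell-type column must be categorical (see the .cat error above).
    if obs_celltype in obs.columns and not isinstance(
        obs[obs_celltype].dtype, pd.CategoricalDtype
    ):
        problems.append(f"obs[{obs_celltype!r}] is {obs[obs_celltype].dtype}, not category")
    # The var index should hold gene names as strings (see the UserWarning above).
    genes = list(var_names)
    if not genes:
        problems.append("var_names is empty")
    elif not all(isinstance(g, str) for g in genes):
        problems.append("var_names contains non-string values")
    return problems


if __name__ == "__main__":
    obs = pd.DataFrame({"sample": ["s1", "s2"], "Stage2": ["T cell", "B cell"]})
    for problem in check_converted(obs, []):
        print(problem)
```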

RuiqiaoHe commented 8 months ago

Thank you for your testing. I think it will be a great help for SEVtras.

FLY-Fiancee commented 8 months ago

I experienced the same problem; thank you for the method you provided.

youngjp0829 commented 7 months ago

Hello! I applied SEVtras to my 10x scRNA-seq data and finished the first step, SEVtras.sEV_recognizer. For the ESAI calculation, since I used Seurat for my analysis, I converted the Seurat object following @YangXinyan's comments and manually added (or corrected) the batch information in adata_ev and adata_cell, which I will use as inputs. However, the tutorial says: "The fourth parameter means whether to use the raw object in the adata_cell or not. If adata_cell has been filtered or normalized, please set Xraw=True, and adata_cell.raw will be used (Note: save raw adata_cell as adata_cell.raw before filtering)." Since the h5ad file converted from the Seurat object only includes filtered, annotated cells, I assume I should set Xraw=True. But adata_cell.raw is actually missing from my adata_cell. Could you please clarify this and also give some suggestions on dealing with Seurat objects? Much appreciated!

RuiqiaoHe commented 7 months ago

If you can only save the filtered cell matrix from Seurat, I suggest setting Xraw=False, since adata_cell.raw is missing. The purpose of Xraw=True is to make sure that sEV-characteristic genes are not filtered out during preprocessing and filtering of the cell matrix. If you can keep the expression of all genes in the cell matrix during conversion, I encourage you to save it as adata_cell.raw and set Xraw=True. See point 7 of the Troubleshooting section on how to skip optional steps. Thanks for your advice; I will clarify this in the updated tutorial.
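The Xraw reasoning above can be sketched as two small checks. This is an illustration under stated assumptions, not SEVtras code: choose_xraw and genes_lost are hypothetical helpers, SimpleNamespace objects stand in for AnnData, and the gene list is made up rather than the actual set of sEV-characteristic genes SEVtras uses.

```python
from types import SimpleNamespace


def choose_xraw(adata):
    """Xraw=True only makes sense when an unfiltered copy was stored in adata.raw."""
    return getattr(adata, "raw", None) is not None


def genes_lost(gene_set, kept_genes):
    """Genes of interest that are absent from the (possibly filtered) matrix."""
    kept = set(kept_genes)
    return sorted(g for g in gene_set if g not in kept)


if __name__ == "__main__":
    # Stand-in for a filtered matrix converted from Seurat: no raw copy stored.
    filtered = SimpleNamespace(raw=None, var_names=["Cd63", "Actb", "Gapdh"])
    print("Xraw =", choose_xraw(filtered))
    # Hypothetical genes of interest that filtering may have removed.
    print("missing:", genes_lost(["Cd63", "Cd81", "Tsg101"], filtered.var_names))
```

In anndata, an unfiltered copy is kept by assigning adata_cell.raw = adata_cell before filtering, and adata_cell.raw is None when nothing was stored, which matches the situation described in the question above.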

youngjp0829 commented 7 months ago

Thank you for your prompt response. ESAI_calculator works now. I just want to bring up another issue. I worked with mouse samples and got an error saying there was no gene enrichment when the sEV biogenesis capacity was computed with gseapy. I then realized that the genes in the gmt file are all human gene symbols. After switching to a gmt file with mouse gene symbols, it worked fine.
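One way to confirm this kind of mismatch before running gseapy is to count how many genes of each GMT set appear among the dataset's gene names. Below is a sketch with hypothetical helpers, read_gmt and overlap_report. The set name EVs_BIOGENESIS is taken from the KeyError later in this thread, but the member genes are illustrative and are not the contents of the SEVtras GMT file.

```python
def read_gmt(text):
    """Parse GMT text: one set per line, name<TAB>description<TAB>gene1<TAB>gene2..."""
    gene_sets = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 3:
            # Skip empty trailing fields left by a trailing tab.
            gene_sets[fields[0]] = [g for g in fields[2:] if g]
    return gene_sets


def overlap_report(gene_sets, dataset_genes):
    """Number of genes in each set that are present in the dataset."""
    genes = set(dataset_genes)
    return {name: sum(g in genes for g in members) for name, members in gene_sets.items()}


if __name__ == "__main__":
    mouse_genes = ["Cd63", "Cd81", "Tsg101", "Actb"]  # e.g. list(adata_cell.var_names)
    human_style = read_gmt("EVs_BIOGENESIS\tNA\tCD63\tCD81\tTSG101\n")
    mouse_style = read_gmt("EVs_BIOGENESIS\tNA\tCd63\tCd81\tTsg101\n")
    print("human symbols:", overlap_report(human_style, mouse_genes))
    print("mouse symbols:", overlap_report(mouse_style, mouse_genes))
```

A real file can be parsed with read_gmt(open(path).read()). A set whose overlap falls below gseapy's min_size is filtered out, which leads to the "No gene sets passed through filtering condition" error. Changing the letter case of human symbols is not a reliable substitute for a proper ortholog mapping.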

RuiqiaoHe commented 7 months ago

Thanks for your advice. I will add this to the Troubleshooting list.

Yujj1123 commented 4 months ago

Hi! I'm trying to use the ESAI_calculator function, and I encountered an error.

/opt/conda/miniconda3/envs/SEVtras/lib/python3.7/site-packages/anndata/_core/anndata.py:1785: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass `AnnData(X, dtype=X.dtype, ...)` to get the future behavour.
  [AnnData(sparse.csr_matrix(a.shape), obs=a.obs) for a in all_adatas],
/opt/conda/miniconda3/envs/SEVtras/lib/python3.7/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
2024-04-25 11:25:32,528 [WARNING] Duplicated values found in preranked stats: 12.58% of genes
The order of those genes will be arbitrary, which may produce unexpected results.

2024-04-25 11:25:32,535 [ERROR] No gene sets passed through filtering condition !!! 
Hint 1: Try to lower min_size or increase max_size !
Hint 2: Check gene symbols are identifiable to your gmt input.
Hint 3: Gene symbols curated in Enrichr web services are all upcases.
KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_3434/4272367091.py in <module>
      1 import SEVtras
----> 2 SEVtras.ESAI_calculator(adata_ev_path='/opt/conda/Zhoubo/raw_data/outputs/sEV_SEVtras.h5ad', adata_cell_path='./adata.h5ad', out_path='./ESAI.outputs', Xraw=False, OBSsample='batch', OBScelltype='celltype')

/opt/conda/miniconda3/envs/SEVtras/lib/python3.7/site-packages/SEVtras/main.py in ESAI_calculator(adata_ev_path, adata_cell_path, out_path, OBSsample, OBScelltype, OBSev, OBSMpca, cellN, Xraw, normalW, plot_cmp, save_plot_prefix, OBSMumap, size)
    185     adata_cell = read_adata(adata_cell_path, get_only=False)
    186     from .functional import deconvolver, ESAI_celltype, plot_SEVumap, plot_ESAIumap
--> 187     celltype_e_number, adata_evS, adata_com = deconvolver(adata_ev, adata_cell, OBSsample, OBScelltype, OBSev, OBSMpca, cellN, Xraw, normalW)
    188     ##ESAI for sample
    189     sample_ESAI = (adata_com[adata_com.obs[OBScelltype]==OBSev,].obs[OBSsample].value_counts() / adata_com[adata_com.obs[OBScelltype]!=OBSev,].obs[OBSsample].value_counts()).fillna(0)

/opt/conda/miniconda3/envs/SEVtras/lib/python3.7/site-packages/SEVtras/functional.py in deconvolver(adata_ev, adata_cell, OBSsample, OBScelltype, OBSev, OBSMpca, cellN, Xraw, normalW)
    113 
    114     adata_combined = preprocess_source(adata_ev, adata_cell, OBScelltype=OBScelltype, OBSev=OBSev, Xraw = Xraw)
--> 115     gsea_pval_dat = source_biogenesis(adata_cell, OBScelltype=OBScelltype, Xraw = Xraw, normalW=normalW)
    116     near_neighbor_dat = near_neighbor(adata_combined, OBSsample=OBSsample, OBSev=OBSev, OBScelltype=OBScelltype, OBSMpca=OBSMpca, cellN=cellN)
    117 

/opt/conda/miniconda3/envs/SEVtras/lib/python3.7/site-packages/SEVtras/functional.py in source_biogenesis(adata_cell, OBScelltype, Xraw, normalW)
     37         gene_rank = pd.DataFrame({'exp': np.array(X_norm[adata_cell.obs[OBScelltype] == str(i), :].mean(axis=0))}, index = X_input.var_names)
     38 
---> 39         res = gp.prerank(rnk=gene_rank, gene_sets=gmt_path)
     40         terms = res.results.keys()
     41         gsea_pval.append([i, res.results[list(terms)[0]]['nes'], res.results[list(terms)[0]]['pval']])

/opt/conda/miniconda3/envs/SEVtras/lib/python3.7/site-packages/gseapy/__init__.py in prerank(rnk, gene_sets, outdir, pheno_pos, pheno_neg, min_size, max_size, permutation_num, weighted_score_type, ascending, threads, figsize, format, graph_num, no_plot, seed, verbose, *arg, **kwarg)
    356         verbose,
    357     )
--> 358     pre.run()
    359     return pre
    360 

/opt/conda/miniconda3/envs/SEVtras/lib/python3.7/site-packages/gseapy/gsea.py in run(self)
    433         self._logger.info("Parsing data files for GSEA.............................")
    434         # filtering out gene sets and build gene sets dictionary
--> 435         gmt = self.load_gmt(gene_list=dat2.index.values, gmt=self.gene_sets)
    436         self.gmt = gmt
    437         self._logger.info(

/opt/conda/miniconda3/envs/SEVtras/lib/python3.7/site-packages/gseapy/base.py in load_gmt(self, gene_list, gmt)
    234             )
    235             self._logger.error(msg)
--> 236             dict_head = "{ %s: [%s]}" % (subsets[0], genesets_dict[subsets[0]])
    237             self._logger.error(
    238                 "The first entry of your gene_sets (gmt) look like this : %s"

KeyError: 'EVs_BIOGENESIS'

Since I'm working with a mouse dataset, I suspect the error might be due to the use of a human GMT file. As I'm a beginner in Python, I haven't been able to find where to replace the GMT file, either in the parameters or in the source code of the ESAI_calculator function. Could you please advise me on how to use a mouse GMT file instead? Thank you!
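For context on where gene names enter this step: line 37 of source_biogenesis in the traceback builds gene_rank from the mean normalized expression of one cell type, indexed by the dataset's gene names, and gseapy then matches that index against the GMT symbols. A toy sketch of the same construction follows; the matrix values, gene names, and the helper gene_rank_for are made up for illustration. It also shows how genes with identical means (here several zero-mean genes) produce tied values like those behind the "Duplicated values found in preranked stats" warning.

```python
import numpy as np
import pandas as pd

# Toy normalized matrix: 4 cells x 5 genes (values made up).
X_norm = np.array([
    [0.0, 2.0, 0.0, 1.0, 0.0],
    [0.0, 4.0, 0.0, 3.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 5.0],
    [0.0, 1.0, 0.0, 0.0, 3.0],
])
# The rank index is the dataset's own gene names; these must match the GMT.
var_names = pd.Index(["Cd63", "Actb", "Gapdh", "Cd81", "Tsg101"])
celltype = pd.Series(["A", "A", "B", "B"])


def gene_rank_for(cell_type):
    """Mean expression per gene within one cell type, as a one-column DataFrame."""
    mask = (celltype == cell_type).to_numpy()
    return pd.DataFrame({"exp": X_norm[mask, :].mean(axis=0)}, index=var_names)


rank_a = gene_rank_for("A")
# Fraction of genes whose value is shared with at least one other gene.
dup_fraction = rank_a["exp"].duplicated(keep=False).mean()
print(rank_a)
print("tied fraction:", dup_fraction)
```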

RuiqiaoHe commented 4 months ago

That seems to be the case: I have only tested ESAI_calculator on human samples. This problem has been fixed in v0.2.10. You can now run ESAI_calculator with the parameter species='Mus'.

filmchen commented 3 months ago

Hi! After updating the package to version 0.2.10, I still get an error when running on a mouse sample: [Errno 2] No such file or directory: '/.conda/envs/sevtras_env/lib/python3.7/site-packages/SEVtras/evsM.gmt'. I checked the path and there was indeed no such file, only evs.gmt. Could you give me some suggestions on how to solve this problem? I would appreciate your reply.

RuiqiaoHe commented 3 months ago

Thank you for testing. The problem is fixed in v0.2.11. Could you please update to it and let me know whether it works?
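A standard-library sketch for checking an installation against this fix. version_tuple, has_mouse_fix, and bundled_gmt are hypothetical helpers; the file name evsM.gmt and the version numbers come from this thread, and the check assumes the GMT file sits next to the package's __init__.py, as the path in the error message suggests.

```python
import importlib.util
import os
import re


def version_tuple(version):
    """'0.2.11' -> (0, 2, 11); stops at the first non-numeric component."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def has_mouse_fix(version, fixed_in="0.2.11"):
    # Compare numerically; as plain strings, "0.2.9" would sort after "0.2.11".
    return version_tuple(version) >= version_tuple(fixed_in)


def bundled_gmt(name="evsM.gmt", package="SEVtras"):
    """Path of a data file shipped inside an installed package, or None."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None  # package not installed (or has no file location)
    path = os.path.join(os.path.dirname(spec.origin), name)
    return path if os.path.isfile(path) else None


if __name__ == "__main__":
    print("0.2.10 has fix:", has_mouse_fix("0.2.10"))
    print("bundled mouse GMT:", bundled_gmt())
```

The installed version itself can be read with pip show SEVtras, assuming the package was installed with pip under that name.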

filmchen commented 3 months ago

Thank you for providing the solution; it runs well!