bioinfo-biols / SEVtras

sEV-containing droplet identification in scRNA-seq data (SEVtras)
GNU Affero General Public License v3.0
20 stars 6 forks source link

IndexError while running sEVtras calculator #30

Open songnsnow opened 2 weeks ago

songnsnow commented 2 weeks ago

Hi, I've been trying to run the second part (SEVtras.ESAI_calculator) of the tool, but I am getting the error below.

/home/kleinman/seol.song/projects/kleinman/seol.song/from_hydra/SEVtras/venv_3.11.5/lib/python3.11/site-packages/SEVtras/functional.py:81: FutureWarning: Use anndata.concat instead of AnnData.concatenate, AnnData.concatenate is deprecated and will be removed in the future. See the tutorial for concat at: https://anndata.readthedocs.io/en/latest/concatenation.html
  adata_combined = adata_cell_raw.concatenate(adata_ev, batch_key = OBSev)
/home/kleinman/seol.song/projects/kleinman/seol.song/from_hydra/SEVtras/venv_3.11.5/lib/python3.11/site-packages/scanpy/preprocessing/_pca.py:317: ImplicitModificationWarning: Setting element `.obsm['X_pca']` of view, initializing view as actual.
  adata.obsm[key_obsm] = X_pca
/home/kleinman/seol.song/projects/kleinman/seol.song/from_hydra/SEVtras/venv_3.11.5/lib/python3.11/site-packages/SEVtras/functional.py:126: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  near_neighbor_dat.loc[i, 'times'] = tmp_times[0]
Traceback (most recent call last):
  File "/project/kleinman/seol.song/from_hydra/SEVtras/scripts/hgg/sevtras_calc.py", line 25, in <module>
    SEVtras.ESAI_calculator(
  File "/home/kleinman/seol.song/projects/kleinman/seol.song/from_hydra/SEVtras/venv_3.11.5/lib/python3.11/site-packages/SEVtras/main.py", line 188, in ESAI_calculator
    celltype_e_number, adata_evS, adata_com = deconvolver(adata_ev, adata_cell, species, OBSsample, OBScelltype, OBSev, OBSMpca, cellN, Xraw, normalW)
                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kleinman/seol.song/projects/kleinman/seol.song/from_hydra/SEVtras/venv_3.11.5/lib/python3.11/site-packages/SEVtras/functional.py", line 126, in deconvolver
    near_neighbor_dat.loc[i, 'times'] = tmp_times[0]
                                        ~~~~~~~~~^^^
  File "/home/kleinman/seol.song/projects/kleinman/seol.song/from_hydra/SEVtras/venv_3.11.5/lib/python3.11/site-packages/pandas/core/series.py", line 1118, in __getitem__
    return self._values[key]
           ~~~~~~~~~~~~^^^^^
IndexError: index 0 is out of bounds for axis 0 with size 0

I looked into the deconvolver function, and I believe the error is caused by empty tmp_times Series. I think some of the rows (cells) in near_neighbor_dat are not tracing back to a celltype value, which might be causing the error.

Also, I noticed that the X_pca and X_umap values of adata_cell is not being utilized at least up until the deconvolver function, since adata_combined is being reprocessed with scanpy downstream analysis. However, I would like to use the X_pca and X_umap values that I imported from my analysis through Seurat.

Here is my code used for converting my SeuratObject to h5ad. I used the code provided by the SEVtras tutorial (there was a code for exporting the PCA and UMAP values from Seurat, there was no code to input the values into the scanpy h5ad object, so I wrote my own for importing PCA and UMAP values).

# Load 
import scanpy as sc
import anndata
from scipy import io
from scipy.sparse import coo_matrix, csr_matrix
import os
import pandas as pd

# Provided by SEVtras
save_path = '/home/kleinman/seol.song/projects/kleinman/seol.song/from_hydra/SEVtras/data/'
X = io.mmread(os.path.join(save_path, "hgg_processed_matrix.mtx"))
adata = anndata.AnnData(X=X.transpose().tocsr())
metadata = pd.read_csv(os.path.join(save_path, "hgg_processed_metadata.csv"))

with open(os.path.join(save_path, "hgg_processed_gene_names.csv"),'r') as f:
    gene_names = f.read().splitlines()

adata.obs = metadata
adata.obs.index = adata.obs["barcode"]
adata.var.index = gene_names

# Load PCA csv
pca_df = pd.read_csv(os.path.join(save_path, "hgg_processed_pca.csv"), index_col=0)

# Add PCA and UMAP values to the AnnData object
adata.obsm['X_pca'] = pca_df.values
adata.obsm['X_umap'] = metadata[['UMAP_1', 'UMAP_2']].values

# Remove the 'barcode' column from adata.obs
adata.obs = adata.obs.drop(columns=['barcode'])

In conclusion, these are my final questions:

  1. What exactly is causing the IndexError?
  2. Can I use PCA and UMAP values from my Seurat analysis for running the code (+ drawing the UMAP)? This is crucial to me since I would like to reduce batch effects, which would require integrative analysis and altering of PCA and UMAP.

Thanks for your help in advance!

RuiqiaoHe commented 2 weeks ago

Could you please check the shape of adata_ev and adata_cell that you input for SEVtras.ESAI_calculator by following code? print(adata_ev) print(adata_cell) There may be a mismatch problem between the two adata objects.

songnsnow commented 2 weeks ago

Here is the output from print(adata_ev) and print(adata_cell).

>>> print(adata_ev)
AnnData object with n_obs × n_vars = 4383 × 24040
    obs: 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'pct_counts_ribo', 'n_genes', 'score', 'batch', 'sEV', 'score1'
>>> print(adata_cell)
AnnData object with n_obs × n_vars = 36076 × 31706
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'raw_batch_number', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'pct_counts_ribo', 'n_genes', 'score', 'batch', 'sEV', 'score1', 'barcode_prefix', 'batch_number', 'cell.barcode', 'sample.id', 'sample.id.new', 'Paper_ID', 'Sample', 'percent.mito', 'percent.ribo', 'G1.S.score_Whitfield', 'G2.M.score_Whitfield', 'S.Score', 'G2M.Score', 'Phase', 'Paper_group', 'Paper_subgroup', 'Location', 'Age_group', 'Sex', 'Technology', 'Consensus_Jessa', 'Consensus_Pombo_Raw', 'CAWPE_Score_Pombo', 'Class', 'Celltype', 'celltype', 'Level2_Celltype', 'clst_itg', 'seurat_clusters', 'UMAP_1', 'UMAP_2'
    obsm: 'X_pca', 'X_umap'
RuiqiaoHe commented 2 weeks ago

At the moment I can't figure out what the bug is. Would it be convenient to share testing adata objects for me? I'll test it in my environment. The address is ruiqiaohe#gmail.com (change # to @).