Open TATABOX99 opened 4 weeks ago
The parameter sample_file
should be a file containing all samples, formatted as follows:
sample1
sample2
......
sampleN
In the case of your dataset, it would be: Sc1_LN Sc2_LN Sc3_LN ...... ScN_LN
For the "max() arg" error, please first check if your dataset was generated by scRNA-seq? SEVtras doesn't support single nucleus RNA-seq data. Then, you can try to lower the parameter alpha
.
Thank you very much for your prompt response. The correct import command allowed the program to run successfully. Additionally, I saw in the guide that you suggest using multiple sample data to identify SEVs. Can I merge all my samples into one Seurat object, convert it to h5ad format, and then analyze it? If so, would clustering the cells in my object before using SEVtras affect the determination of the SEV sources?
By the way, I found that analyzing a single sample theoretically takes an entire day. Is this normal for this software? My RAM is 256G, with CPU core is 20.
Please refer to Question 9 in Troubleshooting. You can obtain the cell type information based on your own processing procedure, such as filtering or regressing. SEVtras only needs the cell type information, instead of the processed expression matrix.
In my environment (naive Linux), SEVtras takes only a few minutes to finish a 10X single cell sample. You can try to speed it up by increasing the number of CPUs if the memory load is sufficient (predefine_threads
).
Here I have an additional question. When perform data quality control, I use this code to remove poor-quality cells: test.seu <- subset(test.seu, subset = nFeature_RNA > 200 & nFeature_RNA < 6000 & percent.mt < 5). Will this step affect the identification of SEV (Single Extracellular Vesicles)?
The identification of sEVs is a separate step without the information of the cell matrix (SEVtras.sEV_recognizer). Cell matrix pre-processing only affects the calculation of ESAI (SEVtras.ESAI_calculator).
I have encountered some new difficulties. After completing the calculations in part 1, I obtained 15 "itera_gene.txt" and "raw_file.h5ad" files. Before running part 2, I noticed that you mentioned, "The first two parameters represent the path to sEV- and cell- anndata objects" and "With the output of SEVtras.sEV_recognizer in Part I sEVs recognizing and cell matrix with cell type, SEVtras can track each sEV to the original cell type and calculate the sEV secretion activity index (ESAI)."
However, the data I included in the first step is single-cell sequencing data directly from patients' original samples, without any clustering, meaning I do not have a cell matrix with cell type, and there is no "celltype" metadata. I would like to ask if this means I should first merge the raw_file.h5ad obtained from the first step into a Seurat object, process and cluster it, and then convert it back to an h5ad object to get the "test_cell.h5ad" file mentioned in "SEVtras.ESAI_calculator(adata_ev_path='./tests/sEV_SEVtras.h5ad', adata_cell_path='./tests/test_cell.h5ad', out_path='./outputs', Xraw=False, OBSsample='batch', OBScelltype='celltype')". But where does the "sEV_SEVtras.h5ad" file come from?
"sEV_SEVtras.h5ad" generated by SEVtras.sEV_recognizer, located in outputs directory.
Dear esteemed author,
In your guide, regarding the second part, you mentioned, "With the output of SEVtras.sEV_recognizer in Part I sEVs recognizing and cell matrix with cell type, SEVtras can track each sEV to original cell type and calculate sEV secretion activity index (ESAI)." Could you clarify whether the "cell matrix with cell type" referred to here should be constructed using the filtered feature bc matrix from single-cell data output or if it would be more appropriate to use the raw feature bc matrix to construct an object containing cell clustering information?
Thank you for your guidance.
Please refer to question 9 in Troubleshooting. A raw feature bc matrix with cell clustering information is preferred, regardless of how you generate the cell clustering information.
I am encountering an issue while running the SEVtras.ESAI_calculator function. I used a processed Seurat object which I converted to H5AD format as the tutorial in Troubleshooting. However, when I run the SEVtras.ESAI_calculator function, I encounter the following error:
/home/yeziyang/miniconda3/envs/SEVtras_env/lib/python3.7/site-packages/anndata/_core/anndata.py:1785: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass AnnData(X, dtype=X.dtype, ...)
to get the future behaviour.
[AnnData(sparse.csr_matrix(a.shape), obs=a.obs) for a in all_adatas],
/home/yeziyang/miniconda3/envs/SEVtras_env/lib/python3.7/site-packages/scanpy/preprocessing/_pca.py:229: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass AnnData(X, dtype=X.dtype, ...)
to get the future behaviour.
adata.obsm['X_pca'] = X_pca
No matched cell sample for sEV deconvolution
... (repeated multiple times) ...
Traceback (most recent call last):
File "ESAI_calculator.py", line 2, in
Since I am a beginner in Python, I cannot understand what went wrong with the converted data. Additionally, it doesn't seem to be a problem with the samples, as I have confirmed that both adata_ev and adata_cell use the same samples. Below is the code I used in Python to convert a Seurat object to H5AD format. I greatly need your help, thank you!
import scanpy as sc import anndata from scipy import io from scipy.sparse import csr_matrix import pandas as pd
directory = '/home/yeziyang/Sc/'
X = io.mmread(directory + 'matrix.mtx') adata = anndata.AnnData(X=X.transpose().tocsr())
metadata = pd.read_csv(directory + 'metadata.csv')
with open(directory + 'gene_names.csv', 'r') as f: gene_names = f.read().splitlines()
adata.obs = metadata adata.obs.index = adata.obs["barcode"] adata.var.index = gene_names
adata.obs['celltype'] = pd.Categorical(adata.obs['celltype'].astype(str))
adata.write_h5ad(directory + 'PTCALL_RAW.h5ad')
adata_ev 'obs' columns: Index(['n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'pct_counts_ribo', 'n_genes', 'score', 'batch', 'sEV', 'score1'], dtype='object')
adata_cell 'obs' columns: Index(['orig.ident', 'nCount_RNA', 'nFeature_RNA', 'barcode', 'percent.mt', 'patient', 'celltype', 'tissue_type', 'RNA_snn_res.1', 'seurat_clusters'], dtype='object') It looks like my adata_ev does not have a celltype obs. Is this normal? Did I overlook any important steps that caused the inconsistency between the two datasets?
Sorry for my fault. Please use following code:
import scanpy as sc
adata_cell = sc.read_h5ad(directory + 'PTCALL_RAW.h5ad')
adata_cell.obs['batch'] = adata_cell.obs['orig.ident']
Since SEVtras gets sample information based on the OBSsample
parameter, the default is batch.
Thank you very much for your suggestion. That part of the program is now running normally. However, an error occurred during the downstream analysis when converting to a Seurat object: Convert("SEVtras_combined.h5ad", dest = "h5seurat", overwrite = T) Warning: Unknown file type: h5ad Warning: 'assay' not set, setting to 'RNA' Creating h5Seurat file for version 3.1.5.9900 Adding X as data Adding raw/X as counts Adding meta.features from raw/var Adding X_pca as cell embeddings for pca Adding X_umap as cell embeddings for umap Adding PCs as feature loadings for pca Warning: Cannot find features for feature loadings, will not be able to load Adding miscellaneous information for pca Adding standard deviations for pca Adding miscellaneous information for umap Adding hvg to miscellaneous data Adding log1p to miscellaneous data
scdata <- LoadH5Seurat("SEVtras_combined.h5seurat", meta.data = T, misc = F) Validating h5Seurat file Initializing RNA with data Error in fixupDN.if.valid(value, x@Dim) : length of Dimnames[[1]] (21203) is not equal to Dim[1] (4542)
This seems to indicate an error in generating the data in h5ad format. I wonder if you have encountered this error before. I would greatly appreciate any further advice you could provide.
Please refer to question 13 at Troubleshooting html. The code would reliably convert Seurat object to h5ad format.
Dear Author,
Upon investigation, we found that the error originates from the version issue of the Seurat-disk we are using. It seems feasible to extract information from this h5ad format dataset directly in Python and then construct the Seurat object directly in R. However, we found that in the newly constructed Seurat object, the number of features and cells changed. Below is the information of the Seurat object we initially used:
testAB.integrated An object of class Seurat 31317 features across 161079 samples within 1 assay Active assay: RNA (31317 features, 5000 variable features) 3 layers present: counts, data, scale.data 4 dimensional reductions calculated: pca, harmony, umap, tsne
After processing with SEVtras, the number of features decreased to 4542, while the number of cells increased to 192177. We have two questions: first, is this a normal phenomenon that could occur during the computation process? Second, can we extract the EVs determination information in the metadata of the newly generated Seurat object and transfer it to the initial Seurat object, 'testAB.integrated', so that we can retain the EVs information while also preserving the feature information?
Since I don't know the full picture of your data, I can only answer with what I assume to be the case.
adata_cell
of SEVtras.ESAI_calculator. So it won't increase. If you indicate the cell number in the output of SEVtras.sEV_recognizer (raw_SEVtras.h5ad), this is not the real cell in your analysis. I only control the total UMI in the file. As for the number of features, it can rise from different feature selection parameters between Seurat and Scanpy.
Dear Author,
I am using this script for analysis: import SEVtras SEVtras.sEV_recognizer(input_path='/home/yeziyang/Sc', sample_file='/home/yeziyang/Sc/Sc1_LN', out_path='/home/yeziyang/Sc/outputs', species='Homo')
My 10x_mtx formatted data is stored in this directory: /home/yeziyang/Sc/Sc1_LN/outs/raw_feature_bc_matrix/matrix.mtx.gz. I have ensured it is extracted from the raw_feature_bc_matrix. However, I encountered the following error: File "run_SEVtras.py", line 2, in
SEVtras.sEV_recognizer(input_path='/home/yeziyang/Sc', sample_file='/home/yeziyang/Sc/Sc1_LN', out_path='/home/yeziyang/Sc/outputs', species='Homo')
File "/home/yeziyang/miniconda3/envs/SEVtras_env/lib/python3.7/site-packages/SEVtras/main.py", line 79, in sEV_recognizer
sample_log = get_sample(sample_file)
File "/home/yeziyang/miniconda3/envs/SEVtras_env/lib/python3.7/site-packages/SEVtras/utils.py", line 18, in get_sample
with open(sample_log, 'r') as f:
IsADirectoryError: [Errno 21] Is a directory: '/home/yeziyang/Sc/Sc1_LN'
I am not sure why this error occurs. Could you please give me some advice? Thank you very much!
Additionally, when I used echo "Sc1_LN" > sample_file and ran the above code again, setting alpha to 0.08, it showed max() arg is an empty sequence. I am not sure if this is due to an error in my import process.