bioinfo-biols / SEVtras

sEV-containing droplet identification in scRNA-seq data (SEVtras)
GNU Affero General Public License v3.0

Error in SEVtras.sEV_recognizer #17

Open Yujj1123 opened 7 months ago

Yujj1123 commented 7 months ago

Thank you for developing the algorithm! I primarily work with R and am not very proficient in Python, so I have run into some challenges using your software to process my data. I'm hoping for some guidance on an issue I encountered right at the start. When I ran

import SEVtras
SEVtras.sEV_recognizer(sample_file='/opt/conda/Zhoubo/SRR13005718/SRR13005718/outs/raw_feature_bc_matrix/matrix.mtx.gz', out_path='/opt/conda/Zhoubo/SRR13005718', species='Mus')

Unfortunately, I met the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/miniconda3/envs/SEVtras/lib/python3.7/site-packages/SEVtras/main.py", line 79, in sEV_recognizer
    sample_log = get_sample(sample_file)
  File "/opt/conda/miniconda3/envs/SEVtras/lib/python3.7/site-packages/SEVtras/utils.py", line 19, in get_sample
    for line in f.readlines():
  File "/opt/conda/miniconda3/envs/SEVtras/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

I'm not sure whether this issue stems from how I set the file paths or from some other cause. I would greatly appreciate any insights or suggestions you might have on resolving this error. Thank you very much for your time and assistance.

RuiqiaoHe commented 7 months ago

The wrong sample_file was passed to SEVtras. See here for how to use the sample_file parameter. If you only have one sample, please refer to question 6 in the Troubleshooting section.
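The traceback fits this diagnosis. According to the traceback, get_sample (utils.py, line 19) reads sample_file line by line as UTF-8 text. matrix.mtx.gz, however, is gzip-compressed, and every gzip stream starts with the magic bytes 0x1f 0x8b. Byte 0x8b cannot start a UTF-8 sequence, which matches the "byte 0x8b in position 1" in the error. The minimal sketch below uses only the Python standard library (it does not call SEVtras) to reproduce the failure and to show a quick check. The helper names looks_gzipped and read_as_text are hypothetical.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def looks_gzipped(path):
    """Hypothetical helper: True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def read_as_text(path):
    """Read a file line by line as UTF-8 text, like the get_sample call in the traceback."""
    with open(path, encoding="utf-8") as fh:
        return fh.readlines()


# Build a tiny gzip file standing in for matrix.mtx.gz.
tmpdir = tempfile.mkdtemp()
mtx_gz = os.path.join(tmpdir, "matrix.mtx.gz")
with gzip.open(mtx_gz, "wt") as fh:
    fh.write("%%MatrixMarket matrix coordinate integer general\n")

print(looks_gzipped(mtx_gz))  # True: this is not a plain-text list of samples
try:
    read_as_text(mtx_gz)
except UnicodeDecodeError as err:
    # Same failure as in the issue: the byte at position 1 is 0x8b.
    print(err.reason, err.start, hex(err.object[err.start]))
```

In short, sample_file must be a small plain-text file, not the matrix itself.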

Yujj1123 commented 7 months ago

Thank you for your quick response! After modifying the content of the sample_file, I ran the command in the background using nohup:

cat > sample_file
/opt/conda/Zhoubo/SRR13005718/SRR13005718/
/opt/conda/Zhoubo/SRR13005719/SRR13005719/
^C
vim run_sevtras.py
import SEVtras
SEVtras.sEV_recognizer(sample_file='./sample_file', out_path='./SEV.zhoubo.outputs', species='Mus')
:wq
nohup python run_sevtras.py > SEVtras.zhoubo.txt 2>&1 &
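Readers who are more at home in R can also create the same sample_file from Python and sanity-check it before starting a long nohup job. This is a minimal sketch under assumptions. It relies only on what this thread shows: sample_file is a plain-text file with one sample directory per line. The helpers write_sample_file and check_sample_file are hypothetical and not part of SEVtras. Check the SEVtras documentation for the exact directory layout it expects inside each sample directory.

```python
import os


def write_sample_file(sample_dirs, path="sample_file"):
    """Hypothetical helper: write one sample directory per line as plain UTF-8 text."""
    with open(path, "w", encoding="utf-8") as fh:
        for d in sample_dirs:
            fh.write(d.rstrip("\n") + "\n")
    return path


def check_sample_file(path):
    """Hypothetical helper: return the listed directories that do not exist.

    Raises UnicodeDecodeError if the file is not plain text (e.g. a .gz matrix).
    """
    with open(path, encoding="utf-8") as fh:
        entries = [line.strip() for line in fh if line.strip()]
    return [d for d in entries if not os.path.isdir(d)]


if __name__ == "__main__":
    # Paths taken from this thread; adjust to your own samples.
    path = write_sample_file([
        "/opt/conda/Zhoubo/SRR13005718/SRR13005718/",
        "/opt/conda/Zhoubo/SRR13005719/SRR13005719/",
    ])
    missing = check_sample_file(path)
    print("missing directories:", missing or "none")
```

If either check fails, the job fails within seconds and the problem shows up right away, rather than appearing hours later in the nohup log.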

However, after nearly 14 hours, the only output in my log is:

/opt/conda/miniconda3/envs/SEVtras/lib/python3.7/site-packages/scanpy/preprocessing/_simple.py:140: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
  adata.obs['n_genes'] = number

I used htop to check on the process, and the Python script still appears to be running in the background. After reviewing issues #10 and #13, it seems that this runtime is abnormally long. I am running this on Linux with 48 vCPUs and 96 GiB of RAM, and my samples are from 10x Genomics. I would appreciate any assistance you can provide. Thank you very much!

RuiqiaoHe commented 7 months ago

This is unusual for SEVtras. Is your "Linux environment" a virtual machine running under Windows? If not, could you please send me the raw_feature_bc_matrix files? I will test them in my environment.

Yujj1123 commented 7 months ago

I am operating on a Linux cloud server. How should I send you the files? Via email or Baidu Netdisk?

SayaGoodBye commented 7 months ago

@RuiqiaoHe Hello, thank you very much for your algorithm! I think I'm having a problem similar to Yujj1123's. In a test using 24 threads on 3 samples with the parameters "search_UMI=500, alpha=0.1", I obtained the file sEV_SEVtras.h5ad, but it did not contain any sEVs.

I suspected a problem similar to the one reported with 'ValueError: max() arg is an empty sequence', so I changed the parameters to "search_UMI=250, alpha=0.05" and ran 4 samples, but got the same result. I checked the log files from both runs, and the only thing in them is this warning:

'/home/XXX/micromamba/envs/main/lib/python3.10/site-packages/SEVtras/utils.py:135: FutureWarning: Calling int on a single element Series is deprecated and will raise a TypeError in the future. Use int(ser.iloc[0]) instead
  N = int(env.Inter_adata[i, :].obs['n_genes'])'

How should I adjust the samples or the parameters to get correct results? Looking forward to your reply.
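The logged line is a pandas FutureWarning, not an error. Starting with pandas 2.1, calling int() on a one-element Series is deprecated, and utils.py line 135 does exactly that with `.obs['n_genes']`. The run still finished and wrote sEV_SEVtras.h5ad, so the warning is unlikely to explain the missing sEVs. The sketch below uses a plain DataFrame as a stand-in for AnnData's `.obs`. That is an assumption for illustration: the sketch does not touch SEVtras, and the barcode and gene count are made up. It shows the form the warning recommends.

```python
import pandas as pd

# Stand-in for env.Inter_adata[i, :].obs: a single droplet, so every column
# is a one-element Series (hypothetical barcode and value).
obs = pd.DataFrame({"n_genes": [137]}, index=["AAACCTGAGAAACCAT-1"])

n_genes = obs["n_genes"]          # a Series of length 1, not a scalar
N = int(n_genes.iloc[0])          # explicit positional access: no FutureWarning
# int(n_genes) also returns 137 today, but pandas >= 2.1 warns that it will
# raise TypeError in a future version.
print(type(n_genes).__name__, N)  # Series 137
```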

RuiqiaoHe commented 7 months ago

SEVtras.sEV_recognizer finished, but no sEVs were found. There are two things I need you to check:

  1. The input droplet-gene matrix for SEVtras should be the raw data, i.e., the matrix should come from the raw_feature_bc_matrix directory in the Cell Ranger outs folder.
  2. Were your data generated by scRNA-seq? SEVtras does not support single-nucleus RNA-seq (snRNA-seq) data.
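Comparing barcode counts is one quick way to confirm point 1. A Cell Ranger (v3 and later) matrix directory holds barcodes.tsv.gz, features.tsv.gz and matrix.mtx.gz. The raw_feature_bc_matrix also lists droplets that were not called as cells, so it is typically much larger than filtered_feature_bc_matrix. The helper below is a hypothetical sketch that uses only the standard library and is not part of SEVtras. The outs path in the usage block is a placeholder.

```python
import gzip
import os


def count_barcodes(matrix_dir):
    """Hypothetical helper: number of barcodes listed in a Cell Ranger matrix directory."""
    required = ("barcodes.tsv.gz", "features.tsv.gz", "matrix.mtx.gz")
    missing = [f for f in required if not os.path.exists(os.path.join(matrix_dir, f))]
    if missing:
        raise FileNotFoundError(f"{matrix_dir} lacks {', '.join(missing)}")
    with gzip.open(os.path.join(matrix_dir, "barcodes.tsv.gz"), "rt") as fh:
        return sum(1 for line in fh if line.strip())


if __name__ == "__main__":
    outs = "/path/to/sample/outs"  # placeholder Cell Ranger output folder
    for name in ("raw_feature_bc_matrix", "filtered_feature_bc_matrix"):
        d = os.path.join(outs, name)
        if os.path.isdir(d):
            print(name, count_barcodes(d))
```

If the matrix you passed to SEVtras has roughly as many barcodes as the filtered matrix, it is probably not the raw one.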

SayaGoodBye commented 7 months ago

Thank you for your answer! It solved a problem I had been stuck on for several days. I did in fact use snRNA-seq data to test SEVtras. I am new to bioinformatics and do not yet understand the principles behind SEVtras and snRNA-seq well, so next I want to understand why SEVtras does not support snRNA-seq data. Both kinds of data are generated with 10x technology, yet they differ in this respect. Perhaps the principle of snRNA-seq differs from that of scRNA-seq? As the developer of SEVtras, you know far more about this than I do, and advice from an expert would help a novice like me understand it better. Could you recommend some relevant articles, or tell me which parts of the SEVtras code are affected by the differences between scRNA-seq and snRNA-seq? I would appreciate it!

RuiqiaoHe commented 6 months ago

This is due to the experimental procedure of snRNA-seq: small extracellular vesicles (sEVs) are almost entirely filtered out during nuclei isolation and extraction, so our software is not suitable for analyzing this type of data (snRNA-seq).