digitalcytometry / cytotrace2

CytoTRACE 2 is an interpretable AI method for predicting cellular potency and absolute developmental potential from scRNA-seq data.

Error when running Cytotrace2 in Python #4

Closed ttszen closed 3 months ago

ttszen commented 3 months ago

Hello! Thank you for the earlier help with installing Cytotrace2 in Python. I've managed to get the installation done and run the software in a Python script as follows. I've provided the input file as recommended in the documentation.

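# Import the entry point explicitly rather than with a wildcard.
from cytotrace2_py.cytotrace2_py import cytotrace2

input_path = "/analysis_temp_files/cytotrace_input_240404.txt"
species = "human"

# Run CytoTRACE 2 on the input file.
results = cytotrace2(input_path, species=species)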

This runs fine and the intermediate files (binned_df_0.txt, ranked_df_0.txt, top_var_genes_0.txt) are produced in the output directory. However, after this step, there is an error as follows.

Error in `CreateAssayObject()`:
! Either 'counts' or 'data' must be missing; both cannot be provided
Traceback (most recent call last):
  File "/software/team215/users/st25/miniconda3/envs/cytotrace2-py/lib/python3.9/concurrent/futures/process.py", line 243, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/software/team215/users/st25/miniconda3/envs/cytotrace2-py/lib/python3.9/site-packages/cytotrace2_py/cytotrace2_py.py", line 52, in process_subset
    out = subprocess.run(['Rscript', run_script, '--output-dir', output_dir, '--suffix', suffix, '--max-pcs', str(max_pcs), '--seed', str(seed)], check=True)
  File "/software/team215/users/st25/miniconda3/envs/cytotrace2-py/lib/python3.9/subprocess.py", line 524, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['Rscript', '/software/team215/users/st25/miniconda3/envs/cytotrace2-py/lib/python3.9/site-packages/cytotrace2_py/resources/smoothDatakNN.R', '--output-dir', 'cytotrace2_results', '--suffix', '_0', '--max-pcs', '200', '--seed', '14']' returned non-zero exit status 1.

I'd much appreciate your thoughts on what is going on here and on potential ways to solve this error.
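For tracebacks like the one above, note that `CalledProcessError` itself only reports the exit status; the R error message is printed separately on stderr. A minimal sketch of how a caller could attach the child's stderr to the raised exception is shown here. The helper `run_and_capture` is hypothetical and not part of `cytotrace2_py`, and a small Python child process stands in for the failing `Rscript` call so the example runs without R installed.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd; on failure, raise RuntimeError that includes the child's stderr.

    Hypothetical helper for illustration, not part of cytotrace2_py.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(
            f"{cmd[0]} exited with status {proc.returncode}:\n{proc.stderr.strip()}"
        )
    return proc.stdout


# Stand-in for the failing Rscript call: a child process that writes an
# error message to stderr and exits with status 1.
failing_cmd = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('Error in CreateAssayObject()\\n'); sys.exit(1)",
]

try:
    run_and_capture(failing_cmd)
    message = ""
except RuntimeError as err:
    # The exception text now carries both the exit status and the stderr output.
    message = str(err)
    print(message)
```

With this pattern, the underlying R message travels with the Python exception instead of appearing only in the console output.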

savagyan00 commented 3 months ago

Hi and thank you for reaching out. We are sorry you are experiencing these issues, and we'll be glad to help!

The issue seems to be associated with a function of the Seurat package, which, given the version in the conda environment we provide, shouldn't cause any issues. Could you please run conda list seurat in the same environment you ran the Python script, and simply paste the output here?

ttszen commented 3 months ago

Thanks so much for getting back to me and for the help! Here is the output as requested. This appears to be consistent with the .yml file for the Python environment.

# packages in environment at /software/team215/users/st25/miniconda3/envs/cytotrace2-py:
#
# Name                    Version                   Build  Channel
r-seurat                  4.3.0.1           r42ha503ecb_0    conda-forge
r-seuratobject            5.0.1             r42ha503ecb_0    conda-forge

I had a look at the line of code that may be causing this error. Line 45 of the smoothDatakNN.R file in the Python version (https://github.com/digitalcytometry/cytotrace2/blob/main/cytotrace2_python/cytotrace2_py/resources/smoothDatakNN.R) doesn't appear to be the same as the equivalent line (line 61) of the postprocessing.R file in the R version (https://github.com/digitalcytometry/cytotrace2/blob/main/cytotrace2_r/R/postprocessing.R).

Do you think that may be causing some of these problems?

savagyan00 commented 3 months ago

Thank you for bringing this to our attention and providing such detailed insights!

The conda environment we provided for Python specifies Seurat 4.3.0, which should work with Python 3.9.0 and the given codebase without issues. It looks like your version of conda expanded that version to the fullest form available and installed Seurat 4.3.0.1 instead, leading to subtle differences in package behavior.
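To illustrate how a pin on 4.3.0 can still end up as 4.3.0.1, here is a small sketch. Conda documents that a single `=` in a version spec is a fuzzy match (it accepts versions that extend the given one), while `==` is exact. The functions below are a simplified imitation of that rule, not conda's own implementation, and whether the environment file used the single `=` form is an assumption not confirmed in this thread. The `conda list` text is the output pasted above.

```python
CONDA_LIST_OUTPUT = """\
# packages in environment at /software/team215/users/st25/miniconda3/envs/cytotrace2-py:
#
# Name                    Version                   Build  Channel
r-seurat                  4.3.0.1           r42ha503ecb_0    conda-forge
r-seuratobject            5.0.1             r42ha503ecb_0    conda-forge
"""


def parse_conda_list(text):
    """Map package name to version from `conda list` text output."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the commented header
        fields = line.split()
        versions[fields[0]] = fields[1]
    return versions


def fuzzy_match(spec, version):
    """Simplified 'pkg=X.Y.Z' rule: equal, or extended by further components."""
    return version == spec or version.startswith(spec + ".")


def exact_match(spec, version):
    """Simplified 'pkg==X.Y.Z' rule: the version string must be identical."""
    return version == spec


installed = parse_conda_list(CONDA_LIST_OUTPUT)
pinned = "4.3.0"  # the Seurat version the maintainers say they specified

print("installed r-seurat:", installed["r-seurat"])
print("fuzzy pin satisfied:", fuzzy_match(pinned, installed["r-seurat"]))
print("exact pin satisfied:", exact_match(pinned, installed["r-seurat"]))
```

Under these assumptions, the installed build satisfies a fuzzy pin but not an exact one, which is consistent with the explanation above.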

To address the concerns you’ve raised, we’ve taken steps to align the versions of Seurat (and associated dependencies) in both the R and Python environments, update the Python version to be compatible with the newer Seurat version, and, for clarity, also have the Python pipeline use the same functions of Seurat as the R postprocessing workflow scripts.

All the mentioned changes can be seen in the latest version of the repository. Let us know if this works smoothly and if we can help further!

ttszen commented 3 months ago

No worries and thank you so much for all the help! I've cloned the repo, re-installed the Python version of CytoTRACE2, and managed to get the results.

I really appreciate the help and all the support. Looking forward to exploring more of my data with CytoTRACE2!