digitalcytometry / ecotyper

EcoTyper is a machine learning framework for large-scale identification of cell states and cellular ecosystems from gene expression data.

tutorial 3 errors unless "Recovery cell type fractions" specified #77

Closed: istvankleijn closed this issue 3 months ago

istvankleijn commented 11 months ago

When trying to get tutorial 3 to work, I ran into errors when I left Recovery cell type fractions set to "NULL". It appeared to have trouble finding files, similar to https://github.com/digitalcytometry/ecotyper/issues/49. Below are the YAML file and the output with the errors.

The script ran successfully when I set Recovery cell type fractions : "example_data/visium_fractions_example.txt". However, I would like to explore the carcinoma ecotypes and cell types in my own dataset, so I am wondering whether there is another way to generate this file manually.

YAML:

default :
  Input :
    Discovery dataset name : "Carcinoma"
    Recovery dataset name : "VisiumBreast"
    Input Visium directory : "example_data/VisiumBreast"
    Recovery cell type fractions : "NULL"
    Background cell type : "Epithelial.cells"
    CIBERSORTx username : [redacted]
    CIBERSORTx token : [redacted]

  Output :
    Output folder : "/home/ikleijn/bench/EcoTyper/tutorials/3"

  Pipeline settings :
    Number of threads : 4
    CIBERSORTx fractions Singularity path : "/home/ikleijn/bench/EcoTyper/cibersortx_fractions.sif"

Output:

Loading visium data...

Running CIBERSORTxFractions on the visium dataset...
Warning: Running CIBERSORTx on the subset of ST spots that contain expression for at least one gene in the signature matrix. This is necessary to prevent matrix singularity issues observed when running CIBERSORTx on very sparse data, as is the case with ST arrays!
Running CIBERSORTx fractions with B-mode batch correction, using the 'LM22' signature matrix and 'VisiumBreast' dataset.
Running on singularity...
>Running CIBERSORTxFractions...
>[Options] username: [redacted]
>[Options] token: [redacted]
>[Options] mixture: /src/data/mixture.txt
>[Options] sigmatrix: /src/data/sigmatrix.txt
>[Options] sourceGEPs: /src/data/sourceGEPs.txt
>[Options] rmbatchBmode: TRUE
>[Options] verbose: TRUE
>=============CIBERSORTx Settings===============
>Mixture file: /src/data/mixture.txt
>Signature matrix file: /src/data/sigmatrix.txt
>Enable verbose output
>Do B-mode batch correction
>==================CIBERSORTx===================
WARNING: ignoring environment value of R_HOME
Error in library(e1071) : there is no package called ‘e1071’
Execution halted
>Batch correction:.
WARNING: ignoring environment value of R_HOME
Error in library(e1071) : there is no package called ‘e1071’
Execution halted
>Run CIBERSORTx on B-mode batch corrected mixtures.
>=============CIBERSORTx Settings===============
>Mixture file: /src/outdir//CIBERSORTx_Mixtures_Adjusted.txt
>Signature matrix file: /src/data/sigmatrix.txt
>Enable verbose output
>==================CIBERSORTx===================
ERROR: Could not read /src/outdir//CIBERSORTx_Mixtures_Adjusted.txt
Segmentation fault
Error:
Execution halted
Error:
Execution halted
Error in RunJobQueue() :
  EcoTyper failed. Please check the error message above!
Execution halted
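In a cascade like this, the earliest error is usually the informative one. Reading the log above, the R package e1071 appears to be missing inside the container, so the B-mode batch-correction step seems never to have written CIBERSORTx_Mixtures_Adjusted.txt; the later "Could not read" error and the segmentation fault look like consequences of that. Below is a minimal Python sketch for pulling the first error line out of such a log. The regular expression and the function name first_error are illustrative assumptions tuned to this one log, not part of EcoTyper or CIBERSORTx.

```python
import re
from typing import Optional

# Hypothetical pattern, tuned to the error lines in this one log:
# R errors ("Error in ...", "Error:"), "ERROR:" lines, and segfaults.
ERROR_PATTERN = re.compile(r"^(Error\b|ERROR:|Segmentation fault)")


def first_error(log_text: str) -> Optional[str]:
    """Return the first line that looks like an error, or None if there is none."""
    for line in log_text.splitlines():
        # Lines relayed from the container start with ">" in this log; drop it.
        stripped = line.lstrip(">").strip()
        if ERROR_PATTERN.match(stripped):
            return stripped
    return None


if __name__ == "__main__":
    # Excerpt copied from the log above.
    log_excerpt = """\
WARNING: ignoring environment value of R_HOME
Error in library(e1071) : there is no package called ‘e1071’
Execution halted
>Batch correction:.
ERROR: Could not read /src/outdir//CIBERSORTx_Mixtures_Adjusted.txt
Segmentation fault
"""
    print(first_error(log_excerpt))
```

Scanning from the top rather than reading the last lines matters here: the final "EcoTyper failed" message only reports that something upstream broke.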
WangKang-Leo commented 11 months ago

I would really appreciate it if you could send me (kang.wang@ki.se) the two .sif files; I am going to run Tutorial 4 on our cluster.

BALuca commented 11 months ago

Hi Istvan,

Unfortunately, it is very hard to debug issues with the CIBERSORTx container, since it is separate software. You could try generating the fractions on the CIBERSORTx website by applying the LM22 and TR4 signatures separately and then merging them as described here. You can then provide the fractions to EcoTyper, as you did with the example data.

Best, The EcoTyper team
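The exact merge procedure EcoTyper expects is in the documentation linked from the reply above and is not reproduced here. As a rough sketch only, assuming LM22 fractions are relative within the immune compartment and that the TR4 output has an immune column (called CD45 below) alongside non-immune columns (EPCAM, CD31 and CD10 below), one plausible merge rescales each LM22 row by that spot's TR4 immune fraction. The column names, the ID header, and the helpers merge_fractions and to_tsv are hypothetical; mapping these labels onto EcoTyper's carcinoma cell-type names is not handled and should be checked against the EcoTyper documentation.

```python
import csv
import io
from typing import Dict, Iterable

# spot/sample ID -> cell type -> fraction
Fractions = Dict[str, Dict[str, float]]


def merge_fractions(
    lm22: Fractions,
    tr4: Fractions,
    immune_col: str = "CD45",  # hypothetical TR4 immune column name
    keep_cols: Iterable[str] = ("EPCAM", "CD31", "CD10"),  # hypothetical
) -> Fractions:
    """Rescale LM22 (immune-only) fractions by the TR4 immune fraction.

    Non-immune TR4 columns are copied through unchanged; the TR4 immune
    column itself is replaced by the rescaled LM22 subtypes.
    """
    keep = tuple(keep_cols)
    merged: Fractions = {}
    for sample, tr4_row in tr4.items():
        if sample not in lm22:
            raise KeyError(f"{sample!r} is missing from the LM22 fractions")
        lm22_row = lm22[sample]
        immune = tr4_row[immune_col]
        lm22_total = sum(lm22_row.values())
        row = {col: tr4_row[col] for col in keep}
        for cell_type, frac in lm22_row.items():
            # Normalize within the immune compartment, then scale by its size.
            share = frac / lm22_total if lm22_total > 0 else 0.0
            row[cell_type] = immune * share
        merged[sample] = row
    return merged


def to_tsv(fractions: Fractions) -> str:
    """Tab-separated table: one row per spot, one column per cell type."""
    cell_types = sorted({ct for row in fractions.values() for ct in row})
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["ID", *cell_types])  # "ID" header is an assumption
    for sample, row in fractions.items():
        writer.writerow([sample, *(row.get(ct, 0.0) for ct in cell_types)])
    return buf.getvalue()


if __name__ == "__main__":
    # Toy values for illustration only, not real CIBERSORTx output.
    tr4 = {"spot1": {"EPCAM": 0.5, "CD31": 0.1, "CD10": 0.2, "CD45": 0.2}}
    lm22 = {"spot1": {"T cells CD8": 0.5, "Macrophages M2": 0.5}}
    print(to_tsv(merge_fractions(lm22, tr4)))
```

Rescaling by the TR4 immune fraction keeps the immune subtypes from summing to 1 on their own, so they stay on the same scale as the epithelial, endothelial and fibroblast fractions from TR4.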