aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

Error in tf_to_gene #439

Closed MatthewTCManion closed 1 month ago

MatthewTCManion commented 2 months ago

I am able to run the SCENIC+ Snakemake pipeline up until tf_to_gene, at which point it appears to run for every cell, but then hits an unspecified error right after starting "Adding correlation coefficients to adjacencies". It doesn't appear to have any issue with the adjacencies for region_to_gene, so I'm not sure why it would fail here.

I have gotten this same failure multiple times with different resource allocations.

Error log:

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 20
Rules claiming more threads will be scaled down.
Job stats:
job                count
---------------  -------
AUCell_direct          1
AUCell_extended        1
all                    1
eGRN_direct            1
eGRN_extended          1
region_to_gene         1
scplus_mudata          1
tf_to_gene             1
total                  8

Select jobs to execute...
Execute 1 jobs...

[Thu Jul 18 11:34:06 2024]
localrule region_to_gene:
    input: ACC_GEX.h5mu, search_space.tsv
    output: region_to_gene_adj.tsv
    jobid: 10
    reason: Missing output files: region_to_gene_adj.tsv
    threads: 20
    resources: tmpdir=/tmp

[Thu Jul 18 22:15:05 2024]
Finished job 10.
1 of 8 steps (12%) done
Select jobs to execute...
Execute 1 jobs...

[Thu Jul 18 22:15:05 2024]
localrule tf_to_gene:
    input: ACC_GEX.h5mu, tf_names.txt
    output: tf_to_gene_adj.tsv
    jobid: 5
    reason: Missing output files: tf_to_gene_adj.tsv
    threads: 20
    resources: tmpdir=/tmp

[Fri Jul 19 17:09:21 2024]
Error in rule tf_to_gene:
    jobid: 5
    input: ACC_GEX.h5mu, tf_names.txt
    output: tf_to_gene_adj.tsv
    shell:

        scenicplus grn_inference TF_to_gene             --multiome_mudata_fname ACC_GEX.h5mu             --tf_names tf_names.txt             --temp_dir /data/PetrosLab/Matt/scenicplus/consensus_peak_bulk_750bp/tmp/             --out_tf_to_gene_adjacencies tf_to_gene_adj.tsv             --method GBM             --n_cpu 20             --seed 666

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-07-18T113404.031923.snakemake.log
WorkflowError:
At least one job did not complete successfully.


Environment:

  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=2_gnu
  - anyio=4.4.0=pyhd8ed1ab_0
  - argon2-cffi=23.1.0=pyhd8ed1ab_0
  - argon2-cffi-bindings=21.2.0=py311h459d7ec_4
  - arrow=1.3.0=pyhd8ed1ab_0
  - asttokens=2.4.1=pyhd8ed1ab_0
  - async-lru=2.0.4=pyhd8ed1ab_0
  - attrs=23.2.0=pyh71513ae_0
  - babel=2.14.0=pyhd8ed1ab_0
  - beautifulsoup4=4.12.3=pyha770c72_0
  - bleach=6.1.0=pyhd8ed1ab_0
  - brotli-python=1.1.0=py311hb755f60_1
  - bzip2=1.0.8=h4bc722e_7
  - ca-certificates=2024.7.4=hbcca054_0
  - cached-property=1.5.2=hd8ed1ab_1
  - cached_property=1.5.2=pyha770c72_1
  - cffi=1.16.0=py311hb3a22ac_0
  - charset-normalizer=3.3.2=pyhd8ed1ab_0
  - comm=0.2.2=pyhd8ed1ab_0
  - debugpy=1.8.2=py311h4332511_0
  - decorator=5.1.1=pyhd8ed1ab_0
  - defusedxml=0.7.1=pyhd8ed1ab_0
  - entrypoints=0.4=pyhd8ed1ab_0
  - exceptiongroup=1.2.2=pyhd8ed1ab_0
  - executing=2.0.1=pyhd8ed1ab_0
  - fqdn=1.5.1=pyhd8ed1ab_0
  - h11=0.14.0=pyhd8ed1ab_0
  - h2=4.1.0=pyhd8ed1ab_0
  - hpack=4.0.0=pyh9f0ad1d_0
  - httpcore=1.0.5=pyhd8ed1ab_0
  - httpx=0.27.0=pyhd8ed1ab_0
  - hyperframe=6.0.1=pyhd8ed1ab_0
  - importlib_metadata=8.0.0=hd8ed1ab_0
  - importlib_resources=6.4.0=pyhd8ed1ab_0
  - isoduration=20.11.0=pyhd8ed1ab_0
  - jedi=0.19.1=pyhd8ed1ab_0
  - json5=0.9.25=pyhd8ed1ab_0
  - jsonpointer=3.0.0=py311h38be061_0
  - jsonschema-specifications=2023.12.1=pyhd8ed1ab_0
  - jsonschema-with-format-nongpl=4.23.0=hd8ed1ab_0
  - jupyter-lsp=2.2.5=pyhd8ed1ab_0
  - jupyter_client=8.6.2=pyhd8ed1ab_0
  - jupyter_core=5.7.2=py311h38be061_0
  - jupyter_events=0.10.0=pyhd8ed1ab_0
  - jupyter_server=2.14.2=pyhd8ed1ab_0
  - jupyter_server_terminals=0.5.3=pyhd8ed1ab_0
  - jupyterlab=4.2.3=pyhd8ed1ab_0
  - jupyterlab_pygments=0.3.0=pyhd8ed1ab_1
  - jupyterlab_server=2.27.3=pyhd8ed1ab_0
  - keyutils=1.6.1=h166bdaf_0
  - krb5=1.21.3=h659f571_0
  - ld_impl_linux-64=2.40=hf3520f5_7
  - libedit=3.1.20191231=he28a2e2_2
  - libexpat=2.6.2=h59595ed_0
  - libffi=3.4.2=h7f98852_5
  - libgcc-ng=14.1.0=h77fa898_0
  - libgomp=14.1.0=h77fa898_0
  - libnsl=2.0.1=hd590300_0
  - libsodium=1.0.18=h36c2ea0_1
  - libsqlite=3.46.0=hde9e2c9_0
  - libstdcxx-ng=14.1.0=hc0a3c3a_0
  - libuuid=2.38.1=h0b41bf4_0
  - libxcrypt=4.4.36=hd590300_1
  - libzlib=1.3.1=h4ab18f5_1
  - markupsafe=2.1.5=py311h459d7ec_0
  - mistune=3.0.2=pyhd8ed1ab_0
  - nbclient=0.10.0=pyhd8ed1ab_0
  - nbconvert-core=7.16.4=pyhd8ed1ab_1
  - ncurses=6.5=h59595ed_0
  - nest-asyncio=1.6.0=pyhd8ed1ab_0
  - notebook-shim=0.2.4=pyhd8ed1ab_0
  - openssl=3.3.1=h4bc722e_2
  - overrides=7.7.0=pyhd8ed1ab_0
  - pandocfilters=1.5.0=pyhd8ed1ab_0
  - pexpect=4.9.0=pyhd8ed1ab_0
  - pickleshare=0.7.5=py_1003
  - pip=24.0=pyhd8ed1ab_0
  - pkgutil-resolve-name=1.3.10=pyhd8ed1ab_1
  - prometheus_client=0.20.0=pyhd8ed1ab_0
  - ptyprocess=0.7.0=pyhd3deb0d_0
  - pure_eval=0.2.2=pyhd8ed1ab_0
  - pycparser=2.22=pyhd8ed1ab_0
  - pysocks=1.7.1=pyha2e5f31_6
  - python=3.11.9=hb806964_0_cpython
  - python-fastjsonschema=2.20.0=pyhd8ed1ab_0
  - python-json-logger=2.0.7=pyhd8ed1ab_0
  - python_abi=3.11=4_cp311
  - pytz=2024.1=pyhd8ed1ab_0
  - pyyaml=6.0.1=py311h459d7ec_1
  - pyzmq=26.0.3=py311h08a0b41_0
  - readline=8.2=h8228510_1
  - rfc3339-validator=0.1.4=pyhd8ed1ab_0
  - rfc3986-validator=0.1.1=pyh9f0ad1d_0
  - send2trash=1.8.3=pyh0d859eb_0
  - setuptools=70.3.0=pyhd8ed1ab_0
  - six=1.16.0=pyh6c4a22f_0
  - sniffio=1.3.1=pyhd8ed1ab_0
  - soupsieve=2.5=pyhd8ed1ab_1
  - stack_data=0.6.2=pyhd8ed1ab_0
  - terminado=0.18.1=pyh0d859eb_0
  - tinycss2=1.3.0=pyhd8ed1ab_0
  - tk=8.6.13=noxft_h4845f30_101
  - tomli=2.0.1=pyhd8ed1ab_0
  - types-python-dateutil=2.9.0.20240316=pyhd8ed1ab_0
  - typing_extensions=4.12.2=pyha770c72_0
  - typing_utils=0.1.0=pyhd8ed1ab_0
  - tzdata=2024a=h0c530f3_0
  - uri-template=1.3.0=pyhd8ed1ab_0
  - wcwidth=0.2.13=pyhd8ed1ab_0
  - webcolors=24.6.0=pyhd8ed1ab_0
  - webencodings=0.5.1=pyhd8ed1ab_2
  - websocket-client=1.8.0=pyhd8ed1ab_0
  - wheel=0.43.0=pyhd8ed1ab_1
  - xz=5.2.6=h166bdaf_0
  - yaml=0.2.5=h7f98852_2
  - zeromq=4.3.5=h75354e8_4
  - zstandard=0.23.0=py311h5cd10c7_0
  - zstd=1.5.6=ha6fb4c9_0
  - pip:
      - adjusttext==1.0.4
      - aiohttp==3.9.3
      - aiosignal==1.3.1
      - anndata==0.10.5.post1
      - annoy==1.17.3
      - appdirs==1.4.4
      - arboreto==0.1.6
      - argparse-dataclass==2.0.0
      - array-api-compat==1.5.1
      - attr==0.3.2
      - bbknn==1.6.0
      - bidict==0.23.1
      - bioservices==1.11.2
      - blosc2==2.5.1
      - bokeh==3.4.0
      - boltons==23.1.1
      - bs4==0.0.2
      - cattrs==23.2.3
      - certifi==2024.2.2
      - click==8.1.7
      - cloudpickle==3.0.0
      - colorama==0.4.6
      - colorlog==6.8.2
      - conda-inject==1.3.1
      - configargparse==1.7
      - connection-pool==0.0.3
      - contourpy==1.2.0
      - ctxcore==0.2.0
      - cycler==0.12.1
      - cython==0.29.37
      - cytoolz==0.12.3
      - dask==2024.5.0
      - dataclasses-json==0.6.4
      - datrie==0.8.2
      - dill==0.3.8
      - distributed==2024.2.1
      - docutils==0.20.1
      - dpath==2.1.6
      - easydev==0.13.1
      - et-xmlfile==1.1.0
      - fastjsonschema==2.19.1
      - fbpca==1.0
      - filelock==3.13.1
      - fonttools==4.50.0
      - frozendict==2.4.0
      - frozenlist==1.4.1
      - fsspec==2024.3.1
      - future==1.0.0
      - gensim==4.3.2
      - geosketch==1.2
      - gevent==24.2.1
      - gitdb==4.0.11
      - gitpython==3.1.42
      - globre==0.1.5
      - greenlet==3.0.3
      - grequests==0.7.0
      - gseapy==0.10.8
      - h5py==3.10.0
      - harmonypy==0.0.9
      - humanfriendly==10.0
      - idna==3.6
      - igraph==0.11.4
      - imageio==2.34.0
      - immutables==0.20
      - importlib-metadata==7.0.1
      - importlib-resources==6.1.2
      - interlap==0.2.7
      - intervaltree==3.1.0
      - ipykernel==7.0.0
      - ipython==8.22.2
      - ipywidgets==8.1.3
      - jinja2==3.1.3
      - joblib==1.3.2
      - jsonpickle==3.0.3
      - jsonschema==4.21.1
      - jupyterlab-widgets==3.0.11
      - kaleido==0.2.1
      - kiwisolver==1.4.5
      - lazy-loader==0.3
      - lda==3.0.0
      - leidenalg==0.10.2
      - line-profiler==4.1.2
      - llvmlite==0.42.0
      - locket==1.0.0
      - loompy==3.0.7
      - loomxpy==0.4.2
      - lxml==5.1.0
      - lz4==4.3.3
      - macs2==2.2.9.1
      - markdown-it-py==3.0.0
      - marshmallow==3.21.1
      - matplotlib==3.6.3
      - matplotlib-inline==0.1.6
      - mdurl==0.1.2
      - mizani==0.9.3
      - msgpack==1.0.8
      - mudata==0.2.3
      - multidict==6.0.5
      - multiprocessing-on-dill==3.5.0a4
      - mypy-extensions==1.0.0
      - natsort==8.4.0
      - nbformat==5.10.3
      - ncls==0.0.68
      - ndindex==1.8
      - networkx==3.2.1
      - numba==0.59.0
      - numexpr==2.9.0
      - numpy==1.26.4
      - numpy-groupies==0.10.2
      - openpyxl==3.1.2
      - packaging==24.0
      - pandas==1.5.0
      - parso==0.8.3
      - partd==1.4.1
      - patsy==0.5.6
      - pillow==10.2.0
      - plac==1.4.3
      - platformdirs==4.2.0
      - plotly==5.19.0
      - plotnine==0.12.4
      - polars==0.20.13
      - progressbar2==4.4.2
      - prompt-toolkit==3.0.43
      - protobuf==5.26.0
      - psutil==5.9.8
      - pulp==2.8.0
      - py-cpuinfo==9.0.0
      - pyarrow==15.0.0
      - pyarrow-hotfix==0.6
      - pybedtools==0.9.1
      - pybigtools==0.1.2
      - pybigwig==0.3.22
      - pybiomart==0.2.0
      - pycistarget==1.0a2
      - pycistopic==2.0a0
      - pyfasta==0.5.2
      - pygam==0.9.0
      - pygments==2.17.2
      - pynndescent==0.5.11
      - pyparsing==3.1.2
      - pyranges==0.0.111
      - pyrle==0.0.39
      - pysam==0.22.0
      - pyscenic==0.12.1+8.gd2309fe
      - python-dateutil==2.9.0.post0
      - python-utils==3.8.2
      - pyvis==0.3.2
      - ray==2.9.3
      - referencing==0.34.0
      - requests==2.31.0
      - requests-cache==1.2.0
      - reretry==0.11.8
      - rich==13.7.1
      - rich-argparse==1.4.0
      - rpds-py==0.18.0
      - scanorama==1.7.4
      - scanpy==1.8.2
      - scatac-fragment-tools==0.1.0
      - scenicplus==1.0a1
      - scikit-image==0.22.0
      - scikit-learn==1.3.2
      - scipy==1.12.0
      - scrublet==0.2.3
      - seaborn==0.13.2
      - sinfo==0.3.4
      - smart-open==6.4.0
      - smmap==5.0.1
      - snakemake==8.5.5
      - snakemake-interface-common==1.17.1
      - snakemake-interface-executor-plugins==8.2.0
      - snakemake-interface-report-plugins==1.0.0
      - snakemake-interface-storage-plugins==3.1.1
      - sorted-nearest==0.0.39
      - sortedcontainers==2.4.0
      - stack-data==0.6.3
      - statistics==1.0.3.5
      - statsmodels==0.14.1
      - stdlib-list==0.10.0
      - stopit==1.1.2
      - suds-community==1.1.2
      - tables==3.9.2
      - tabulate==0.9.0
      - tblib==3.0.0
      - tenacity==8.2.3
      - texttable==1.7.0
      - threadpoolctl==3.4.0
      - throttler==1.2.2
      - tifffile==2024.2.12
      - tmtoolkit==0.12.0
      - toolz==0.12.1
      - toposort==1.10
      - tornado==6.4
      - tqdm==4.66.2
      - traitlets==5.14.2
      - tspex==0.6.3
      - typing==3.7.4.3
      - typing-extensions==4.10.0
      - typing-inspect==0.9.0
      - umap-learn==0.5.5
      - url-normalize==1.4.3
      - urllib3==2.2.1
      - widgetsnbextension==4.0.11
      - wrapt==1.16.0
      - xlrd==2.0.1
      - xmltodict==0.13.0
      - xyzservices==2023.10.1
      - yarl==1.9.4
      - yte==1.5.4
      - zict==3.0.0
      - zipp==3.18.1
      - zope-event==5.0
      - zope-interface==6.2

SeppeDeWinter commented 2 months ago

Hi @MatthewTCManion

Could you try running the following command in the terminal?


scenicplus grn_inference TF_to_gene \
    --multiome_mudata_fname ACC_GEX.h5mu \
    --tf_names tf_names.txt \
    --temp_dir /data/PetrosLab/Matt/scenicplus/consensus_peak_bulk_750bp/tmp/ \
    --out_tf_to_gene_adjacencies tf_to_gene_adj.tsv \
    --method GBM \
    --n_cpu 20 \
    --seed 666

Best,

Seppe

jklvrt commented 1 month ago

Hi Seppe, having this exact same error now! Upon running your suggested line I get the exact same error message as when running the whole SCENIC+ pipeline (screenshot attached).

EDIT: just realised this is slightly earlier in the pipeline (region_to_gene instead of tf_to_gene).

EDIT #2 (solution): downgrading to Python 3.11.8 (was 3.11.9 prior) solved all these issues... @SeppeDeWinter maybe worthwhile specifying 3.11.8 and not just 3.11 in the tutorials? :)
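To confirm which patch release an environment is actually running before retrying, a small Python check along these lines may help. This is only a sketch: `python_version_matches` is a hypothetical helper name, not part of SCENIC+, and the idea that 3.11.8 versus 3.11.9 matters rests only on the report in this comment.

```python
import sys
from typing import Optional, Sequence


def python_version_matches(expected: Sequence[int],
                           actual: Optional[Sequence[int]] = None) -> bool:
    """Return True if the interpreter version starts with `expected`.

    `expected` may be (3, 11) to accept any 3.11.x, or (3, 11, 8) to
    require that exact patch release. `actual` defaults to the running
    interpreter and exists mainly so the check can be tested.
    """
    if actual is None:
        actual = tuple(sys.version_info[:3])
    return tuple(actual)[:len(expected)] == tuple(expected)


if __name__ == "__main__":
    # Print the running version and whether it is exactly 3.11.8.
    running = ".".join(str(part) for part in sys.version_info[:3])
    print(f"Python {running}; is 3.11.8: {python_version_matches((3, 11, 8))}")
```

Run it with the same interpreter that the `scenicplus` command uses (for example from inside the activated conda environment), since a different `python` on the PATH would report a different version.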

MatthewTCManion commented 1 month ago

> Hi Seppe, having this exact same error now! Upon running your suggested line I get the exact same error message as when running the whole SCENIC+ pipeline (screenshot attached).
>
> EDIT: just realised this is slightly earlier in the pipeline (region_to_gene instead of tf_to_gene).
>
> EDIT #2 (solution): downgrading to Python 3.11.8 (was 3.11.9 prior) solved all these issues... @SeppeDeWinter maybe worthwhile specifying 3.11.8 and not just 3.11 in the tutorials? :)

I am running it now with the command above, but I had previously tried downgrading to Python 3.11.8 and got the same result.

MatthewTCManion commented 1 month ago

@SeppeDeWinter Running the command without Snakemake, I get a segfault, but I can't tell why. Usually that's a resource allocation issue, but the memory use stays firmly under the allocated limit.

UPDATE: I tried it again with a larger CPU allocation and it finished correctly. I will test the rest of the pipeline now.
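Since Snakemake only reports "non-zero exit code", it can help to run the step directly and decode how the process ended. The following is a minimal Python sketch, assuming a POSIX system; `describe_exit` and `run_and_describe` are hypothetical helper names, not part of SCENIC+ or Snakemake. It relies on two conventions: `subprocess` reports death by signal as a negative return code, and shells report it as 128 plus the signal number (so 139 usually means SIGSEGV and 137 usually means SIGKILL, which is what an out-of-memory killer sends).

```python
import signal
import subprocess
import sys
from typing import List


def _signal_name(number: int) -> str:
    """Map a signal number to its name, falling back to the raw number."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return f"signal {number}"


def describe_exit(returncode: int) -> str:
    """Describe a return code as reported by subprocess or by a shell."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # subprocess convention on POSIX: -N means "killed by signal N".
        return f"killed by {_signal_name(-returncode)}"
    if returncode > 128:
        # Shell convention: 128 + N usually means "killed by signal N",
        # but a program may also choose such a code itself.
        name = _signal_name(returncode - 128)
        return f"exited with code {returncode} (possibly killed by {name})"
    return f"exited with code {returncode}"


def run_and_describe(cmd: List[str]) -> str:
    """Run a command, let its output pass through, and describe its exit."""
    completed = subprocess.run(cmd)
    return describe_exit(completed.returncode)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is treated as the command to run.
    print(run_and_describe(sys.argv[1:]))
```

Saved under any file name, the script takes the command to run as its arguments, for example the `scenicplus grn_inference TF_to_gene ...` line from earlier in this thread.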

MatthewTCManion commented 1 month ago

I had a similar exit on eGRN_extended, but it ran fine when I used the command outside of the Snakemake pipeline:


> scenicplus grn_inference eGRN \
    --is_extended \
    --TF_to_gene_adj_fname tf_to_gene_adj.tsv \
    --region_to_gene_adj_fname region_to_gene_adj.tsv \
    --cistromes_fname cistromes_extended.h5ad \
    --ranking_db_fname /data/PetrosLab/Matt/scenicplus/Nkx_750bp.regions_vs_motifs.rankings.feather \
    --eRegulon_out_fname eRegulons_extended.tsv \
    --temp_dir /data/PetrosLab/Matt/scenicplus/consensus_peak_bulk_750bp/tmp/ \
    --order_regions_to_genes_by importance \
    --order_TFs_to_genes_by importance \
    --gsea_n_perm 1000 \
    --quantiles 0.85 0.90 0.95 \
    --top_n_regionTogenes_per_gene 5 10 15 \
    --top_n_regionTogenes_per_region \
    --min_regions_per_gene 0 \
    --rho_threshold 0.05 \
    --min_target_genes 10 \

I'm not sure what the issue is, but it appears to be with the Snakemake workflow, not the specific steps.

UPDATE: the snakemake workflow worked for AUCell_extended after running the previous 2 steps manually, but failed on eGRN_direct:

scenicplus grn_inference eGRN \
    --TF_to_gene_adj_fname tf_to_gene_adj.tsv \
    --region_to_gene_adj_fname region_to_gene_adj.tsv \
    --cistromes_fname cistromes_direct.h5ad \
    --ranking_db_fname /data/PetrosLab/Matt/scenicplus/Nkx_750bp.regions_vs_motifs.rankings.feather \
    --eRegulon_out_fname eRegulon_direct.tsv \
    --temp_dir /data/PetrosLab/Matt/scenicplus/consensus_peak_bulk_750bp/tmp/ \
    --order_regions_to_genes_by importance \
    --order_TFs_to_genes_by importance \
    --gsea_n_perm 1000 \
    --quantiles 0.85 0.90 0.95 \
    --top_n_regionTogenes_per_gene 5 10 15 \
    --top_n_regionTogenes_per_region \
    --min_regions_per_gene 0 \
    --rho_threshold 0.05 \
    --min_target_genes 10 \
    --n_cpu 20

MatthewTCManion commented 1 month ago

All steps ran correctly when I set the number of CPUs to 50; it seems it was all just a resource allocation issue.
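For anyone hitting the same thing, one quick sanity check is to compare the `--n_cpu` value with the number of CPUs the job is actually allowed to use. The sketch below is only an illustration: `available_cpus` and `cpu_request_fits` are hypothetical helper names, and it does not explain why a larger allocation fixed the runs in this thread. It uses `os.sched_getaffinity`, which exists on Linux and reflects scheduler or cgroup CPU pinning, and falls back to `os.cpu_count()` elsewhere.

```python
import os
from typing import Optional


def available_cpus() -> int:
    """Number of CPUs this process may run on (at least 1)."""
    try:
        # Linux only: respects CPU affinity set by a job scheduler.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Other platforms: total CPU count, which may overestimate.
        return os.cpu_count() or 1


def cpu_request_fits(requested: int, available: Optional[int] = None) -> bool:
    """Return True if `requested` workers fit in the available CPUs."""
    if available is None:
        available = available_cpus()
    return requested <= available


if __name__ == "__main__":
    requested = 20  # the --n_cpu value used in the commands in this thread
    print(f"available CPUs: {available_cpus()}, "
          f"request of {requested} fits: {cpu_request_fits(requested)}")
```

Running this inside the same job allocation as the pipeline shows what the process really has access to, which can differ from what was requested from the scheduler.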