aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

Failing to integrate scATAC + scRNA cells (non multiome data). Error in rule prepare_GEX_ACC_non_multiome: #433

Open twinklevhatkar opened 1 month ago

twinklevhatkar commented 1 month ago

Hi! I am fairly new to SCENIC+. I have installed it successfully and got the Snakemake workflow running. I am working with non-multiome data. The workflow runs fine until it reaches the "prepare_GEX_ACC_non_multiome" rule. Because the data are not multiome, I know the cell barcodes will not match. Both of my assays carry the "GEX:annotation" metadata, which should allow the two assays to be integrated.

Here's the log file with my error:

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 2
Rules claiming more threads will be scaled down.
Job stats:
job                             count
AUCell_direct                       1
AUCell_extended                     1
all                                 1
eGRN_direct                         1
eGRN_extended                       1
get_search_space                    1
prepare_GEX_ACC_non_multiome        1
prepare_menr                        1
region_to_gene                      1
scplus_mudata                       1
tf_to_gene                          1
total                              11

Select jobs to execute...
Execute 1 jobs...

[Mon Jul 15 00:20:13 2024]
localrule prepare_GEX_ACC_non_multiome:
    input: /home/scenicplus/outs/cistopic_obj_final.pkl, /home/scenicplus/outs/integrated_fibro_harmony_final.h5ad
    output: /home/scenicplus/outs/ACC_GEX.h5mu
    jobid: 2
    reason: Missing output files: /home/scenicplus/outs/ACC_GEX.h5mu
    resources: tmpdir=/tmp

[Mon Jul 15 00:20:14 2024]
Error in rule prepare_GEX_ACC_non_multiome:
    jobid: 2
    input: /home/scenicplus/outs/cistopic_obj_final.pkl, /home/scenicplus/outs/integrated_fibro_harmony_final.h5ad
    output: /home/scenicplus/outs/ACC_GEX.h5mu
    shell:

        scenicplus prepare_data prepare_GEX_ACC
            --cisTopic_obj_fname /home/scenicplus/outs/cistopic_obj_final.pkl
            --GEX_anndata_fname /home/scenicplus/outs/integrated_fibro_harmony_final.h5ad
            --out_file /home/scenicplus/outs/ACC_GEX.h5mu
            --bc_transform_func
            --is_not_multiome
            --key_to_group_by GEX:annotation
            --nr_cells_per_metacells 10

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-07-15T002013.095171.snakemake.log
WorkflowError: At least one job did not complete successfully.

My log file doesn't indicate where the issue is, and I am not sure what I'm missing here. Any help would be appreciated.

Thanks!

2024-07-15T002013.095171.snakemake.log

SeppeDeWinter commented 1 month ago

Hi @twinklevhatkar

Could you run this command directly, outside Snakemake?

scenicplus prepare_data prepare_GEX_ACC \
    --cisTopic_obj_fname /home/scenicplus/outs/cistopic_obj_final.pkl \
    --GEX_anndata_fname /home/scenicplus/outs/integrated_fibro_harmony_final.h5ad \
    --out_file /home/scenicplus/outs/ACC_GEX.h5mu \
    --bc_transform_func \
    --is_not_multiome \
    --key_to_group_by GEX:annotation \
    --nr_cells_per_metacells 10

This might reveal the underlying error.

Best,

Seppe
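
The Snakemake log above reports only that the shell command exited with a non-zero code, so rerunning the command by hand is the quickest way to see the actual error text. If it helps, here is a minimal Python sketch of a wrapper that runs a command and keeps its exit code, stdout and stderr so they can be pasted into the issue. The helper name `run_and_capture` and the demo command are hypothetical and not part of SCENIC+; only the standard-library `subprocess.run` call is real. One thing visible in the logged command is that `--bc_transform_func` has no value after it; the thread does not establish whether that is related to the failure.

```python
import subprocess
import sys
from typing import Sequence, Tuple


def run_and_capture(cmd: Sequence[str]) -> Tuple[int, str, str]:
    """Run `cmd` without a shell and return (exit code, stdout, stderr).

    Hypothetical helper for illustration; it is not part of SCENIC+.
    """
    completed = subprocess.run(
        list(cmd),
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode the output to str
        check=False,          # do not raise on a non-zero exit code
    )
    return completed.returncode, completed.stdout, completed.stderr


if __name__ == "__main__":
    # Stand-in command that fails on purpose. Replace it with the argument
    # list of the scenicplus command from the thread to capture its traceback.
    demo_cmd = [sys.executable, "-c", "import sys; sys.exit('demo failure')"]
    code, out, err = run_and_capture(demo_cmd)
    print("exit code:", code)
    print("stderr:", err.strip())
```

To use it on the real command, pass each flag and value as a separate list element, for example `["scenicplus", "prepare_data", "prepare_GEX_ACC", "--cisTopic_obj_fname", "/home/scenicplus/outs/cistopic_obj_final.pkl", ...]`, and post the captured stderr.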