Running with multiple datasets in parallel, some processes are skipped

MrHurricanee commented 8 months ago

I get unwanted behavior when running MINI-EX with multiple datasets. If one of the datasets is smaller and therefore runs faster, it finishes the make_ref_ranking_dataframe and make_log_file processes before the other datasets are ready for these steps, and the steps get marked as done. These steps and all steps that follow them are then skipped for the other datasets (see the intermediate pipeline output below). Rerunning the pipeline (multiple times) with the -resume option does complete the outputs one by one, but sometimes a rerun does nothing.

Intermediate output MINI-EX:

N E X T F L O W  ~  version 23.10.0
Launching `miniex.nf` [goofy_hilbert] DSL2 - revision: 05a0ca7fa0
Motif-Informed Network Inference from gene EXpression v2.2
===========================================================
Running TF motif enrichment filtering on TF_motifs
Running single-cell cluster enrichment using the top 700 upregulated genes per cluster
Filtering out regulons of single-cell clusters where the TF is expressed in less than 10 % of the cells
Plotting expression specificity and DE calls for the top 150 regulons

executor >  local (60)
[00/7373df] process > check_user_input (1)           [100%] 1 of 1 ✔
[55/0f30a9] process > get_expressed_genes (3)        [100%] 5 of 5 ✔
[04/4ad1e7] process > unzip_motif_mappings           [100%] 1 of 1 ✔
[d4/744308] process > run_enricher_motifs (5)        [100%] 5 of 5 ✔
[a9/58089a] process > filter_motifs (5)              [ 80%] 4 of 5
[aa/41602b] process > get_top_degs (1)               [100%] 5 of 5 ✔
[5d/09060e] process > run_enricher_cluster (4)       [100%] 4 of 4
[92/507679] process > filter_expression (4)          [100%] 4 of 4
[a0/3758e5] process > make_info_file (4)             [100%] 4 of 4
[0d/e84a82] process > make_regulon_clustermap (4)    [100%] 4 of 4
[1e/352c80] process > get_network_centrality (4)     [100%] 4 of 4
[47/964b1f] process > make_go_enrichment_files (4)   [100%] 4 of 4
[bb/077588] process > run_enricher_go (4)            [100%] 4 of 4
[72/9bdb0a] process > check_reference (4)            [100%] 4 of 4
[4e/727600] process > make_ref_ranking_dataframe (1) [100%] 1 of 1 ✔
[5a/3b2984] process > make_borda (1)                 [100%] 1 of 1
[35/97ba6a] process > score_edges (1)                [100%] 1 of 1
[55/c9d70e] process > make_top_regulons_heatmaps (1) [100%] 1 of 1
[61/d3f1b3] process > make_regmaps (1)               [100%] 1 of 1
[ff/076eb1] process > make_log_file (1)              [100%] 1 of 1 ✔
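
For anyone who wants to check their own run for the same symptom, here is a minimal Python sketch that parses progress lines like the ones above and lists the processes that were scheduled for fewer tasks than there are datasets. The function names (`parse_progress`, `find_short_processes`) and the `run_once` default are my own illustrative choices and not part of MINI-EX or Nextflow. The `run_once` list assumes that check_user_input and unzip_motif_mappings are meant to run only once per pipeline run, which is what the console output suggests.

```python
import re

# Matches Nextflow progress lines such as:
# [a9/58089a] process > filter_motifs (5)              [ 80%] 4 of 5
LINE_RE = re.compile(
    r"^\[(?P<hash>[0-9a-f]{2}/[0-9a-f]{6})\]\s+process\s+>\s+"
    r"(?P<name>\w+)(?:\s+\(\d+\))?\s+"
    r"\[\s*(?P<pct>\d+)%\]\s+(?P<done>\d+)\s+of\s+(?P<total>\d+)"
)


def parse_progress(log_text):
    """Return {process_name: (done, total)} for every progress line found."""
    counts = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group("name")] = (
                int(match.group("done")),
                int(match.group("total")),
            )
    return counts


def find_short_processes(
    counts,
    n_datasets,
    # Assumption: these two processes run once per pipeline run, not per dataset.
    run_once=("check_user_input", "unzip_motif_mappings"),
):
    """List processes scheduled for fewer tasks than there are datasets."""
    return sorted(
        name
        for name, (_, total) in counts.items()
        if name not in run_once and total < n_datasets
    )


# A few lines copied from the output above (5 datasets in this run).
sample = """\
[04/4ad1e7] process > unzip_motif_mappings           [100%] 1 of 1 ✔
[a9/58089a] process > filter_motifs (5)              [ 80%] 4 of 5
[4e/727600] process > make_ref_ranking_dataframe (1) [100%] 1 of 1 ✔
[ff/076eb1] process > make_log_file (1)              [100%] 1 of 1 ✔
"""

print(find_short_processes(parse_progress(sample), n_datasets=5))
# ['make_log_file', 'make_ref_ranking_dataframe']
```

The check compares the total task count with the number of datasets rather than looking at the percentage, because the skipped steps still report 100% (for example 1 of 1) even though they only ran for one dataset.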

jstaut commented 8 months ago

Hello,

We implemented a potential fix on the bug_parallel_running branch. Could you check if this solves the problem you were having when running MINI-EX on multiple datasets in parallel?

Kind regards, Jasper

MrHurricanee commented 8 months ago

Thanks for the quick response! On that branch it now runs the complete pipeline for every dataset. The only thing still missing is the log files for the other datasets: a log file is only created for the first dataset that finishes.

command line output:

(MINI-EXv2) user$ nextflow -C miniex.config run miniex.nf
N E X T F L O W  ~  version 23.10.0
Launching `miniex.nf` [pedantic_cray] DSL2 - revision: 4c94da00a4
Motif-Informed Network Inference from gene EXpression v2.2
===========================================================
Running TF motif enrichment filtering on TF_motifs
Running single-cell cluster enrichment using the top 700 upregulated genes per cluster
Filtering out regulons of single-cell clusters where the TF is expressed in less than 10 % of the cells
Plotting expression specificity and DE calls for the top 150 regulons

executor >  local (88)
[0d/71b1e1] process > check_user_input (1)           [100%] 1 of 1 ✔
[62/b7518c] process > get_expressed_genes (5)        [100%] 5 of 5 ✔
[36/8a3060] process > unzip_motif_mappings           [100%] 1 of 1 ✔
[8b/835f3b] process > run_enricher_motifs (5)        [100%] 5 of 5 ✔
[ab/352608] process > filter_motifs (5)              [100%] 5 of 5 ✔
[e7/923bb4] process > get_top_degs (1)               [100%] 5 of 5 ✔
[c8/e9409e] process > run_enricher_cluster (5)       [100%] 5 of 5 ✔
[27/482a43] process > filter_expression (5)          [100%] 5 of 5 ✔
[2b/1728f0] process > make_info_file (5)             [100%] 5 of 5 ✔
[46/c61f7b] process > make_regulon_clustermap (5)    [100%] 5 of 5 ✔
[d7/a0181c] process > get_network_centrality (5)     [100%] 5 of 5 ✔
[b6/a33f75] process > make_go_enrichment_files (5)   [100%] 5 of 5 ✔
[e8/4dc2d6] process > run_enricher_go (5)            [100%] 5 of 5 ✔
[e4/1904d0] process > check_reference (5)            [100%] 5 of 5 ✔
[ef/1f8dc8] process > make_ref_ranking_dataframe (5) [100%] 5 of 5 ✔
[94/ed7ec8] process > make_borda (5)                 [100%] 5 of 5 ✔
[18/ae2c86] process > score_edges (5)                [100%] 5 of 5 ✔
[d9/96981b] process > make_top_regulons_heatmaps (5) [100%] 5 of 5 ✔
[df/1c4fa6] process > make_regmaps (5)               [100%] 5 of 5 ✔
[0c/2815d4] process > make_log_file (1)              [100%] 1 of 1 ✔
Done!
Completed at: 11-Jan-2024 10:04:08
Duration    : 8m 58s
CPU hours   : 0.6
Succeeded   : 88

jstaut commented 8 months ago

Thanks for testing and letting us know. This should be fixed now. Could you do a pull and try again?

MrHurricanee commented 8 months ago

It works now!

(MINI-EXv2) user$ nextflow -C miniex.config run miniex.nf 
N E X T F L O W  ~  version 23.10.0
Launching `miniex.nf` [curious_monod] DSL2 - revision: 0f21b31189
Motif-Informed Network Inference from gene EXpression v2.2
===========================================================
Running TF motif enrichment filtering on TF_motifs
Running single-cell cluster enrichment using the top 700 upregulated genes per cluster
Filtering out regulons of single-cell clusters where the TF is expressed in less than 10 % of the cells
Plotting expression specificity and DE calls for the top 150 regulons

executor >  local (92)
[ae/2b8178] process > check_user_input (1)           [100%] 1 of 1 ✔
[2e/fcbca9] process > get_expressed_genes (4)        [100%] 5 of 5 ✔
[fc/11984d] process > unzip_motif_mappings           [100%] 1 of 1 ✔
[19/4cfb48] process > run_enricher_motifs (1)        [100%] 5 of 5 ✔
[3f/22bfb3] process > filter_motifs (5)              [100%] 5 of 5 ✔
[17/2a7217] process > get_top_degs (3)               [100%] 5 of 5 ✔
[39/a7168d] process > run_enricher_cluster (5)       [100%] 5 of 5 ✔
[1d/f560be] process > filter_expression (5)          [100%] 5 of 5 ✔
[43/5e40aa] process > make_info_file (5)             [100%] 5 of 5 ✔
[73/112497] process > make_regulon_clustermap (5)    [100%] 5 of 5 ✔
[99/e21472] process > get_network_centrality (5)     [100%] 5 of 5 ✔
[33/eb1454] process > make_go_enrichment_files (4)   [100%] 5 of 5 ✔
[fe/75acc0] process > run_enricher_go (5)            [100%] 5 of 5 ✔
[a6/488444] process > check_reference (5)            [100%] 5 of 5 ✔
[1f/45f5a3] process > make_ref_ranking_dataframe (5) [100%] 5 of 5 ✔
[a4/464951] process > make_borda (5)                 [100%] 5 of 5 ✔
[08/767cf3] process > score_edges (5)                [100%] 5 of 5 ✔
[d1/326d4e] process > make_top_regulons_heatmaps (4) [100%] 5 of 5 ✔
[70/df3f96] process > make_regmaps (5)               [100%] 5 of 5 ✔
[b9/9db0dd] process > make_log_file (5)              [100%] 5 of 5 ✔
Done!
Completed at: 11-Jan-2024 11:28:36
Duration    : 8m 50s
CPU hours   : 0.6
Succeeded   : 92

VIB-PSB / MINI-EX

Running with multiple datasets in parallel, some processes are skipped #16