ewels / clusterflow

A pipelining tool to automate and standardise bioinformatics analyses on cluster environments.
https://ewels.github.io/clusterflow/
GNU General Public License v3.0

Negative complete jobs in qstat output #111

Open s-andrews opened 7 years ago

s-andrews commented 7 years ago
======================================================================
 Cluster Flow Pipeline: samtools_sort_index
 Submitted:             20 minutes, 5 seconds ago
 Working Directory:     /bi/group/bioinf/Rachael_Huntly/Cufflinks_Analysis/Rachel_0_vs_8_hour
 Cluster Flow ID:       samtools_sort_index_1485260139
 Submitted Jobs:        17
 Running Jobs:          8
 Queued Jobs:           11 (resources)
 Completed Jobs:        -2 (-11%)
======================================================================

 - samtools_sort_index                             [4 cores]
      - email_run_complete
      - email_run_complete

 - samtools_sort_index                             [4 cores]
      - email_run_complete

 - samtools_sort_index                             [4 cores]
      - email_run_complete

 - samtools_sort_index                             [4 cores]
      - email_run_complete

 - samtools_sort_index                             [4 cores]
      - email_run_complete

 - samtools_sort_index                             [4 cores]
      - email_run_complete

 - samtools_sort_index                             [4 cores]
      - email_run_complete
           - email_pipeline_complete

 - samtools_sort_index                             [4 cores]
      - email_run_complete
      - email_run_complete

ewels commented 7 years ago

Is this always the case? Or only occasionally?

The code that does this parses the number of submitted jobs from the initial log file, then subtracts the number of running and pending jobs. I could easily add a check that the result is ≥ 0 (and set it to 0 if not), but it would be better to figure out why it can go negative in the first place.
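
A minimal sketch of that arithmetic, to show how the negative figure arises. This is Python written for illustration, not the actual Cluster Flow code; the function names are hypothetical, and truncating the percentage toward zero is an assumption chosen only because it matches the figures quoted in this thread.

```python
def completed_jobs(submitted, running, queued):
    """Completed = submitted minus jobs still running or queued.

    Hypothetical helper: goes negative whenever running + queued
    exceeds the submitted count.
    """
    completed = submitted - running - queued
    # int() truncates toward zero (assumption, see lead-in)
    percent = int(completed / submitted * 100) if submitted else 0
    return completed, percent


def completed_jobs_clamped(submitted, running, queued):
    """Same calculation with the suggested '>= 0' check applied."""
    completed = max(0, submitted - running - queued)
    percent = int(completed / submitted * 100) if submitted else 0
    return completed, percent


# Numbers from the first report in this thread
print(completed_jobs(17, 8, 11))          # (-2, -11)
print(completed_jobs_clamped(17, 8, 11))  # (0, 0)
```

The clamp hides the symptom only: 8 running plus 11 queued is already 19, more than the 17 submitted, so one of the inputs is being miscounted.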

Phil

ewels commented 7 years ago

@s-andrews / @FelixKrueger - if one of you could send me the CF submission log for a run where this has happened, I'll take a look. I think the number of submitted jobs isn't being counted properly.

FelixKrueger commented 7 years ago

here is one, cheers. cf_bismark_singlecell_1488545447_submissionlog.txt

ewels commented 7 years ago

submission log:

Cluster Flow Pipeline: bismark_singlecell
Submitted:             7 minutes, 2 seconds ago
Working Directory:     /path/to/dir
Cluster Flow ID:       bismark_singlecell_1488545447
Submitted Jobs:        902
Running Jobs:          75
Queued Jobs:           1102 (resources)
Completed Jobs:        -275 (-30%)

Hmm, strange. I agree that it looks like there were 902 jobs submitted there. So it must be over-counting the queued jobs somehow.

Ok, next up - could you please run cf --qstat to get the above log, followed by a normal qstat, so that I can try to figure out why it thinks there are so many pipeline jobs queued?

ewels commented 7 years ago

Also - I didn't actually explicitly say this myself, but it works fine for me 😁 That's why I'm asking you guys to do stuff.

Two more questions:

  1. Does it always do this, or only sometimes?
  2. Why are you running v0.4_dev? v0.4 is the latest released version and v0.5_dev is the most recent development version 😉

Phil

FelixKrueger commented 7 years ago

Are you sure you want this? ^^ CF_qstat.txt qstat.txt

ewels commented 7 years ago

Ah, no good - everything is fine in CF_qstat.txt. It looks like the correct number of running and queued jobs, with no negative Completed Jobs number.

ewels commented 7 years ago

...spoke too soon, there are a lot of different pipeline runs in this file, it would seem!

FelixKrueger commented 7 years ago

I see this:

 Cluster Flow ID:       bismark_singlecell_1488545447
 Submitted Jobs:        902
 Running Jobs:          77
 Queued Jobs:           1095 (resources)
 Completed Jobs:        -270 (-29%)
======================================================================

 - bismark_align                                   [4 cores]  [queued, priority 0]
      - bismark_deduplicate
           - bismark_methXtract
                - bismark_report

FelixKrueger commented 7 years ago

yes sorry, it's not like I have nothing to do... :)

ewels commented 7 years ago

Ah, I need longer qstat output though. The default trims the full job name, I forgot that. Can you instead do qstat -pri -r -xml please?
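
For anyone following along, here is a rough sketch of how the XML output could be tallied per pipeline run. It is Python for illustration, not Cluster Flow code. The sample XML is a hand-written, heavily simplified stand-in for real qstat -xml output, and the job names in it are hypothetical; the sketch assumes each job is a job_list element with a state attribute and a JB_name child, and that the Cluster Flow ID appears in the full job name.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, simplified sample; real output has many more fields.
SAMPLE_XML = """<job_info>
  <queue_info>
    <job_list state="running">
      <JB_name>cf_bismark_singlecell_1488545447_bismark_align_001</JB_name>
    </job_list>
  </queue_info>
  <job_info>
    <job_list state="pending">
      <JB_name>cf_bismark_singlecell_1488545447_bismark_align_002</JB_name>
    </job_list>
    <job_list state="pending">
      <JB_name>cf_other_pipeline_1488000000_fastqc_001</JB_name>
    </job_list>
  </job_info>
</job_info>"""


def count_jobs_by_state(xml_text, cf_id):
    """Count jobs per state whose name contains the given Cluster Flow ID."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for job in root.iter("job_list"):
        name = job.findtext("JB_name", default="")
        if cf_id in name:
            counts[job.get("state", "unknown")] += 1
    return counts


counts = count_jobs_by_state(SAMPLE_XML, "bismark_singlecell_1488545447")
print(dict(counts))
```

Matching on the full ID rather than a trimmed name keeps jobs from other pipeline runs out of the totals, which is why the untruncated job names matter here.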

FelixKrueger commented 7 years ago

Here you go: qstat.txt

ewels commented 7 years ago

Yay, 75114 lines of XML for me to read through. Such a lucky boy! 🥇