geodesymiami / rsmas_insar

RSMAS InSAR code
https://rsmas-insar.readthedocs.io/
GNU General Public License v3.0

submit_jobs.bash: Is limiting the total number of tasks working? #463

Closed: falkamelung closed this issue 3 years ago

falkamelung commented 3 years ago

I am not sure it works in all cases. In this example it submitted run_*igram_8 and run_*igram_9, bringing the total to 282 and 305 tasks, respectively, although the limit is 250. After that it waits for 5 minutes, as it should. A few extra tasks are not a problem, but it would be nice to understand why this happens.
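For reference, the check I would expect before each submission can be written as a small predicate. This is a hypothetical Python sketch, not code from submit_jobs.bash: the function name and arguments are illustrative, the 250-task and 25-job limits are read off the log tables, and all counts are assumed to be taken before the new job is added.

```python
# Hypothetical sketch of the limit check the issue expects from submit_jobs.bash.
# The limits are read off the log tables; the names are illustrative only.
MAX_TASKS = 250  # shown as ".../250" in the Step and Total Active Tasks columns
MAX_JOBS = 25    # shown as ".../25" in the Active Jobs column


def should_submit(step_active: int, total_active: int, additional: int,
                  active_jobs: int) -> bool:
    """Return True if adding one job with `additional` tasks keeps every
    counter within its limit. All counts are taken before the submission."""
    return (step_active + additional <= MAX_TASKS
            and total_active + additional <= MAX_TASKS
            and active_jobs + 1 <= MAX_JOBS)


# Illustrative values, not taken from a specific log row.
print(should_submit(step_active=0, total_active=75, additional=23, active_jobs=2))     # True
print(should_submit(step_active=207, total_active=282, additional=23, active_jobs=11))  # False
```

With a rule like this, a job is held whenever any one of the three counters would go over its limit, so the total could never pass 250.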

-------------------------------------------------------------------------------------------------------------------------
| File Name            | Additional Tasks | Step Active Tasks | Total Active Tasks | Active Jobs | Message              |  
-------------------------------------------------------------------------------------------------------------------------
| run_08_gen...igram_0 | 23               | 0/250             | 98/250             | 3/25        | Submitted: 7569691   |
| run_08_gen...igram_1 | 23               | 23/250            | 121/250            | 4/25        | Submitted: 7569693   |
| run_08_gen...igram_2 | 23               | 46/250            | 144/250            | 5/25        | Submitted: 7569694   |
| run_08_gen...igram_3 | 23               | 69/250            | 167/250            | 6/25        | Submitted: 7569695   |
| run_08_gen...igram_4 | 23               | 92/250            | 190/250            | 7/25        | Submitted: 7569696   |
| run_08_gen...igram_5 | 23               | 115/250           | 213/250            | 8/25        | Submitted: 7569697   |
| run_08_gen...igram_6 | 23               | 138/250           | 236/250            | 9/25        | Submitted: 7569698   |
| run_08_gen...igram_7 | 23               | 161/250           | 259/250            | 10/25       | Submitted: 7569699   |
| run_08_gen...igram_8 | 23               | 184/250           | 282/250            | 11/25       | Submitted: 7569700   |
| run_08_gen...igram_9 | 23               | 207/250           | 305/250            | 12/25       | Submitted: 7569701   |
| run_08_gen...gram_10 | 23               | 230/250           | 291/250            | 13/25       | Wait 5 min           |
| run_08_gen...gram_10 | 23               | 230/250           | 291/250            | 13/25       | Wait 5 min           |
| run_08_gen...gram_10 | 23               | 230/250           | 291/250            | 13/25       | Wait 5 min           |
| run_08_gen...gram_10 | 23               | 230/250           | 291/250            | 13/25       | Wait 5 min           |
| run_08_gen...gram_10 | 23               | 230/250           | 291/250            | 13/25       | Wait 5 min           |
| run_08_gen...gram_10 | 23               | 230/250           | 291/250            | 13/25       | Wait 5 min           |
| run_08_gen...gram_10 | 23               | 230/250           | 291/250            | 13/25       | Wait 5 min           |
| run_08_gen...gram_10 | 23               | 230/250           | 291/250            | 13/25       | Wait 5 min           |
| run_08_gen...gram_10 | 23               | 161/250           | 222/250            | 9/25        | Submitted: 7569914   |
| run_08_gen...gram_11 | 23               | 184/250           | 245/250            | 10/25       | Submitted: 7569915   |
| run_08_gen...gram_12 | 23               | 184/250           | 268/250            | 11/25       | Submitted: 7569916   |
| run_08_gen...gram_13 | 23               | 207/250           | 268/250            | 12/25       | Submitted: 7569917   |
| run_08_gen...gram_14 | 23               | 230/250           | 291/250            | 13/25       | Wait 5 min           |
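To go through the rows mechanically, here is a small Python sketch that parses the table above (the repeated "Wait 5 min" rows collapsed to one) and lists every row that was submitted although its Total Active Tasks value is above the limit. `parse_rows` and `over_limit_submissions` are hypothetical helpers, not part of rsmas_insar, and the sketch assumes the Total column is the number the 250-task limit should apply to.

```python
# Sketch: replay the first log and flag rows that were submitted although the
# "Total Active Tasks" value was above its limit.
# Assumption: that column is the count the 250-task limit should apply to.
LOG = """\
| run_08_gen...igram_0 | 23 | 0/250   | 98/250  | 3/25  | Submitted: 7569691 |
| run_08_gen...igram_1 | 23 | 23/250  | 121/250 | 4/25  | Submitted: 7569693 |
| run_08_gen...igram_2 | 23 | 46/250  | 144/250 | 5/25  | Submitted: 7569694 |
| run_08_gen...igram_3 | 23 | 69/250  | 167/250 | 6/25  | Submitted: 7569695 |
| run_08_gen...igram_4 | 23 | 92/250  | 190/250 | 7/25  | Submitted: 7569696 |
| run_08_gen...igram_5 | 23 | 115/250 | 213/250 | 8/25  | Submitted: 7569697 |
| run_08_gen...igram_6 | 23 | 138/250 | 236/250 | 9/25  | Submitted: 7569698 |
| run_08_gen...igram_7 | 23 | 161/250 | 259/250 | 10/25 | Submitted: 7569699 |
| run_08_gen...igram_8 | 23 | 184/250 | 282/250 | 11/25 | Submitted: 7569700 |
| run_08_gen...igram_9 | 23 | 207/250 | 305/250 | 12/25 | Submitted: 7569701 |
| run_08_gen...gram_10 | 23 | 230/250 | 291/250 | 13/25 | Wait 5 min         |
| run_08_gen...gram_10 | 23 | 161/250 | 222/250 | 9/25  | Submitted: 7569914 |
| run_08_gen...gram_11 | 23 | 184/250 | 245/250 | 10/25 | Submitted: 7569915 |
| run_08_gen...gram_12 | 23 | 184/250 | 268/250 | 11/25 | Submitted: 7569916 |
| run_08_gen...gram_13 | 23 | 207/250 | 268/250 | 12/25 | Submitted: 7569917 |
| run_08_gen...gram_14 | 23 | 230/250 | 291/250 | 13/25 | Wait 5 min         |
"""


def parse_rows(log):
    """Turn each '| a | b | ... |' table row into a dict; skip borders and header."""
    rows = []
    for line in log.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 6 or not cells[1].isdigit():
            continue
        name, additional, step, total, _jobs, message = cells
        total_now, total_limit = (int(x) for x in total.split("/"))
        rows.append({
            "name": name,
            "additional": int(additional),
            "step": int(step.split("/")[0]),
            "total": total_now,
            "limit": total_limit,
            "submitted": message.startswith("Submitted"),
        })
    return rows


def over_limit_submissions(rows):
    """Names of rows that were submitted while Total was above its limit."""
    return [r["name"] for r in rows if r["submitted"] and r["total"] > r["limit"]]


print(over_limit_submissions(parse_rows(LOG)))
# ['run_08_gen...igram_7', 'run_08_gen...igram_8', 'run_08_gen...igram_9',
#  'run_08_gen...gram_12', 'run_08_gen...gram_13']
```

Besides run_*igram_8 and run_*igram_9, the same test also flags run_08_gen...igram_7 (259) and run_08_gen...gram_12 and run_08_gen...gram_13 (268 each).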

Here is another example. It should not have submitted run_09_mer...gram_10 and run_09_mer...gram_11 (totals of 261 and 281), as these exceed the 250-task limit.

/scratch/05861/tg851601/BalotschistanSenAT13/run_files/run_09_merge_burst_igram_27.job
-------------------------------------------------------------------------------------------------------------------------
| File Name            | Additional Tasks | Step Active Tasks | Total Active Tasks | Active Jobs | Message              |  
-------------------------------------------------------------------------------------------------------------------------
| run_09_mer...igram_0 | 20               | 0/250             | 84/250             | 3/25        | Submitted: 7569969   |
| run_09_mer...igram_1 | 20               | 20/250            | 104/250            | 4/25        | Submitted: 7569970   |
| run_09_mer...igram_2 | 20               | 40/250            | 124/250            | 5/25        | Submitted: 7569971   |
| run_09_mer...igram_3 | 20               | 60/250            | 144/250            | 6/25        | Submitted: 7569972   |
| run_09_mer...igram_4 | 20               | 80/250            | 141/250            | 7/25        | Submitted: 7569973   |
| run_09_mer...igram_5 | 20               | 100/250           | 161/250            | 8/25        | Submitted: 7569974   |
| run_09_mer...igram_6 | 20               | 120/250           | 181/250            | 9/25        | Submitted: 7569975   |
| run_09_mer...igram_7 | 20               | 140/250           | 201/250            | 10/25       | Submitted: 7569976   |
| run_09_mer...igram_8 | 20               | 160/250           | 221/250            | 11/25       | Submitted: 7569977   |
| run_09_mer...igram_9 | 20               | 180/250           | 241/250            | 12/25       | Submitted: 7569978   |
| run_09_mer...gram_10 | 20               | 200/250           | 261/250            | 13/25       | Submitted: 7569979   |
| run_09_mer...gram_11 | 20               | 220/250           | 281/250            | 14/25       | Submitted: 7569980   |
| run_09_mer...gram_12 | 20               | 240/250           | 301/250            | 15/25       | Wait 5 min           |
| run_09_mer...gram_12 | 20               | 40/250            | 101/250            | 5/25        | Submitted: 7569993   |
| run_09_mer...gram_13 | 20               | 60/250            | 121/250            | 6/25        | Submitted: 7569996   |
| run_09_mer...gram_14 | 20               | 80/250            | 141/250            | 7/25        | Submitted: 7569998   |
| run_09_mer...gram_15 | 20               | 100/250           | 161/250            | 8/25        | Submitted: 7570000   |
| run_09_mer...gram_16 | 20               | 120/250           | 181/250            | 8/25        | Submitted: 7570003   |
| run_09_mer...gram_17 | 20               | 140/250           | 201/250            | 9/25        | Submitted: 7570004   |
| run_09_mer...gram_18 | 20               | 160/250           | 221/250            | 10/25       | Submitted: 7570005   |
| run_09_mer...gram_19 | 20               | 180/250           | 241/250            | 11/25       | Submitted: 7570006   |
| run_09_mer...gram_20 | 20               | 200/250           | 261/250            | 12/25       | Submitted: 7570007   |
| run_09_mer...gram_21 | 20               | 220/250           | 281/250            | 13/25       | Submitted: 7570008   |
| run_09_mer...gram_22 | 20               | 240/250           | 301/250            | 14/25       | Wait 5 min           |
| run_09_mer...gram_22 | 20               | 0/250             | 61/250             | 3/25        | Submitted: 7570038   |
| run_09_mer...gram_23 | 20               | 20/250            | 81/250             | 4/25        | Submitted: 7570039   |
| run_09_mer...gram_24 | 20               | 40/250            | 101/250            | 5/25        | Submitted: 7570041   |
| run_09_mer...gram_25 | 20               | 60/250            | 121/250            | 6/25        | Submitted: 7570043   |
| run_09_mer...gram_26 | 20               | 80/250            | 141/250            | 7/25        | Submitted: 7570045   |
| run_09_mer...gram_27 | 6                | 100/250           | 161/250            | 8/25        | Submitted: 7570046   |
-------------------------------------------------------------------------------------------------------------------------
Jobs submitted: 7569969 7569970 7569971 7569972 7569973 7569974 7569975 7569976 7569977 7569978 7569979 7569980 7569993 7569996 7569998 7570000 7570003 7570004 7570005 7570006 7570007 7570008 7570038 7570039 7570041 7570043 7570045 7570046
run_09_merge_burst_igram: 28 jobs; 22 COMPLETED, 0 RUNNING , 6 PENDING .
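One way to read this second log is to ask which rule reproduces every Submitted / Wait decision. In the hedged Python sketch below, the rows are copied from the table above, `step_rule` and `total_rule` are hypothetical names, and `total_rule` assumes the Total column already includes the job's own tasks. This is an inference from the logged numbers only, not from reading submit_jobs.bash.

```python
# Sketch: which rule reproduces the Submitted / Wait decisions of the second log?
# Rows copied from the table: (row label, additional, step active, total active, submitted)
ROWS = [
    ("igram_0", 20, 0, 84, True), ("igram_1", 20, 20, 104, True),
    ("igram_2", 20, 40, 124, True), ("igram_3", 20, 60, 144, True),
    ("igram_4", 20, 80, 141, True), ("igram_5", 20, 100, 161, True),
    ("igram_6", 20, 120, 181, True), ("igram_7", 20, 140, 201, True),
    ("igram_8", 20, 160, 221, True), ("igram_9", 20, 180, 241, True),
    ("gram_10", 20, 200, 261, True), ("gram_11", 20, 220, 281, True),
    ("gram_12", 20, 240, 301, False), ("gram_12", 20, 40, 101, True),
    ("gram_13", 20, 60, 121, True), ("gram_14", 20, 80, 141, True),
    ("gram_15", 20, 100, 161, True), ("gram_16", 20, 120, 181, True),
    ("gram_17", 20, 140, 201, True), ("gram_18", 20, 160, 221, True),
    ("gram_19", 20, 180, 241, True), ("gram_20", 20, 200, 261, True),
    ("gram_21", 20, 220, 281, True), ("gram_22", 20, 240, 301, False),
    ("gram_22", 20, 0, 61, True), ("gram_23", 20, 20, 81, True),
    ("gram_24", 20, 40, 101, True), ("gram_25", 20, 60, 121, True),
    ("gram_26", 20, 80, 141, True), ("gram_27", 6, 100, 161, True),
]
LIMIT = 250


def step_rule(additional, step, total):
    """Hypothesis: only Step Active Tasks + Additional Tasks is checked."""
    return step + additional <= LIMIT


def total_rule(additional, step, total):
    """Expected behavior: the Total Active Tasks value must stay within the limit."""
    return total <= LIMIT


def mismatches(rule):
    """Labels of rows where `rule` disagrees with what the script actually did."""
    return [label for label, add, step, total, submitted in ROWS
            if rule(add, step, total) != submitted]


print(mismatches(step_rule))   # []
print(mismatches(total_rule))  # ['gram_10', 'gram_11', 'gram_20', 'gram_21']
```

Every decision in this log matches a check on Step Active Tasks + Additional Tasks alone, whereas the total-based rule disagrees on the four rows that pushed the total to 261 and 281. The first and third logs fit the same pattern (jobs are held exactly when the step count plus the new tasks would exceed 250), which suggests, but does not prove, that the total-task limit is not being enforced.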

This example, in contrast, works well. It waits when it should.

-------------------------------------------------------------------------------------------------------------------------
| File Name            | Additional Tasks | Step Active Tasks | Total Active Tasks | Active Jobs | Message              |  
-------------------------------------------------------------------------------------------------------------------------
| run_10_fil...rence_0 | 46               | 0/250             | 61/250             | 3/25        | Submitted: 7570167   |
| run_10_fil...rence_1 | 46               | 46/250            | 107/250            | 4/25        | Submitted: 7570169   |
| run_10_fil...rence_2 | 46               | 92/250            | 153/250            | 5/25        | Submitted: 7570171   |
| run_10_fil...rence_3 | 46               | 138/250           | 199/250            | 6/25        | Submitted: 7570172   |
| run_10_fil...rence_4 | 46               | 184/250           | 245/250            | 7/25        | Submitted: 7570174   |
| run_10_fil...rence_5 | 46               | 230/250           | 291/250            | 8/25        | Wait 5 min           |
| run_10_fil...rence_5 | 46               | 230/250           | 291/250            | 8/25        | Wait 5 min           |
| run_10_fil...rence_5 | 46               | 230/250           | 291/250            | 8/25        | Wait 5 min           |
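A possible reason this third log looks correct is that, on its rows, a step-only check and a total check give the same answer, so the log cannot tell them apart. The short Python check below copies the rows from the table (the three "Wait 5 min" rows collapsed to one); both rule expressions are hypothetical, and the total rule assumes the Total column already includes the job's own tasks.

```python
# Sketch: on the third log, do a step-only check and a total check disagree?
# Rows copied from the table: (row label, additional, step active, total active, submitted)
ROWS = [
    ("rence_0", 46, 0, 61, True),
    ("rence_1", 46, 46, 107, True),
    ("rence_2", 46, 92, 153, True),
    ("rence_3", 46, 138, 199, True),
    ("rence_4", 46, 184, 245, True),
    ("rence_5", 46, 230, 291, False),  # logged three times as "Wait 5 min"
]
LIMIT = 250


def decisions(rows, limit=LIMIT):
    """Per row: (label, actual decision, hypothetical step-only rule, hypothetical total rule)."""
    return [(label, submitted, step + add <= limit, total <= limit)
            for label, add, step, total, submitted in rows]


for row in decisions(ROWS):
    print(*row)
# rence_0 True True True
# rence_1 True True True
# rence_2 True True True
# rence_3 True True True
# rence_4 True True True
# rence_5 False False False
```

All three columns agree on every row, so this log is consistent both with a working total-task limit and with a check that only looks at the step's own tasks.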
falkamelung commented 3 years ago

This does not seem to be a problem (I never got more than 300 tasks), but I would be interested to know whether this is supposed to happen.

Ovec8hkin commented 3 years ago

This is a strange bug. What dataset are you using? I will look into it.