cooperative-computing-lab / makeflow-examples

Example workflows for the Makeflow workflow system.

Q: Way to recover from a new makefile (increased memory request) with existing files #36

Closed: stemangiola closed this issue 3 years ago

stemangiola commented 4 years ago

Hello,

I remember you added a way to do this, but I can't find it documented online.

I updated my makefile with a higher memory request, and I would like to start makeflow with the new makefile, reusing the existing dependencies/files (without starting from the beginning).

However, it seems that the run picks up from the log file, ignoring my new makefile. I am afraid that if I delete the log file, all produced files will be deleted when I start fresh.

Thanks.

btovar commented 4 years ago

@stemangiola, I ran a local test modifying resources between runs, and makeflow correctly picked up the new resource definitions. Could you try the following example to pinpoint your issue?

# file: small.makeflow

.MAKEFLOW CATEGORY first
.MAKEFLOW DISK 200
out_a:
    dd bs=1024 count=100000 if=/dev/zero of=out_a

.MAKEFLOW CATEGORY second
.MAKEFLOW DISK 1
out_b: out_a
    dd bs=1024 count=100000 if=out_a of=out_b

$ makeflow --monitor=mon -dall small.makeflow

The first rule should succeed, and the second should fail, since each dd writes about 100 MB and the second rule requests only 1 MB of disk. Rerunning makeflow after changing .MAKEFLOW DISK 1 to .MAKEFLOW DISK 1000 should retry only the second rule. In the debug messages you should see something like:

small_test has 2 rules.
recovering from log file small_test.makeflowlog...
checking for old running or failed jobs...
will retry failed rule: dd bs=1024 count=100000 if=out_a of=out_b
2020/08/26 10:05:27.15 makeflow[12392] makeflow: export DISK=1000
2020/08/26 10:05:27.15 makeflow[12392] makeflow: export CATEGORY=second
2020/08/26 10:05:27.15 makeflow[12392] makeflow: node 1 failed -> waiting

stemangiola commented 4 years ago

2020/08/27 21:08:40.04 makeflow[20289] makeflow: /home/mangiola.s/unix3XX/third_party_sofware/cctools-7.1.5-x86_64-centos6/bin/makeflow version 7.1.5 FINAL (released 2020-05-04 14:33:58 +0000)
2020/08/27 21:08:40.13 makeflow[20289] makeflow: Built by root@ea03b42b9e30 on 2020-05-04 14:33:58 +0000
2020/08/27 21:08:40.13 makeflow[20289] makeflow: System: Linux ea03b42b9e30 4.15.0-1028-gcp #29~16.04.1-Ubuntu SMP Tue Feb 12 16:31:10 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
2020/08/27 21:08:40.13 makeflow[20289] makeflow: Configuration: --debug --strict --with-cvmfs-path /opt/vc3/cctools-deps/cvmfs --with-ext2fs-path /opt/vc3/cctools-deps/ext2fs --with-fuse-path /opt/vc3/cctools-deps/fuse --with-musl-path /opt/vc3/cctools-deps/musl --with-musl-zlib-path /opt/vc3/cctools-deps/musl-zlib --with-mysql-path /opt/vc3/cctools-deps/mysql --with-uuid-path /opt/vc3/cctools-deps/uuid --with-xrootd-path /opt/vc3/cctools-deps/xrootd --prefix /tmp/cctools-7.1.5-x86_64-centos6
2020/08/27 21:08:40.13 makeflow[20289] auth: kerberos: not compiled in
2020/08/27 21:08:40.13 makeflow[20289] auth: globus: not compiled in
2020/08/27 21:08:40.13 makeflow[20289] auth: unix: registered
2020/08/27 21:08:40.13 makeflow[20289] auth: ticket: registered
2020/08/27 21:08:40.13 makeflow[20289] auth: hostname: registered
2020/08/27 21:08:40.13 makeflow[20289] auth: address: registered
2020/08/27 21:08:40.13 makeflow[20289] makeflow_hook: Hook Fail Dir:trying to register
2020/08/27 21:08:40.13 makeflow[20289] makeflow_hook: Hook Fail Dir:registered
2020/08/27 21:08:40.13 makeflow[20289] makeflow_hook: hook Resource Monitor:initializing
2020/08/27 21:08:40.13 makeflow[20289] rmonitor: locating resource monitor executable...
2020/08/27 21:08:40.13 makeflow[20289] rmonitor: trying executable at local directory.
2020/08/27 21:08:40.13 makeflow[20289] rmonitor: trying executable at PATH.
2020/08/27 21:08:40.14 makeflow[20289] rmonitor: trying executable at installed path location.
2020/08/27 21:08:40.14 makeflow[20289] error: Monitor mode was enabled, but could not find resource_monitor in PATH.
2020/08/27 21:08:40.14 makeflow[20289] error: hook Resource Monitor:create returned 1

However, I am not sure I follow your answer. What I mean is that all my runs are succeeding.

What I am trying to do is:

1) Execute a makeflow such as

MEMORY=20X
...

2) Create the file dependency

3) Delete that file

4) Change the makefile to

MEMORY=40X

5) Run makeflow with the new makefile

But the resource requested is still 20X.

btovar commented 4 years ago

The log does not contain resource information, so I wonder where the previous MEMORY definition is being picked up from. Could you send your Makefile to btovar@nd.edu?

stemangiola commented 4 years ago

Done!

I'm not sure why, but I am now requesting 60K MB of RAM, and the jobs keep requesting 10K and 30K as in the previous makeflow. I'm using cctools-7.1.5.

btovar commented 4 years ago

One thing I notice is that you define CATEGORY=lv4 twice: first with MEMORY=6... and then with MEMORY=1... Jobs are submitted with the most recent definition (reading from top to bottom). Could you try renaming the second lv4 to lv5 and see if that solves the issue?

stemangiola commented 4 years ago

Oooops.

My bad. I think it would be really good to print a big warning if two categories have the same name. Maybe I'm missing something, but I don't see a use case for multiple categories with the same name.

At the very least, categories with the same name but different resources should raise an error, as this is always unwanted.
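The check suggested above could be sketched as a small standalone lint. This is a hypothetical script, not part of makeflow or cctools: it assumes categories are introduced by CATEGORY=name or .MAKEFLOW CATEGORY name lines, treats CORES, MEMORY, DISK and GPUS assignments that follow as belonging to that category until the next category line, and flags names that appear more than once with differing resources. The function names and the MEMORY values in the example are illustrative only.

```python
import re
from collections import defaultdict

# Resource names this sketch tracks (an assumption, not an exhaustive list).
RESOURCES = {"CORES", "MEMORY", "DISK", "GPUS"}

# Matches CATEGORY=lv4, CATEGORY="lv4", or .MAKEFLOW CATEGORY lv4.
CATEGORY_RE = re.compile(
    r'^\s*(?:\.MAKEFLOW\s+CATEGORY\s+|CATEGORY\s*=\s*)"?([\w.-]+)"?\s*$')
# Matches MEMORY=4000 or .MAKEFLOW MEMORY 4000 (uppercase names only).
RESOURCE_RE = re.compile(
    r'^\s*(?:\.MAKEFLOW\s+)?([A-Z]+)(?:\s*=\s*|\s+)"?([^"\s]+)"?\s*$')


def category_definitions(text):
    """Return {category: [resources, ...]}, one dict per definition, in file order."""
    defs = defaultdict(list)
    current = None  # resources seen before any category line are ignored here
    for line in text.splitlines():
        m = CATEGORY_RE.match(line)
        if m:
            # Every category line starts a new definition block for that name.
            current = {}
            defs[m.group(1)].append(current)
            continue
        m = RESOURCE_RE.match(line)
        if m and current is not None and m.group(1) in RESOURCES:
            current[m.group(1)] = m.group(2)
    return dict(defs)


def conflicting_categories(text):
    """Category names defined more than once with differing resource requests."""
    return sorted(
        name
        for name, blocks in category_definitions(text).items()
        if len(blocks) > 1 and any(b != blocks[0] for b in blocks[1:])
    )


# Hypothetical makeflow fragment: lv4 is defined twice with different MEMORY.
example = """
CATEGORY=lv4
MEMORY=4000
a.out:
\tcmd_a

CATEGORY=lv4
MEMORY=2000
b.out:
\tcmd_b
"""

for name in conflicting_categories(example):
    print(f"warning: category {name!r} is defined more than once with different resources")
```

This only pattern-matches lines; a real check would belong in makeflow's own parser, which also handles variable expansion and the JX workflow format.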

btovar commented 4 years ago

Indeed. I opened: https://github.com/cooperative-computing-lab/cctools/issues/2373

Thanks for the suggestion!

btovar commented 3 years ago

Fixed with https://github.com/cooperative-computing-lab/cctools/pull/2448