CABLE-LSM / benchcab

Tool for evaluation of CABLE land surface model
https://benchcab.readthedocs.io/en/latest/
Apache License 2.0

Automation of meorg analysis #289

Open ccarouge opened 1 month ago

bschroeter commented 1 month ago

There are a few factors to consider here:

As it stands, the proposed workflow is as follows:

  1. Benchcab runs, emits output files, and triggers a new PBS job (on the copyq) for the meorg_client to operate in.
  2. In this second job, meorg_client uploads the outputs to ME.org, which places them in a server-side queue (the queue moves files from a temporary space onto the object store). The job then triggers another copyq job after a delay computed from the cron interval and the file size.
  3. (The server-side queue is processed every 5 minutes or so, and the move to the object store then proceeds at a transfer rate of around 150 Mbit/s; I will confirm this figure.)
  4. meorg_client operates in a 3rd job to a) ensure the file has been successfully moved to the object store and b) trigger the analysis, or resubmit itself in a few minutes if the file is not ready.
  5. Analysis runs on ME.org.
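
To make the delay in step 2 and the decision in step 4 concrete, here is a rough sketch in Python. It is only an illustration: the function names are hypothetical and not part of benchcab or meorg_client, and it assumes the delay is simply one cron interval plus the transfer time at the (still unconfirmed) 150 Mbit/s rate.

```python
import math

# Assumed values taken from the discussion above; both are still to be confirmed.
CRON_INTERVAL_S = 5 * 60       # server-side queue is processed roughly every 5 minutes
TRANSFER_RATE_MBIT_S = 150.0   # approximate transfer rate to the object store


def estimate_wait_seconds(file_size_bytes,
                          cron_interval_s=CRON_INTERVAL_S,
                          rate_mbit_s=TRANSFER_RATE_MBIT_S):
    """Hypothetical helper: worst-case wait before the file should be on the object store.

    Assumes one full cron interval plus the time to move the file at the given rate.
    """
    transfer_s = (file_size_bytes * 8) / (rate_mbit_s * 1_000_000)
    return math.ceil(cron_interval_s + transfer_s)


def next_action(file_ready):
    """Hypothetical decision for the 3rd job: trigger the analysis or resubmit later."""
    return "trigger_analysis" if file_ready else "resubmit"


if __name__ == "__main__":
    size = 1_500_000_000  # a 1.5 GB output file, chosen only as an example
    print(f"Delay before job 3: {estimate_wait_seconds(size)} s")
    print(f"If the file is not ready yet: {next_action(False)}")
```

The actual submission of the delayed job and the readiness check are left out on purpose, since they depend on how meorg_client and the PBS scripts end up being structured.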

After step 5, I need to confirm the process with Gavin, as it is unclear whether ME.org provides any notification of a successful or failed analysis run. If not, we have the option of either checking the status in PBS job 3 or spawning a 4th job that uses the meorg_client to get the analysis status and either alerts the user to a failure or links them to the plots.
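
Whichever job ends up doing the check, the logic could look something like the sketch below. This is an assumption-laden illustration: the status strings and the `get_status` callable are hypothetical placeholders, since we do not yet know what ME.org or meorg_client will return.

```python
def handle_analysis_status(get_status):
    """Hypothetical status handler for PBS job 3 or a 4th job.

    `get_status` is any callable returning a status string; the values
    "success", "failed" and anything else (treated as still running)
    are assumptions, not a confirmed ME.org interface.
    """
    status = get_status()
    if status == "success":
        return ("notify", "Analysis finished; plots are available on ME.org.")
    if status == "failed":
        return ("notify", "Analysis failed on ME.org.")
    # Unknown or in-progress status: check again in a later job.
    return ("resubmit", "Analysis still running; checking again later.")


if __name__ == "__main__":
    # Stand-in status sources used only for demonstration.
    for fake_status in ("success", "failed", "running"):
        action, message = handle_analysis_status(lambda s=fake_status: s)
        print(f"{fake_status}: {action} - {message}")
```

Passing the status lookup in as a callable keeps the sketch independent of the real client call, which can be swapped in once the process is confirmed.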

Please let me know if anything is unclear.

Cheers, Ben

SeanBryan51 commented 1 month ago

Is this a good time to think about https://github.com/CABLE-LSM/benchcab/issues/157? We have been thinking about introducing a framework for doing workflow management in benchcab for a while now.

ccarouge commented 1 month ago

Since the process is largely hidden from the user and can be relatively slow, it would be good to think about whether and when we want to send information (emails? a log file?) to the users.