WIPACrepo / iceprod

IceCube dataset management system
MIT License

supercomputer mode #88

Closed dsschult closed 5 years ago

dsschult commented 8 years ago

(imported from trac: #1370)

Supercomputers are on the list of we-should-support-this. They are all the rage in HEP now, but have one main problem: compute nodes can't talk to the outside world via HTTP, GridFTP, etc.

Solution:

A special plugin that pre-downloads everything, writes it to a shared-disk inbox, waits for the job to finish, and transfers the output from the outbox.
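
A minimal sketch of that inbox/outbox flow, not the actual plugin. It assumes the job signals completion by writing a sentinel file to the outbox; the `DONE` marker name, the function names, and the `fetch`/`upload` callables are all hypothetical stand-ins for whatever transfer mechanism is available on the login node.

```python
import shutil
import time
from pathlib import Path

DONE_MARKER = "DONE"  # hypothetical sentinel file the job writes last


def stage_inputs(sources, inbox, fetch=shutil.copy):
    """Pre-download every input into the shared-disk inbox.

    `fetch(src, dest)` is an assumption: a local copy by default, but on a
    real system it would be an HTTP or GridFTP transfer run outside the firewall.
    """
    inbox = Path(inbox)
    inbox.mkdir(parents=True, exist_ok=True)
    staged = []
    for src in sources:
        dest = inbox / Path(src).name
        fetch(src, dest)
        staged.append(dest)
    return staged


def wait_for_job(outbox, poll_seconds=30.0, timeout=None):
    """Poll the outbox until the sentinel appears; return False on timeout."""
    marker = Path(outbox) / DONE_MARKER
    deadline = None if timeout is None else time.monotonic() + timeout
    while not marker.exists():
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True


def collect_outputs(outbox, upload):
    """Pass every output file except the sentinel to `upload`; return their names."""
    sent = []
    for path in sorted(Path(outbox).iterdir()):
        if path.is_file() and path.name != DONE_MARKER:
            upload(path)
            sent.append(path.name)
    return sent
```

Polling a sentinel file keeps the compute nodes entirely off the network: only the process on the login node talks to the outside, and the shared disk is the sole channel between the two.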

Optional:

dsschult commented 8 years ago

Note that at SCAP 2016, it was advised that firewalls are going away for new supercomputers. If so, this may not be needed.

dsschult commented 5 years ago

And yet SCAP 2018 advised us to make more use of supercomputers, and in 2019 there are now several systems with firewalls that we otherwise have access to. The "trend" of removing firewalls has not materialized as hoped.

dsschult commented 5 years ago

Plan for a normal firewalled cluster:

  1. make a fake pilot for the task
  2. get task for processing (use the cluster's queue limits for resource size)
    • download input file(s) to local (network) disk
    • submit to local queue as manual job
  3. monitor the queue; when the job completes:
    • upload logs
    • if failed, mark task reset
    • if success
      • upload output file(s) from local (network) disk
      • mark task complete
    • remove fake pilot
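
The plan above can be sketched roughly as follows. This is an illustration under stated assumptions, not IceProd code: `server` and `queue` are hypothetical adapter objects, and every method name on them (`create_pilot`, `get_task`, `resource_limits`, `submit`, and so on) is invented here to mirror one step of the list.

```python
import time


def run_one_task(server, queue, poll_seconds=60.0):
    """Drive a single task through a firewalled cluster.

    `server` (task bookkeeping and file transfer) and `queue` (the local
    batch system) are hypothetical adapters, not real IceProd APIs.
    """
    pilot = server.create_pilot()  # 1. fake pilot for the task
    try:
        # 2. request a task sized to the cluster's queue limits
        task = server.get_task(pilot, queue.resource_limits())
        if task is None:
            return None
        server.download_inputs(task)  # inputs land on local (network) disk
        job = queue.submit(task)      # submitted to the local queue as a manual job

        # 3. monitor the queue until the job leaves it
        while not queue.is_finished(job):
            time.sleep(poll_seconds)

        server.upload_logs(task)      # logs go up whether it passed or failed
        if queue.succeeded(job):
            server.upload_outputs(task)
            server.mark_complete(task)
            return "complete"
        server.mark_reset(task)
        return "reset"
    finally:
        # the fake pilot is removed on every path, including errors
        server.remove_pilot(pilot)
```

Putting the pilot removal in a `finally` block is a design choice of this sketch: it keeps a crashed transfer or submission from leaving a stale fake pilot behind.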

dsschult commented 5 years ago

Graham is now successfully completing tasks. :tada:

We'll call this issue done, and open new issues for other supercomputers.