ari-apc-lab / croupier

Cloudify plugin for HPCs and batch applications
https://hub.docker.com/repository/docker/marangiop/cloudify-croupier-ari-apc-lab
Apache License 2.0

Automatically delete HPC reservation when workflow is over #9

Open marangiop opened 3 years ago

marangiop commented 3 years ago

This suggestion is simple. In GRAPEVINE I am working a lot with reservations, and I have even added the reservation as a new input in the inputs.yaml file:

http://raw.githubusercontent.com/ari-apc-lab/croupier/grapevine/croupier_plugin/tests/integration/blueprints/inputs_def.yaml

Ideally, once all the jobs of a given workflow have been executed, the orchestrator would automatically delete the reservation created specifically for that workflow. This mainly benefits the HPC center, because it minimizes wasted computational resources: the reserved nodes are not kept idle after the workflow has finished.
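A minimal sketch of that rule, assuming the orchestrator can see the state of every job in the workflow and has some way to delete a reservation. The state names and the `delete_reservation` callback are hypothetical, not Croupier's actual API. Counting failed and cancelled jobs as terminal means the reservation is released even when the workflow does not finish successfully.

```python
from typing import Callable, Iterable

# Hypothetical terminal job states; Croupier's real state names may differ.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED", "TIMEOUT"}


def release_reservation_if_done(
    job_states: Iterable[str],
    reservation: str,
    delete_reservation: Callable[[str], None],
) -> bool:
    """Delete the workflow's reservation once every job is in a terminal state.

    Returns True if the reservation was deleted, False if some job is still
    pending or running (or if no job states are known yet).
    """
    states = list(job_states)
    if states and all(state in TERMINAL_STATES for state in states):
        delete_reservation(reservation)  # hypothetical callback, e.g. wrapping a scheduler command
        return True
    return False
```

An orchestrator would call this each time it refreshes job states; with one job still `RUNNING` it returns `False`, and once every job is `COMPLETED` or `FAILED` it deletes the reservation and returns `True`.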

The naive solution I came up with is to add an extra job at the end of every blueprint (here using a bootstrap approach) that executes a script containing the exact instructions for deleting the reservation associated with the workflow.

    output_transfer_reservation_delete_00_greece:
        type: croupier.nodes.Job
        properties:
            job_options:
                # Script with the instructions that delete this workflow's reservation
                script: "reservation_delete_script.script"
                nodes: 1
                tasks: 1
                tasks_per_node: 1
                max_time: '00:05:00'
                partition: cola-corta
                queue: batch
            deployment:
                # Creates reservation_delete_script.script
                bootstrap: "scripts/create_script_delete_reservation.sh"
            skip_cleanup: True
        relationships:
            - type: job_managed_by_interface
              target: cesga_hpc
            # Run only after the last computational job of the workflow
            - type: job_depends_on
              target: postfive_00_greece
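The bootstrap script itself is not shown here. As a rough sketch of what it might generate, assuming the cluster runs Slurm (reservations are deleted with `scontrol delete ReservationName=<name>`) and that the account running the job is allowed to delete the reservation (Slurm typically restricts this to privileged users, so this depends on the site's configuration). The function names are hypothetical; only the output file name comes from the blueprint above.

```python
import shlex
from pathlib import Path


def build_delete_script(reservation: str) -> str:
    """Return the text of a batch script that deletes one Slurm reservation."""
    return (
        "#!/bin/bash\n"
        "# Delete the reservation created for this workflow\n"
        f"scontrol delete ReservationName={shlex.quote(reservation)}\n"
    )


def write_delete_script(
    reservation: str, path: str = "reservation_delete_script.script"
) -> Path:
    """Write the deletion script to `path` (the file the Job node executes)."""
    target = Path(path)
    target.write_text(build_delete_script(reservation))
    return target
```

Quoting the name with `shlex.quote` leaves ordinary reservation names unchanged while keeping unusual ones from being split by the shell.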

There may be a better alternative to my solution. It clearly does not work when one of the jobs that the reservation-deletion job depends on fails: Cloudify/Croupier automatically cancels all running jobs and stops the execution of run_jobs as soon as any job fails. In that case, the final job that deletes the reservation is never run, and the reservation stays active.
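In Python, which Croupier is written in, the usual way to guarantee this kind of cleanup is a `try`/`finally` block. The sketch below is not Croupier's actual workflow code: `run_workflow_with_reservation` and the `delete_reservation` callback are hypothetical, and each job is modeled as a callable that raises on failure.

```python
from typing import Callable, Sequence


def run_workflow_with_reservation(
    jobs: Sequence[Callable[[], None]],
    reservation: str,
    delete_reservation: Callable[[str], None],
) -> None:
    """Run jobs in order and always release the reservation afterwards.

    The `finally` clause runs whether all jobs succeed or one of them raises,
    so a failed job no longer leaves the reservation active.
    """
    try:
        for job in jobs:
            job()  # a failing job raises, which skips the remaining jobs
    finally:
        delete_reservation(reservation)  # hypothetical cleanup hook
```

The exception from the failed job still propagates after the reservation has been deleted, so the workflow is still reported as failed; only the cleanup is decoupled from job success.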

This issue can be closed, or kept open until it is addressed by a proper data management solution.