Azure / batch-shipyard

Simplify HPC and Batch workloads on Azure
MIT License

Resubmitting New Task to same Job #271

Closed markpearl closed 5 years ago

markpearl commented 5 years ago

I currently have an issue with Batch where the task I'm running needs to loop over multiple files, so it has to resubmit the same job with the input and output files for that rule changed each time.

Currently I delete the job and have the process sleep for 40 seconds:

"$SHIPYARD/shipyard jobs del --configdir $FILESHARE/snakemake/azurebatch/index_genome -y && sleep 40"

However, there are some cases where the job doesn't get deleted in time before the next task is submitted.

Is there a better way around this rather than having the process sleep an arbitrary number of seconds?

alfpark commented 5 years ago

Note that you can specify task-level input and output and can add an unlimited number of tasks to an active job. This is the recommended way of accomplishing what you want, as you do not have to wait for a job deletion. You can add multiple tasks per job in the jobs.yaml file; to enable this scenario, do not explicitly provide task ids.
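As a rough illustration, a jobs.yaml with multiple tasks, each carrying its own task-level input and output, might look something like the sketch below. The image name, storage account link, and remote paths are placeholders, and the input_data/output_data property names should be verified against the data movement configuration docs for your Batch Shipyard version.

job_specifications:
- id: myjob
  tasks:
  - docker_image: myregistry/myimage              # placeholder image
    command: /path/to/process.sh input1 output1   # placeholder command
    input_data:
      azure_storage:
      - storage_account_settings: mystorageaccount   # placeholder storage account link
        remote_path: mycontainer/inputs/1
        local_path: $AZ_BATCH_TASK_WORKING_DIR
    output_data:
      azure_storage:
      - storage_account_settings: mystorageaccount
        remote_path: mycontainer/outputs/1
  - docker_image: myregistry/myimage
    command: /path/to/process.sh input2 output2
    input_data:
      azure_storage:
      - storage_account_settings: mystorageaccount
        remote_path: mycontainer/inputs/2
        local_path: $AZ_BATCH_TASK_WORKING_DIR
    output_data:
      azure_storage:
      - storage_account_settings: mystorageaccount
        remote_path: mycontainer/outputs/2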

Otherwise, you can drop using jobs del and instead use the -y --recreate flags on jobs add. Please see the usage documentation.

markpearl commented 5 years ago

Thanks for getting back to me. The input and output files are specified by the Snakemake job, so it's essentially trying to run 6 commands with 6 different sets of input and output files. I don't think I would be able to manually specify the inputs and outputs in the jobs.yaml, as I want Snakemake to drive this.

These commands get written one by one into jobrun.sh, which sits in the file share. I've tried taking out the task_id, but I seem to be getting a schema validation error:

job_specifications:
- id: agcanjobtrimmomatic
  tasks:
    docker_image: agcanregistry.azurecr.io/trimmomatic
    shared_data_volumes:
    - azurefilevol
    remove_container_after_exit: true
    command: "/agcanfileshare/snakemake/jobrun.sh"

This is the error I'm receiving:

[image: image.png]


markpearl commented 5 years ago

Ah I'm guessing I just need to set the task_id to null, correct?


alfpark commented 5 years ago

Correct, or omit that property altogether (the id under each task in the tasks sequence). Note that if you have task dependencies, you can't do this; you'd need to manually specify your tasks and task ids within the job all at once.
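For reference, your snippet with the task expressed as a sequence entry and no explicit id would look roughly like this (a sketch only; double-check the indentation against the jobs.yaml documentation):

job_specifications:
- id: agcanjobtrimmomatic
  tasks:
  - docker_image: agcanregistry.azurecr.io/trimmomatic
    shared_data_volumes:
    - azurefilevol
    remove_container_after_exit: true
    command: "/agcanfileshare/snakemake/jobrun.sh"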

markpearl commented 5 years ago

Thank you. Is it feasible to allocate the number of cores used for a given job task? It seems, based on the Snakemake execution, that it's only using 2 of the provided cores:

[image: image.png]

Does the command specified in the jobs.yaml override the value for cores when it's submitted to the Batch job?

Would I need to look at the logs for a given task to see what the resource consumption looks like? Ideally I just want to make sure it's maximizing the pool infrastructure. There are 4 dedicated and 3 low-priority Standard_D64s_v3 virtual machines in this pool, and I don't feel it's using the full resources of the pool.

Would really appreciate some guidance on this.

Regards,

Mark

alfpark commented 5 years ago

Two things:

  1. Batch has no insight into your application - the number of threads used or IO operations involved.
  2. Batch allocates tasks to whole nodes (or a set of nodes if using multi-instance tasks).

You will probably need to leverage the max_tasks_per_node setting on the pool and schedule your work appropriately to maximize resource utilization on each compute node in the pool, according to your process execution characteristics.
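As a sketch, the relevant pool.yaml fragment would look something like the following; the value of 4 is only an example and should be chosen based on how many of your processes can reasonably share a 64-core node:

pool_specification:
  id: mypool                      # placeholder id
  max_tasks_per_node: 4           # example: 4 concurrent task scheduling slots per node
  vm_size: Standard_D64s_v3
  vm_count:
    dedicated: 4
    low_priority: 3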

Btw, if you're attaching images to your emails, those don't show up here.

markpearl commented 5 years ago

Thanks for getting back to me.

Without these settings enabled, can you explain what would happen right now with the following configurations for the jobs.yaml and pool.yaml?

job_specifications:

pool_specification:
  id: agcanpooltrimmomatic
  vm_configuration:
    platform_image:
      offer: UbuntuServer
      publisher: Canonical
      sku: 18.04-LTS
  vm_count:
    dedicated: 4
    low_priority: 3
  vm_size: Standard_D64s_v3
  reboot_on_start_task_failed: false
  block_until_all_global_resources_loaded: true
  ssh:
    username: docker

I would like to understand how resource consumption works for the job task.


alfpark commented 5 years ago

You have 1 task, so it will get scheduled to 1 node in your pool.

You have to modify your job such that your task is broken up into multiple independent portions that can be run concurrently. The end result is multiple tasks under tasks or a task_factory (if you need to auto-parameterize your task executions). As stated above, Batch can't help you in that regard as that logic is application dependent.
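For example, a task_factory can stamp out a set of tasks from a single template; a very rough parametric sweep sketch is below. The image, script, and the {0} substitution token are placeholders that should be verified against the task factory documentation.

job_specifications:
- id: myjob
  tasks:
  - docker_image: myregistry/myimage        # placeholder image
    task_factory:
      parametric_sweep:
        product:
        - start: 0
          step: 1
          stop: 6
    command: /path/to/process.sh {0}        # {0} is replaced with the generated sweep value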

Your email formatting is difficult to read as YAML; please respond in GitHub with proper formatting.

markpearl commented 5 years ago

Okay, based on my current setup for a given pool, I have 4 dedicated nodes and 3 low-priority nodes, which are Standard_D64s_v3. This is an MPI application, so I would want to take full advantage of the pool rather than relying on just 1 node.

If I configured the jobs.yaml to use multi-instance tasks like one of the following two options, would this help?

Option 1:

job_specifications:
- id: agcanjobtrimmomatic
  tasks:
  - id: null
    docker_image: agcanregistry.azurecr.io/trimmomatic
    multi_instance:
      num_instances: pool_current_dedicated
    shared_data_volumes:
    - azurefilevol
    remove_container_after_exit: true
    command: "/agcanfileshare/snakemake/jobrun.sh"

Option 2:

job_specifications:
- id: agcanjobtrimmomatic
  tasks:
  - id: null
    docker_image: agcanregistry.azurecr.io/trimmomatic
    multi_instance:
      coordination_command: "/agcanfileshare/snakemake/jobrun.sh"
      num_instances: pool_current_dedicated
    remove_container_after_exit: true

Also, if I took advantage of max_tasks_per_node, since I have 4 dedicated nodes and 64 cores per VM, I would set this to 256 in the pool.yaml, correct?

markpearl commented 5 years ago

Distributing this command across multiple nodes is critical; otherwise the one node would be extremely bottlenecked.

alfpark commented 5 years ago

Batch (or Batch Shipyard) cannot parallelize your workflow the way it is currently.

Simply converting the task to multi-instance will not implicitly convert the jobrun.sh script to run on multiple nodes (as it's presumably not coordinating multiple nodes to run a task like an MPI program). The application logic that is wrapped in jobrun.sh needs to be decomposed and parallelized as discrete tasks. There are helpers like task factories that might be applicable, but otherwise this is something that needs to be performed on a per-application basis.

Perhaps looking at this recipe would be helpful: https://github.com/Azure/batch-shipyard/tree/master/recipes/BLAST-CPU

markpearl commented 5 years ago

Thank you for the clarification. It seems Snakemake can successfully add multiple tasks per node. Would there be any way I could push different tasks to specific nodes?

Essentially, how it's broken up is per sample, so there's one task per sample.

Is there any way we could assign specific tasks to specific nodes, or is that all handled at runtime?


markpearl commented 5 years ago

If I defined multiple tasks instead of just using one task in the jobs.yaml, how could I distribute those tasks across multiple compute nodes in a pool?

The only thing Snakemake really does is define the input and output files to execute against a given rule.

So I would have no problem hardcoding all of the commands and inputs/outputs in multiple tasks; if that's an option that could get me to parallelization across multiple nodes, that would be great.

Let me know what you think!


alfpark commented 5 years ago

Batch will handle the scheduling and placement of multiple tasks in a job to compute nodes that are awaiting task assignment.

Every Batch job is associated with a Batch pool. All tasks that are submitted to the active job are then scheduled against the pool. These tasks are assigned to compute nodes which have free scheduling slots.

As per above, if your job's tasks do not have an explicit id assigned, you can simply perform jobs add for each yaml file if you have a 1:1 job:task mapping. Or, if your jobs.yaml has 1:n tasks, then a single jobs add call would add all tasks to the job; they would then get scheduled concurrently to the compute nodes in the pool if there are no dependencies blocking scheduling.
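To make that concrete, a single jobs.yaml with one hardcoded task per sample could look roughly like the sketch below. The per-sample argument passed to jobrun.sh is purely hypothetical; how your script selects its inputs and outputs is up to you.

job_specifications:
- id: agcanjobtrimmomatic
  tasks:
  - docker_image: agcanregistry.azurecr.io/trimmomatic
    shared_data_volumes:
    - azurefilevol
    remove_container_after_exit: true
    command: "/agcanfileshare/snakemake/jobrun.sh sample1"   # hypothetical per-sample argument
  - docker_image: agcanregistry.azurecr.io/trimmomatic
    shared_data_volumes:
    - azurefilevol
    remove_container_after_exit: true
    command: "/agcanfileshare/snakemake/jobrun.sh sample2"   # hypothetical per-sample argument

All of the tasks from a file like this are added with a single jobs add call and then scheduled across the pool's free slots.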

You don't need to concern yourself with distributing individual tasks across compute nodes - the Azure Batch service does that for you. Think of your job as a "queue" and each task as a command process in the queue being distributed to a compute node with an available scheduling slot. This scheduling process is handled entirely by Batch.

If you need coordinated execution of a "single" process across multiple compute nodes, i.e., MPI programs (distributed parallel processing), then that is an entirely different matter.

You need to evaluate in your scenario:

  1. Do you need to run a bunch of Snakemake Input->Process->Output workflows? If so, you can trivially parallelize these by just submitting a task for each of these workflows. Each task will run on a different compute node with default max tasks per node settings.
  2. Does each Process in the Snakemake Input->Process->Output workflow need to be parallelized across multiple machines? That is a much more complicated question that only you can answer by understanding your computation (e.g., is it fork-join parallelism, distributed parallel computation, etc.).