cms-dpoa / cloud-processing

Exploring the usage of public cloud resources for CMS open data processing
GNU General Public License v3.0

Learn about workflow management tools #2

Closed: katilp closed this issue 3 months ago

katilp commented 4 months ago

Do this after #3

The data processing task is a simple workflow: first, find the necessary metadata for the main processing step; then run the main processing steps as parallel jobs.

Basics

Learn the basics from https://coderefinery.github.io/reproducible-research/workflow-management/. Note, however, that we do not use Snakemake in our use case; it is nevertheless a good overview of the topic.

Argo

Currently, for the processing jobs on the Kubernetes clusters, we have used Argo Workflows.
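For concreteness, below is a minimal sketch of what such a workflow could look like as an Argo Workflows manifest. Everything in it is illustrative: the template names, the placeholder images, and the hard-coded file list stand in for the real metadata lookup and CMS processing containers. The pattern to note is the two-step DAG described above: one step prints a JSON list (which Argo captures from stdout as `outputs.result`), and `withParam` fans the second step out as parallel jobs over that list.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: cms-processing-     # Argo appends a random suffix
spec:
  entrypoint: main
  templates:
  - name: main
    dag:
      tasks:
      - name: find-metadata
        template: find-metadata
      - name: process
        template: process
        dependencies: [find-metadata]
        arguments:
          parameters:
          - name: file
            value: "{{item}}"
        # Fan out: one parallel pod per entry in the JSON list
        # printed by the find-metadata step.
        withParam: "{{tasks.find-metadata.outputs.result}}"
  - name: find-metadata
    script:
      image: python:3.11            # placeholder image
      command: [python]
      source: |
        # Stand-in for the real metadata lookup: print a JSON list;
        # Argo captures stdout as outputs.result.
        import json
        print(json.dumps(["file1.root", "file2.root", "file3.root"]))
  - name: process
    inputs:
      parameters:
      - name: file
    container:
      image: alpine:3               # placeholder for the CMS processing image
      command: [sh, -c]
      args: ["echo processing {{inputs.parameters.file}}"]
```

Submitting this with `argo submit --watch` shows the `process` tasks running in parallel once `find-metadata` completes.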

Read through our previous cloud computing intro at https://cms-opendata-workshop.github.io/workshop2023-lesson-cloud/

subash-taranga commented 3 months ago


To perform this task, I think we need to have already created a cluster. But how do we know the specifications for creating these special clusters? I think many clusters need to be ready before starting this exercise.

katilp commented 3 months ago

Sorry, I forgot to specify: these instructions were used on a special occasion where several people were doing the exercise, and we created all those clusters for them. For you, just create one cluster and the NFS disk, install the Argo CLI, and start working from https://cms-opendata-workshop.github.io/workshop2023-lesson-cloud/01-introduction/index.html#submit-the-workflow
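As a rough sketch of the cluster and disk steps, assuming Google Kubernetes Engine as in the linked lesson (the cluster name, zone, node count, and disk size below are made-up examples, and the NFS server deployment itself follows the lesson, not this snippet):

```bash
# Create a single small cluster (illustrative name/zone/size)
gcloud container clusters create my-cluster \
  --zone europe-west4-a \
  --num-nodes 2

# Fetch credentials so kubectl and argo talk to the new cluster
gcloud container clusters get-credentials my-cluster --zone europe-west4-a

# Create a persistent disk to back the NFS share (size is an example)
gcloud compute disks create nfs-disk --zone europe-west4-a --size 100GB
```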

Note that the Argo Workflows CLI version might have changed; check the Linux assets at https://github.com/argoproj/argo-workflows/releases/
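For example, the Linux binary can be installed following the pattern from the releases page (the version tag below is only a placeholder; substitute the current release):

```bash
# Replace v3.5.8 with the latest tag from the releases page
ARGO_VERSION=v3.5.8
curl -sLO "https://github.com/argoproj/argo-workflows/releases/download/${ARGO_VERSION}/argo-linux-amd64.gz"
gunzip argo-linux-amd64.gz
chmod +x argo-linux-amd64
sudo mv argo-linux-amd64 /usr/local/bin/argo
argo version   # confirm the CLI installed correctly
```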

subash-taranga commented 3 months ago

@katilp Great!