Sage-Bionetworks / research-benchmarking-technology

Main repository of the Research & Benchmarking Technology Team
Apache License 2.0

Explore the benchmarking of Tools as workflows #52

Open tschaffter opened 2 years ago

tschaffter commented 2 years ago

Both programs and API services have limitations that we are capturing in #51.

This Epic aims to explore the implications of defining a Tool as a workflow. The motivation is that workflows are increasingly used to perform biomedical tasks. Sage recently ramped up its effort in that domain by setting up a Nextflow Tower.

In my mind, here are a few pros and cons of workflows:

Pros:

Cons:

Tasks:

Resources

thomasyu888 commented 2 years ago

Here are some additions to the pros and cons list you have above for the workflow submissions:

Pros:

Cons:

tschaffter commented 2 years ago

Steep learning curve (Similar curve to API based submissions + Docker + GitHub in my opinion)

Not sure why Docker and GitHub are part of the equation. Isn't Docker required when developing portable workflows (which are required in the context of benchmarking)?

Having multiple steps in the workflow means there are possibly many Docker containers. How will we support private Docker containers?

Do you mean private images? How is this linked to the number of containers to run?

thomasyu888 commented 2 years ago

Not sure why Docker and GitHub are part of the equation. Isn't Docker required when developing portable workflows (which are required in the context of benchmarking)?

Docker and GitHub are part of the learning curve because not everyone uses them. GitHub is added because if we have template workflows or template API repos for people to pull from, we basically add GitHub as a dependency. If people don't know how to use GitHub, that is new technology they have to learn. I think lots of times we think we are making lives easier for participants, but maybe we really aren't.

Do you mean private images? How is this linked to the number of containers to run?

Not necessarily. Let's say a workflow has 5 steps, and each step uses a different Docker image. Regardless of whether each image is public or private, we would need the ability to: