ga4gh / task-execution-schemas


tesResources requires addition of GPU, FPGA, Arch #163

Open · markjschreiber opened 2 years ago

markjschreiber commented 2 years ago

An increasing number of genomics tasks require additional resource types such as GPUs and FPGAs, as well as the option of running on the aarch64 architecture. If there is not already a way to request these, and if people think they should be added, I am happy to create a PR.
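
For illustration only, an extended `tesResources` might look something like this (`cpu_cores` and `ram_gb` are the existing fields; the `cpu_architecture`, `gpu_*`, and `fpga_*` names are hypothetical, not part of any spec):

```json
{
  "resources": {
    "cpu_cores": 4,
    "ram_gb": 16,
    "cpu_architecture": "aarch64",
    "gpu_count": 1,
    "gpu_memory_gb": 24,
    "fpga_count": 0
  }
}
```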

uniqueg commented 2 years ago

Also requested by the Czech ELIXIR node, see here: https://docs.google.com/spreadsheets/d/1vBFhBQ-nFqhSL5dLjQfOWO6x9BzmV9x6l18p9GYRZdQ/edit#gid=0

Contact: @xhejtman & @viktoriaas

kellrott commented 2 years ago

This is the intention of #154 and is currently scheduled to be part of v1.1

vsmalladi commented 1 year ago

@kellrott can we close this now that #154 is merged?

kellrott commented 1 year ago

`backend_parameters` is a step toward solving this problem, but it is vendor specific, so you need to know the special string that maps to your deployment's resources. We need to keep thinking about ways to solve this more coherently. Hard-coding computer parts directly into the OpenAPI specification would probably be a bad idea, because the schema would have to be updated every time a new product is released.
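
For example, in v1.1 a task can pass vendor-specific hints through `backend_parameters` (the key and value below are made up for one hypothetical Azure-backed deployment; another deployment would need a completely different string):

```json
{
  "resources": {
    "cpu_cores": 4,
    "ram_gb": 16,
    "backend_parameters": {
      "VmSize": "Standard_NC6s_v3"
    },
    "backend_parameters_strict": false
  }
}
```

Nothing in the schema tells a client which keys a given TES server understands, which is why this is only a partial solution.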

One idea would be to separate hardware descriptions into their own schema, along the lines of https://schema.org/, and describe product relationships using JSON-LD. You could then make statements like "I need at least an NVIDIA A100, or better, with at least 24 GB of on-card RAM", and the JSON-LD describing the technical landscape could evolve as new products are introduced.
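
A very rough sketch of what that could look like (everything here is invented for illustration; schema.org does not currently define GPU vocabulary, so imagine a community-maintained context at a hypothetical namespace):

```json
{
  "@context": { "hw": "https://example.org/hardware#" },
  "@type": "hw:AcceleratorRequirement",
  "hw:comparator": "atLeast",
  "hw:model": { "@id": "hw:nvidia-a100" },
  "hw:onCardMemoryGB": 24
}
```

A separate, evolving JSON-LD document would then define which models satisfy "at least an A100".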

xhejtman commented 1 year ago

E.g., Nextflow uses the directive `accelerator: <number>`, but it also allows specifying a card type that matches node annotations in Kubernetes. I think there is no way to handle all types of accelerators and execution environments while staying vendor agnostic. E.g., if a task requests "A100 or better", is an AMD MI300 better? And I agree that a relative comparator is really useful, i.e. supporting "at least X" rather than only an exact match. CUDA offers some numbers in the form of CUDA compute capability, but that is again limited to the NVIDIA/CUDA world.

If we can assume that all accelerators have RAM, then the number of accelerators and their RAM could go into the spec, but the type of accelerator is quite a mess.
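
Purely as a sketch (none of these fields exist in TES today), a minimal vendor-agnostic request plus an optional opaque type hint might look like:

```json
{
  "resources": {
    "accelerators": [
      {
        "count": 2,
        "min_memory_gb": 24,
        "type": "nvidia-a100",
        "match": "at_least"
      }
    ]
  }
}
```

but the `type` and `match` fields immediately run into the comparison problems above.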