ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

🐕 Batch: Define Model Installs through a YAML file instead of a Dockerfile #1159

Open DhanshreeA opened 1 month ago

DhanshreeA commented 1 month ago

Summary

This task is part of the new packing strategy in Ersilia, which aims to remove several anti-patterns in how Ersilia models are currently packaged (ref: #1137 #1138). At present, a model's dependencies are specified in a Dockerfile, which the Ersilia CLI parses to generate a set of installation instructions for the model's environment. This is misleading, because the Dockerfile serves only as a dependency spec and not as a set of build instructions for the model's image.

Ersilia is moving away from this approach and adopting YAML as the standard for specifying model dependencies.

Objective(s)

Documentation

No response

DhanshreeA commented 1 month ago

Work in Progress sample of such a YAML file:

python: 3.x.x # Numeric input separated by dot
dependencies:
    - conda: # list
        - conda -c channel dep1=x.x.x  # Version 1
        - ["rdkit", "conda-forge"] # Version 2
        - ["git-lfs", "0.1.1", "default"] # Version 3
    - pip:
        - dep1=x.x.x # pip expects == however
        - dep2='^x.x.x'
        - dep3='~x.x.x'
        - dep4='<=x.x.x' # same for >=
        - ["dep", "x.x.x"]
    # We will have to urge the users to keep an order? Not sure
    # Also it'll be work to understand if there are any conda reqs in this list
    - conda install dep -c channel 
    - pip install dep==1.1.1 
# This is a combination of system-commands and dependencies
# We will know there is conda if there's a list with 4 elements
# Forces users to write version, and channel in the case of conda
commands:
    - any:
        - ["pip", "rdkit", "2023.09"]
        - ["pip", "openai", "x.x.x"]
        - ["conda", "git-lfs", "x.x.x", "conda-forge"]
        - "sudo apt-get ..."
        - ["pip", "pyairtable", "x.x.x"]
    - osx:
        - "...."
    - linux:
        - "..."
commands:
    - osx-cpu:
        - ["pip", "pytorch", "1.1", "https://....-cpu..."]
    - osx-gpu:
        - ["pip", "pytorch", "1.1", "https://...-gpu..."]
platform: ['osx', 'linux'] # List with only these acceptable values. This needs to be more comprehensive, since we want to specify which Linux (Debian, SUSE, etc.) and whether osx is Intel or M-series
runtime/hardware/compute: ['cpu', 'gpu'] # This would lead to a matrix of dependencies for cpu and gpu

# We're better off calling this system commands, or something similar, because these are not 'extra'
# Type 1 is preferable to type 3
extra_requires_type1:
    - any:
        - ....
        - ....
    - osx:
        - cmd1
        - cmd2
    - linux:
        - cmd1 opt1 opt2
        - cmd2

extra_requires_type3:
    - osx: 
        - ["cmd1", "op1", "op2"]
        - ["cmd1", "op1", "op2"]
    - linux: 
        - ["cmd1", "op1", "op2"]
        - ["cmd1", "op1", "op2"]

DhanshreeA commented 3 weeks ago

This issue is basically a duplicate of #743

DhanshreeA commented 1 week ago

The YAML above has been standardized to the following for now. We can expand on it later, for example by specifying the hardware platform and the dependencies specific to it:

python: 3.x.x
commands:
    - any:
        - ["pip", "rdkit", "2023.09"]
        - ["pip", "openai", "x.x.x"]
        - ["conda", "git-lfs", "x.x.x", "conda-forge"]
        - "sudo apt-get ..."
        - ["pip", "pyairtable", "x.x.x"]
    - osx:
        - "...."
    - linux:
        - "..."

High level constraints:

  1. We enforce pip dependencies to be in a list containing 3 elements (pip, package, version)
  2. We enforce conda dependencies to be in a list containing 4 elements (conda, package, version, channel)
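
The two constraints above can be sketched as a small Python check. This is a minimal illustration and not Ersilia's actual implementation: `render_command` is a hypothetical helper, and two choices in it are assumptions: plain strings are passed through unchanged as system commands, and list entries are rendered as `pip install package==version` or `conda install -y -c channel package=version`.

```python
from typing import List, Union

# One entry of the `commands` list: a raw shell string or a list of strings.
Command = Union[str, List[str]]


def render_command(entry: Command) -> str:
    """Validate one `commands` entry and turn it into a shell command.

    Hypothetical helper; the rendering rules are assumptions for illustration.
    """
    if isinstance(entry, str):
        # Plain strings are treated as system commands and returned as-is.
        return entry

    if not isinstance(entry, list) or not all(isinstance(x, str) for x in entry):
        raise ValueError(f"Entry must be a string or a list of strings: {entry!r}")
    if not entry:
        raise ValueError("Empty command list")

    tool = entry[0]
    if tool == "pip":
        # Constraint 1: exactly (pip, package, version).
        if len(entry) != 3:
            raise ValueError(f"pip entries need 3 elements: {entry!r}")
        _, package, version = entry
        return f"pip install {package}=={version}"

    if tool == "conda":
        # Constraint 2: exactly (conda, package, version, channel).
        if len(entry) != 4:
            raise ValueError(f"conda entries need 4 elements: {entry!r}")
        _, package, version, channel = entry
        return f"conda install -y -c {channel} {package}={version}"

    raise ValueError(f"Unknown installer {tool!r} in {entry!r}")


if __name__ == "__main__":
    commands = [
        ["pip", "rdkit", "2023.09"],
        ["conda", "git-lfs", "0.1.1", "conda-forge"],
        "sudo apt-get ...",
    ]
    for entry in commands:
        print(render_command(entry))
```

Requiring an explicit version (and a channel for conda) means a malformed entry such as `["pip", "rdkit"]` fails early with a `ValueError` rather than silently installing an unpinned package.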