GSTT-CSC / MLOps

Framework for building ML apps
GNU General Public License v3.0

macOS doesn't seem to be supported #14

Closed: hshuaib90 closed this issue 3 years ago

hshuaib90 commented 3 years ago

When running an experiment I get the following error:

running experiment...
2021/06/06 00:56:46 INFO mlflow.projects.docker: === Building docker image example_project:da1093a ===
2021/06/06 00:57:07 INFO mlflow.projects.utils: === Created directory /var/folders/5f/gtpk4fdd3558tf0fdfv4l98h0000gr/T/tmp6j02pqql for downloading remote URIs passed to arguments of type 'path' ===
2021/06/06 00:57:07 INFO mlflow.projects.backend.local: === Running command 'docker run --rm --network host --gpus all --ipc host --rm  --runtime nvidia -v /home/user/datadir:/DATA -e MLFLOW_RUN_ID=9e1ca857e3d14281acc5785ae44577fb -e MLFLOW_TRACKING_URI=http://0.0.0.0:80 -e MLFLOW_EXPERIMENT_ID=1 -e AWS_SECRET_ACCESS_KEY=minioadmin -e AWS_ACCESS_KEY_ID=minioadmin -e MLFLOW_S3_ENDPOINT_URL=http://0.0.0.0:8002 example_project:da1093a python3 train.py' in run with ID '9e1ca857e3d14281acc5785ae44577fb' ===
docker: Error response from daemon: Unknown runtime specified nvidia.
See 'docker run --help'.
Traceback (most recent call last):
  File "run_project.py", line 24, in <module>
    run_project(args)
  File "run_project.py", line 11, in run_project
    entry_point=in_args.entry_point)
  File "/Users/mohammadharisshuaib/Software/mlops_test/mlops_env/lib/python3.7/site-packages/mlops-0.1-py3.7.egg/mlops/Experiment.py", line 121, in run
  File "/Users/mohammadharisshuaib/Software/mlops_test/mlops_env/lib/python3.7/site-packages/mlflow/projects/__init__.py", line 307, in run
    _wait_for(submitted_run_obj)
  File "/Users/mohammadharisshuaib/Software/mlops_test/mlops_env/lib/python3.7/site-packages/mlflow/projects/__init__.py", line 324, in _wait_for
    raise ExecutionException("Run (ID '%s') failed" % run_id)
mlflow.exceptions.ExecutionException: Run (ID '9e1ca857e3d14281acc5785ae44577fb') failed

This link seems to indicate that macOS is not supported: https://github.com/NVIDIA/nvidia-docker/wiki/Frequently-Asked-Questions#why-do-i-get-the-error-unknown-runtime-specified-nvidia

Is that the case?

laurencejackson commented 3 years ago

Closing this issue since it has been resolved in recent commits. The problem was that the Docker runtime was hard-coded as nvidia in the mlflow run command, which prevented the project from running on systems that do not have that runtime configured.

Resolved by removing the runtime from the Docker arguments passed to mlflow.run and instead setting nvidia as the default runtime at the OS level on the DGX.
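A minimal sketch of the two halves of that fix, assuming the project builds a `docker_args` dict for `mlflow.projects.run` (the argument keys below are reconstructed from the logged `docker run` command; the helper names are hypothetical, not taken from the MLOps codebase). The first helper drops the hard-coded `runtime`, so Docker falls back to whatever default runtime the daemon has. The second shows the kind of `daemon.json` entry (`"default-runtime": "nvidia"`) that makes nvidia the default on the GPU host only.

```python
import json

# Hypothetical docker_args mirroring the failing command:
# --gpus all --ipc host --rm --runtime nvidia
OLD_DOCKER_ARGS = {"gpus": "all", "ipc": "host", "rm": True, "runtime": "nvidia"}


def portable_docker_args(args: dict) -> dict:
    """Return a copy of the docker run arguments without a hard-coded runtime.

    Without "runtime", Docker uses the daemon's default runtime, so the run no
    longer fails with "Unknown runtime specified nvidia" on hosts lacking it.
    """
    return {key: value for key, value in args.items() if key != "runtime"}


def to_cli_flags(args: dict) -> list:
    """Render the dict as docker run flags (True means a flag with no value)."""
    flags = []
    for key, value in args.items():
        if value is False:
            continue  # disabled flag: omit it entirely
        flags.append(("--" if len(key) > 1 else "-") + key)
        if value is not True:
            flags.append(str(value))
    return flags


def daemon_config_with_nvidia_default(config: dict) -> dict:
    """Return a daemon.json-style dict with nvidia set as the default runtime.

    Intended only for the GPU server (the DGX here); other hosts keep their
    stock Docker daemon configuration.
    """
    updated = dict(config)
    runtimes = dict(updated.get("runtimes", {}))
    runtimes.setdefault(
        "nvidia", {"path": "nvidia-container-runtime", "runtimeArgs": []}
    )
    updated["runtimes"] = runtimes
    updated["default-runtime"] = "nvidia"
    return updated


if __name__ == "__main__":
    new_args = portable_docker_args(OLD_DOCKER_ARGS)
    print(" ".join(to_cli_flags(new_args)))  # --gpus all --ipc host --rm
    print(json.dumps(daemon_config_with_nvidia_default({}), indent=4))
```

The resulting dict would be passed as `docker_args=` to `mlflow.projects.run`. Keeping the runtime choice in the daemon configuration means the same project definition works unchanged on both the DGX and a laptop.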