bitnami / containers

Bitnami container images
https://bitnami.com

[bitnami/mlflow] MLFlow missing package: google-cloud-storage #65108

Open RussellSB opened 2 months ago

RussellSB commented 2 months ago

Name and Version

bitnami/mlflow

What architecture are you using?

amd64

What steps will reproduce the bug?

Right now we have MLflow set up on GCP, relying on the Bitnami image. Whenever we try to log ML models to the tracking server, it tries to save them to Google Cloud Storage under the hood, but fails because the google-cloud-storage package and its dependencies (google.auth included) are missing. To reproduce this simply, without having to set up the whole GCP server:

  1. Start a container from the bitnami/mlflow image.
  2. Start Python.
  3. Run from google.auth.exceptions import DefaultCredentialsError (the import used in https://github.com/mlflow/mlflow/blob/master/mlflow/store/artifact/gcs_artifact_repo.py)
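The reproduction steps above can also be run as a small script inside the container. This is only a sketch: the module list is an assumption based on the traceback below and on the package named in this issue, and `find_missing` is a hypothetical helper, not part of MLflow or the Bitnami image.

```python
import importlib

# Modules the GCS artifact path is expected to need.
# Assumption: this list is inferred from the traceback and the
# google-cloud-storage package named in this issue.
REQUIRED_MODULES = ["google.auth.exceptions", "google.cloud.storage"]


def find_missing(modules):
    """Return the module names from `modules` that cannot be imported."""
    missing = []
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:
            # ModuleNotFoundError is a subclass of ImportError.
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All GCS-related modules import correctly.")
```

On an affected image, the script should report google.auth.exceptions as missing, matching the ModuleNotFoundError in the traceback.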

What is the expected behavior?

It imports correctly.

What do you see instead?

Traceback (most recent call last):
  File "/opt/bitnami/python/lib/python3.10/site-packages/flask/app.py", line 1455, in wsgi_app
    response = self.full_dispatch_request()
  File "/opt/bitnami/python/lib/python3.10/site-packages/flask/app.py", line 869, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/opt/bitnami/python/lib/python3.10/site-packages/flask/app.py", line 867, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/bitnami/python/lib/python3.10/site-packages/flask/app.py", line 852, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/opt/bitnami/python/lib/python3.10/site-packages/mlflow/server/handlers.py", line 497, in wrapper
    return func(*args, **kwargs)
  File "/opt/bitnami/python/lib/python3.10/site-packages/mlflow/server/handlers.py", line 538, in wrapper
    return func(*args, **kwargs)
  File "/opt/bitnami/python/lib/python3.10/site-packages/mlflow/server/handlers.py", line 951, in _list_artifacts
    artifact_entities = _list_artifacts_for_proxied_run_artifact_root(
  File "/opt/bitnami/python/lib/python3.10/site-packages/mlflow/server/handlers.py", line 497, in wrapper
    return func(*args, **kwargs)
  File "/opt/bitnami/python/lib/python3.10/site-packages/mlflow/server/handlers.py", line 981, in _list_artifacts_for_proxied_run_artifact_root
    artifact_destination_repo = _get_artifact_repo_mlflow_artifacts()
  File "/opt/bitnami/python/lib/python3.10/site-packages/mlflow/server/handlers.py", line 175, in _get_artifact_repo_mlflow_artifacts
    _artifact_repo = get_artifact_repository(os.environ[ARTIFACTS_DESTINATION_ENV_VAR])
  File "/opt/bitnami/python/lib/python3.10/site-packages/mlflow/store/artifact/artifact_repository_registry.py", line 117, in get_artifact_repository
    return _artifact_repository_registry.get_artifact_repository(artifact_uri)
  File "/opt/bitnami/python/lib/python3.10/site-packages/mlflow/store/artifact/artifact_repository_registry.py", line 74, in get_artifact_repository
    return repository(artifact_uri)
  File "/opt/bitnami/python/lib/python3.10/site-packages/mlflow/store/artifact/gcs_artifact_repo.py", line 40, in __init__
    from google.auth.exceptions import DefaultCredentialsError
ModuleNotFoundError: No module named 'google.auth' 

Additional information

As a workaround, we install google-cloud-storage on top of the image every time the server is connected to, but it would be good to have it built into the image since it is core functionality. I would open a PR, but I'm not sure where in the repo this missing package should be installed.

This also seems related: https://github.com/bitnami/charts/issues/22720
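The install-on-connect workaround described above could be sketched roughly as follows. This assumes pip is available for the image's Python interpreter and that the container user is allowed to write to site-packages, neither of which is confirmed in this thread. `build_pip_command` and `ensure_installed` are hypothetical helper names.

```python
import importlib
import subprocess
import sys


def build_pip_command(package):
    """Build a pip install command that targets the running interpreter."""
    return [sys.executable, "-m", "pip", "install", package]


def ensure_installed(module_name, package):
    """Install `package` only if `module_name` cannot be imported.

    Returns True if an install was attempted, False if nothing was needed.
    """
    try:
        importlib.import_module(module_name)
        return False
    except ImportError:
        # Assumption: pip exists and the current user may install packages.
        subprocess.check_call(build_pip_command(package))
        # Let the import system see the newly installed package.
        importlib.invalidate_caches()
        return True


# Example call (not executed here), e.g. before starting the server:
# ensure_installed("google.cloud.storage", "google-cloud-storage")
```

Checking for the module first keeps repeated calls cheap, since pip only runs when the import actually fails.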

javsalgar commented 2 months ago

Hi!

Thank you so much for reporting. Indeed, these packages are missing. I created a task in our backlog to add these missing pip modules.

RussellSB commented 2 months ago

Great, thank you! Looking forward to this.

RussellSB commented 1 month ago

Hey, are there any updates? It would be great to know where this issue sits on the roadmap.

dhrp commented 1 month ago

I've been trying to see if I can add this dependency, but I'm running into a wall... I don't have any way to see how the Stacksmith dependencies are built, or to make changes to them.

I tried adding the google-cloud-sdk dependency, but to no avail. From what I can see, pip install is not actually used anywhere in this repository, so it must be enforced. Perhaps a maintainer can help me understand how to do this?

I tried the following: https://gist.github.com/dhrp/f5ad291ab9ab583e85da1bf930326d33

but it doesn't install the Python SDK, and the import does not work.

[edit] Actually, simply adding:

RUN pip install google-cloud-storage

to the end of the Dockerfile works. Would you be interested in a contribution like this, or should it really go into the Stacksmith part?

Pinging @javsalgar. I'm also planning to pick up https://github.com/bitnami/charts/issues/22720, but it depends on this issue.

juan131 commented 1 week ago

Hi everyone

Could you please give it a try using the image tag 2.14.1-debian-12-r1? We included the missing Python module on this image revision.