flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0
5.65k stars 629 forks

[BUG] Unable to use `mlflow_autolog` due to package version issues #4728

Open mattv-neXenio opened 9 months ago

mattv-neXenio commented 9 months ago

Describe the bug

Adding the mlflow_autolog() decorator to the train_model task in example.py from flytesnacks leads to the following AttributeError from scikit-learn when running pyflyte run -p test-project example.py training_workflow --hyperparameters '{"C": 0.1}':

Failed with Unknown Exception <class 'AttributeError'> Reason: Encountered error while executing workflow 'example.training_workflow':
  Error encountered while executing 'training_workflow':
  module 'sklearn.metrics' has no attribute 'SCORERS'
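
For anyone trying to confirm the same failure locally, here is a minimal diagnostic sketch; it is not part of the original report. It prints the installed scikit-learn and mlflow versions and checks whether sklearn.metrics still exposes the SCORERS attribute named in the error. As far as I know, scikit-learn deprecated SCORERS in 1.1 and removed it in 1.3 in favor of get_scorer_names(), which would match the versions listed in this issue. The helper names installed_version, exposes_scorers, and report are hypothetical.

```python
import importlib
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def exposes_scorers(metrics_module) -> bool:
    """True if the given module still has the SCORERS attribute named in the error."""
    return hasattr(metrics_module, "SCORERS")


def report() -> str:
    """Summarize the installed versions and whether SCORERS is available."""
    lines = [f"{name}: {installed_version(name)}" for name in ("scikit-learn", "mlflow")]
    try:
        metrics = importlib.import_module("sklearn.metrics")
    except ImportError:
        lines.append("sklearn.metrics could not be imported")
    else:
        lines.append(f"sklearn.metrics.SCORERS present: {exposes_scorers(metrics)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(report())
```

If the last line reports that SCORERS is absent while mlflow is at the 1.30.1 version shown in this issue, the environment likely has the same mismatch.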

The decorated task

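import mlflow.sklearn  # only needed for the commented-out variant below
import pandas as pd
from flytekit import task
from flytekitplugins.mlflow import mlflow_autolog
from sklearn.linear_model import LogisticRegression


@task()
@mlflow_autolog()
# @mlflow_autolog(framework=mlflow.sklearn)
def train_model(data: pd.DataFrame, hyperparameters: dict) -> LogisticRegression:
    """Train a model on the wine dataset."""
    features = data.drop("target", axis="columns")
    target = data["target"]
    return LogisticRegression(max_iter=3000, **hyperparameters).fit(features, target)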

The same error occurs with @mlflow_autolog(framework=mlflow.sklearn).

Minimal reproducible environment with Python 3.11

Running pip install flytekit flytekitplugins-envd flytekitplugins-mlflow scikit-learn installs the following package versions:

flytekit==1.10.2
flytekitplugins-envd==1.10.2
flytekitplugins-mlflow==1.10.2
scikit-learn==1.3.2
mlflow==1.30.1

The entire pip freeze output:

adlfs==2023.10.0
aiobotocore==2.5.4
aiohttp==3.9.1
aioitertools==0.11.0
aiosignal==1.3.1
alembic==1.13.1
anyio==4.2.0
arrow==1.3.0
attrs==23.2.0
azure-core==1.29.6
azure-datalake-store==0.0.53
azure-identity==1.15.0
azure-storage-blob==12.19.0
binaryornot==0.4.4
blinker==1.7.0
botocore==1.31.17
cachetools==5.3.2
certifi==2023.11.17
cffi==1.16.0
chardet==5.2.0
charset-normalizer==3.3.2
click==8.1.7
cloudpickle==2.2.1
cookiecutter==2.5.0
croniter==2.0.1
cryptography==41.0.7
databricks-cli==0.18.0
dataclasses-json==0.5.9
decorator==5.1.1
diskcache==5.6.3
docker==6.1.3
docstring-parser==0.15
entrypoints==0.4
envd==0.3.36
Flask==2.3.3
flyteidl==1.10.6
flytekit==1.10.2
flytekitplugins-envd==1.10.2
flytekitplugins-mlflow==1.10.2
frozenlist==1.4.1
fsspec==2023.9.2
gcsfs==2023.9.2
gitdb==4.0.11
GitPython==3.1.41
google-api-core==2.15.0
google-auth==2.26.2
google-auth-oauthlib==1.2.0
google-cloud-core==2.4.1
google-cloud-storage==2.14.0
google-crc32c==1.5.0
google-resumable-media==2.7.0
googleapis-common-protos==1.62.0
greenlet==3.0.3
grpcio==1.60.0
grpcio-status==1.60.0
gunicorn==20.1.0
idna==3.6
importlib-metadata==5.2.0
isodate==0.6.1
itsdangerous==2.1.2
jaraco.classes==3.3.0
jeepney==0.8.0
Jinja2==3.1.3
jmespath==1.0.1
joblib==1.3.2
jsonpickle==3.0.2
keyring==24.3.0
kubernetes==29.0.0
Mako==1.3.0
markdown-it-py==3.0.0
MarkupSafe==2.1.3
marshmallow==3.20.2
marshmallow-enum==1.5.1
marshmallow-jsonschema==0.13.0
mashumaro==3.11
mdurl==0.1.2
mlflow==1.30.1
more-itertools==10.2.0
msal==1.26.0
msal-extensions==1.1.0
multidict==6.0.4
mypy-extensions==1.0.0
numpy==1.26.3
oauthlib==3.2.2
packaging==21.3
pandas==1.5.3
plotly==5.18.0
portalocker==2.8.2
prometheus-client==0.19.0
prometheus-flask-exporter==0.23.0
protobuf==4.24.4
protoc-gen-swagger==0.1.0
pyarrow==14.0.2
pyasn1==0.5.1
pyasn1-modules==0.3.0
pycparser==2.21
Pygments==2.17.2
PyJWT==2.8.0
pyparsing==3.1.1
python-dateutil==2.8.2
python-json-logger==2.0.7
python-slugify==8.0.1
pytimeparse==1.1.8
pytz==2022.7.1
PyYAML==6.0.1
querystring-parser==1.2.4
requests==2.31.0
requests-oauthlib==1.3.1
rich==13.7.0
rich-click==1.7.3
rsa==4.9
s3fs==2023.9.2
scikit-learn==1.3.2
scipy==1.11.4
SecretStorage==3.3.3
six==1.16.0
smmap==5.0.1
sniffio==1.3.0
SQLAlchemy==1.4.51
sqlparse==0.4.4
statsd==3.3.0
tabulate==0.9.0
tenacity==8.2.3
text-unidecode==1.3
threadpoolctl==3.2.0
types-python-dateutil==2.8.19.20240106
typing-inspect==0.9.0
typing_extensions==4.9.0
urllib3==1.26.18
websocket-client==1.7.0
Werkzeug==3.0.1
wrapt==1.16.0
yarl==1.9.4
zipp==3.17.0

Expected behavior

The AttributeError shouldn't be raised, and mlflow_autolog should work out of the box. At a minimum, the plugin should require compatible versions of scikit-learn and mlflow.
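
To illustrate what a compatible-version check could look like, here is a small sketch under one stated assumption: scikit-learn releases before 1.3 still provide SCORERS, so an upper bound of 1.3 would avoid the error with the mlflow version listed above. The bound, the function names release_tuple and below, and the constant SKLEARN_UPPER_BOUND are illustrative, not something the plugin is known to implement. The parser is deliberately simplistic and stops at pre-release suffixes.

```python
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric components, e.g. '1.3.2' -> (1, 3, 2)."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as 'rc1' or 'dev0'
        parts.append(int(piece))
    return tuple(parts)


def below(version: str, upper_bound: str) -> bool:
    """True if version sorts strictly before upper_bound."""
    return release_tuple(version) < release_tuple(upper_bound)


# Assumption: SCORERS is still available in scikit-learn releases before 1.3.
SKLEARN_UPPER_BOUND = "1.3"

if __name__ == "__main__":
    for candidate in ("1.2.2", "1.3.2"):
        status = "ok" if below(candidate, SKLEARN_UPPER_BOUND) else "too new"
        print(f"scikit-learn=={candidate}: {status}")
```

As a workaround in the meantime, the same bound can be written as a pip requirement such as scikit-learn<1.3, assuming nothing else in the environment needs a newer release.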

Additional context to reproduce

No response

Screenshots

No response

welcome[bot] commented 9 months ago

Thank you for opening your first issue here! 🛠

dosubot[bot] commented 9 months ago

🤖

Thank you for providing detailed information about the bug you're encountering. It seems like there might be a compatibility issue between the versions of mlflow and scikit-learn you're using.

To help me investigate this further, could you please provide the following additional information:

  1. Can you confirm if the error occurs only when you use the mlflow_autolog decorator? Does the train_model task run successfully without it?
  2. Have you tried using different versions of mlflow and scikit-learn? If so, could you please share the results?
  3. Could you provide the full stack trace of the error? This could give more context about where exactly the error is coming from.

This information will help me understand the issue better and find a solution. Thank you for your cooperation.

