flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0

[Docs] pydantic integration: create flytesnacks example page #4066

Open samhita-alla opened 1 year ago

samhita-alla commented 1 year ago

Description

The Pydantic integration currently does not have an example page under the integrations section: https://github.com/flyteorg/flytesnacks/tree/master/examples

The purpose of this task is to add (a) a page describing the plugin and how to install it (see here) and (b) an example page showing how to use it (see here).


galbwe commented 1 year ago

Hi. Can I work on this issue?

pingsutw commented 1 year ago

Yup, assigned to you, thanks!

itssiddhantjain commented 1 year ago

Hello @pingsutw, please assign this issue to me, as I have worked on this kind of problem in the past and have a lot of experience with it.

galbwe commented 1 year ago

@pingsutw I'm still working on it

galbwe commented 1 year ago

I'm having trouble getting the plugin to behave as described. @pingsutw perhaps you can offer some guidance?

I'm using Python 3.9.11 on macOS. These are the Python dependencies I've installed:

adlfs==2023.9.0
aiobotocore==2.5.4
aiohttp==3.8.5
aioitertools==0.11.0
aiosignal==1.3.1
annotated-types==0.5.0
appnope==0.1.3
arrow==1.3.0
asttokens==2.4.0
async-timeout==4.0.3
attrs==23.1.0
azure-core==1.29.4
azure-datalake-store==0.0.53
azure-identity==1.14.0
azure-storage-blob==12.18.2
backcall==0.2.0
binaryornot==0.4.4
botocore==1.31.17
cachetools==5.3.1
certifi==2023.7.22
cffi==1.16.0
chardet==5.2.0
charset-normalizer==3.3.0
click==8.1.7
cloudpickle==2.2.1
comm==0.1.4
contourpy==1.1.1
cookiecutter==2.4.0
croniter==1.4.1
cryptography==41.0.4
cycler==0.12.0
dacite==1.8.1
dataclasses-json==0.5.9
decorator==5.1.1
Deprecated==1.2.14
diskcache==5.6.3
docker==6.1.3
docker-image-py==0.1.12
docstring-parser==0.15
exceptiongroup==1.1.3
executing==2.0.0
flyteidl==1.5.21
flytekit==1.9.1
flytekitplugins-deck-standard==1.9.1
flytekitplugins-pydantic==1.10.0b0
fonttools==4.43.0
frozenlist==1.4.0
fsspec==2023.9.2
gcsfs==2023.9.2
gitdb==4.0.10
GitPython==3.1.37
google-api-core==2.12.0
google-auth==2.23.2
google-auth-oauthlib==1.1.0
google-cloud-core==2.3.3
google-cloud-storage==2.11.0
google-crc32c==1.5.0
google-resumable-media==2.6.0
googleapis-common-protos==1.60.0
grpcio==1.53.0
grpcio-status==1.53.0
htmlmin==0.1.12
idna==3.4
ImageHash==4.3.1
importlib-metadata==6.8.0
importlib-resources==6.1.0
ipython==8.16.1
ipywidgets==8.1.1
isodate==0.6.1
jaraco.classes==3.3.0
jedi==0.19.1
Jinja2==3.1.2
jmespath==1.0.1
joblib==1.3.2
jsonpickle==3.0.2
jupyterlab-widgets==3.0.9
keyring==24.2.0
kiwisolver==1.4.5
kubernetes==28.1.0
Markdown==3.4.4
markdown-it-py==3.0.0
MarkupSafe==2.1.3
marshmallow==3.20.1
marshmallow-enum==1.5.1
marshmallow-jsonschema==0.13.0
matplotlib==3.8.0
matplotlib-inline==0.1.6
mdurl==0.1.2
more-itertools==10.1.0
msal==1.24.1
msal-extensions==1.0.0
multidict==6.0.4
multimethod==1.10
mypy-extensions==1.0.0
natsort==8.4.0
networkx==3.1
numpy==1.23.5
oauthlib==3.2.2
packaging==23.2
pandas==1.5.3
parso==0.8.3
patsy==0.5.3
pexpect==4.8.0
phik==0.12.3
pickleshare==0.7.5
Pillow==10.0.1
plotly==5.17.0
portalocker==2.8.2
prompt-toolkit==3.0.39
protobuf==4.24.3
protoc-gen-swagger==0.1.0
ptyprocess==0.7.0
pure-eval==0.2.2
pyarrow==10.0.1
pyasn1==0.5.0
pyasn1-modules==0.3.0
pycparser==2.21
pydantic==1.10.13
pydantic_core==2.10.1
Pygments==2.16.1
PyJWT==2.8.0
pyOpenSSL==23.2.0
pyparsing==3.1.1
python-dateutil==2.8.2
python-json-logger==2.0.7
python-slugify==8.0.1
pytimeparse==1.1.8
pytz==2023.3.post1
PyWavelets==1.4.1
PyYAML==6.0.1
regex==2023.8.8
requests==2.31.0
requests-oauthlib==1.3.1
rich==13.6.0
rich-click==1.6.1
rsa==4.9
s3fs==2023.9.2
scikit-learn==1.3.1
scipy==1.11.3
seaborn==0.12.2
six==1.16.0
smmap==5.0.1
sortedcontainers==2.4.0
stack-data==0.6.3
statsd==3.3.0
statsmodels==0.14.0
tangled-up-in-unicode==0.2.0
tenacity==8.2.3
text-unidecode==1.3
threadpoolctl==3.2.0
tqdm==4.66.1
traitlets==5.10.1
typeguard==2.13.3
types-python-dateutil==2.8.19.14
typing-inspect==0.9.0
typing_extensions==4.8.0
urllib3==1.26.17
visions==0.7.5
wcwidth==0.2.8
websocket-client==1.6.3
widgetsnbextension==4.0.9
wordcloud==1.9.2
wrapt==1.15.0
yarl==1.9.2
ydata-profiling==4.5.1
zipp==3.17.0

I also installed flytectl with

brew install flyteorg/homebrew-tap/flytectl

I created this short script, modeled on the example in the flytekit docs. I first verified that the docs example works as written, and then modified it to use the pydantic plugin.

# train_logistic_regression.py
from pydantic import BaseModel

import pandas as pd
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression

from flytekit import task, workflow

class Config(BaseModel):
    C: float = 1.0
    max_iter: int = 100

@task
def get_data() -> pd.DataFrame:
    """Get the wine dataset."""
    return load_wine(as_frame=True).frame

@task
def process_data(data: pd.DataFrame) -> pd.DataFrame:
    """Simplify the task from a 3-class to a binary classification problem."""
    return data.assign(target=lambda x: x["target"].where(x["target"] == 0, 1))

@task
def train_model(data: pd.DataFrame, config: Config) -> LogisticRegression:
    """Train a model on the wine dataset."""
    features = data.drop("target", axis="columns")
    target = data["target"]
    return LogisticRegression(**config.dict()).fit(features, target)

@workflow
def training_workflow(config: Config) -> LogisticRegression:
    """Put all of the steps together into a single workflow."""
    data = get_data()
    processed_data = process_data(data=data)
    return train_model(
        data=processed_data,
        config=config,
    )

I am running the script with the command

pyflyte run train_logistic_regression.py training_workflow --config '{"C": 0.5, "max_iter": 1000}'

I then get this error:

 Invalid value for '--config': Failed to convert param <Option config>, {'C': 0.5, 'max_iter': 1000} to <class 'train_logistic_regression.Config'>
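
One way to rule out shell quoting or malformed JSON as the cause of the error above is to parse the same string outside of pyflyte. The sketch below is only an illustration: `ConfigStandIn` and `parse_config` are hypothetical names, and a stdlib dataclass stands in for the pydantic `Config` model so that the snippet runs without any third-party packages. It does not exercise the plugin or pyflyte's own conversion logic.

```python
import json
from dataclasses import dataclass


@dataclass
class ConfigStandIn:
    """Hypothetical stdlib stand-in with the same fields as the Config model."""

    C: float = 1.0
    max_iter: int = 100


def parse_config(raw: str) -> ConfigStandIn:
    """Parse the JSON text given to --config and build the stand-in from it."""
    payload = json.loads(raw)  # raises json.JSONDecodeError on malformed input
    return ConfigStandIn(**payload)  # raises TypeError on unexpected keys


# The exact string passed on the command line in the report above.
cli_value = '{"C": 0.5, "max_iter": 1000}'
parsed = parse_config(cli_value)
print(parsed)
```

If this parses cleanly, the payload itself is well formed, which suggests the failure lies in how the CLI converts the value to the `Config` type rather than in the input string.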

samhita-alla commented 1 year ago

@galbwe, could you initialize config in the workflow itself, i.e. provide a default value to config?

galbwe commented 1 year ago

@samhita-alla I changed the workflow definition to

@workflow
def training_workflow(config: Config = Config()) -> LogisticRegression:
    """Put all of the steps together into a single workflow."""
    data = get_data()
    processed_data = process_data(data=data)
    return train_model(
        data=processed_data,
        config=config,
    )

then ran

pyflyte run train_logistic_regression.py training_workflow

Now I'm getting this error:

Failed with Unknown Exception <class 'AttributeError'> Reason: 'Config' object has no attribute 'to_json'

Then I tried adding an empty to_json method to Config just to make it happy, ran the same command, and got

 Missing option '--config'.  

So it seems that --config is still required even when a default value is specified.

Then I tried hard coding the pydantic model in the workflow and that seemed to work.

@workflow
def training_workflow() -> LogisticRegression:
    """Put all of the steps together into a single workflow."""
    config = Config(C=0.1, max_iter=1000)
    data = get_data()
    processed_data = process_data(data=data)
    return train_model(
        data=processed_data,
        config=config,
    )

pyflyte run train_logistic_regression.py training_workflow

LogisticRegression(C=0.1, max_iter=1000)

galbwe commented 1 year ago

I guess I was misled by this test that passes pydantic models directly to workflows. I think a Promise is being passed to the workflow instead of a model when pyflyte is called.

samhita-alla commented 1 year ago

@galbwe, def training_workflow(config: Config = Config()) -> LogisticRegression: has to work. Can you send me the full stack trace? I'm wondering if the plugin is being used.
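
A quick first check on whether the plugin is present in the active environment is to ask the interpreter which distributions it can see. The sketch below uses only the standard library; `installed_versions` is a hypothetical helper name, and the package names are taken from the dependency list earlier in this thread. Note that this only confirms the packages are installed. It does not prove that the plugin's type handling is actually being picked up by pyflyte.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Run this with the same interpreter (virtualenv) that provides pyflyte.
report = installed_versions(["flytekit", "flytekitplugins-pydantic", "pydantic"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```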

galbwe commented 1 year ago

Sure, here is the verbose output. Thanks.

 % pyflyte --verbose run ./pydantic_plugin/train_logistic_regression.py training_workflow

2023-10-04 12:41:49,366938 WARNING  {"asctime": "2023-10-04 12:41:49,366", "name": "flytekit", "levelname": "WARNING", "message": "Unsupported Type <class 'sklearn.linear_model._logistic.LogisticRegression'> found, Flyte will default to use PickleFile as the transport. Pickle can only be used to send objects between the exact same version of Python, and we strongly recommend to use python type that flyte support."}  type_engine.py:1141
Verbose mode on
Traceback (most recent call last)

/Users/wes/code/projects/flyteorg/flytesnacks/examples/pydantic_plugin/venv/bin/pyflyte:8 in <module>
❱ 8 │   sys.exit(main())

/Users/wes/code/projects/flyteorg/flytesnacks/examples/pydantic_plugin/venv/lib/python3.9/site-packages/click/core.py:1157 in __call__
❱ 1157 │   │   return self.main(*args, **kwargs)

/Users/wes/code/projects/flyteorg/flytesnacks/examples/pydantic_plugin/venv/lib/python3.9/site-packages/rich_click/rich_group.py:21 in main
❱ 21 │   │   │   rv = super().main(*args, standalone_mode=False, **kwargs)

/Users/wes/code/projects/flyteorg/flytesnacks/examples/pydantic_plugin/venv/lib/python3.9/site-packages/click/core.py:1078 in main
❱ 1078 │   │   │   │   │   rv = self.invoke(ctx)

/Users/wes/code/projects/flyteorg/flytesnacks/examples/pydantic_plugin/venv/lib/python3.9/site-packages/flytekit/clis/sdk_in_container/pyflyte.py:87 in invoke
❱  87 │   │   │   │   raise e

/Users/wes/code/projects/flyteorg/flytesnacks/examples/pydantic_plugin/venv/lib/python3.9/site-packages/flytekit/clis/sdk_in_container/pyflyte.py:83 in invoke
❱  83 │   │   │   return super().invoke(ctx)

/Users/wes/code/projects/flyteorg/flytesnacks/examples/pydantic_plugin/venv/lib/python3.9/site-packages/click/core.py:1688 in invoke
❱ 1688 │   │   │   │   │   return _process_result(sub_ctx.command.invoke(sub_ctx))

/Users/wes/code/projects/flyteorg/flytesnacks/examples/pydantic_plugin/venv/lib/python3.9/site-packages/click/core.py:1688 in invoke
❱ 1688 │   │   │   │   │   return _process_result(sub_ctx.command.invoke(sub_ctx))

/Users/wes/code/projects/flyteorg/flytesnacks/examples/pydantic_plugin/venv/lib/python3.9/site-packages/click/core.py:1682 in invoke
❱ 1682 │   │   │   │   cmd_name, cmd, args = self.resolve_command(ctx, args)

/Users/wes/code/projects/flyteorg/flytesnacks/examples/pydantic_plugin/venv/lib/python3.9/site-packages/click/core.py:1729 in resolve_command
❱ 1729 │   │   cmd = self.get_command(ctx, cmd_name)

/Users/wes/code/projects/flyteorg/flytesnacks/examples/pydantic_plugin/venv/lib/python3.9/site-packages/flytekit/clis/sdk_in_container/run.py:810 in get_command
❱ 810 │   │   │   │   to_click_option(ctx, flyte_ctx, input_name, literal_var, python_type, de

/Users/wes/code/projects/flyteorg/flytesnacks/examples/pydantic_plugin/venv/lib/python3.9/site-packages/flytekit/clis/sdk_in_container/run.py:478 in to_click_option
❱ 478 │   │   │   │   default_val = cast(DataClassJsonMixin, default_val).to_json()

AttributeError: 'Config' object has no attribute 'to_json'
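
The last frame of the traceback calls `cast(DataClassJsonMixin, default_val).to_json()`. Python's `typing.cast` is only a hint for static type checkers and returns its argument unchanged at runtime, so any default value that lacks a `to_json` method will fail at that point with an `AttributeError`. The sketch below reproduces that behavior in isolation; `HasToJson` and `PlainModel` are hypothetical stand-ins, not the real `DataClassJsonMixin` or pydantic classes.

```python
from typing import Optional, cast


class HasToJson:
    """Hypothetical stand-in for a class that provides to_json()."""

    def to_json(self) -> str:
        return "{}"


class PlainModel:
    """Hypothetical stand-in for a model that has no to_json() method."""


value = PlainModel()

# cast() performs no conversion or check at runtime; it returns the same object.
same_object = cast(HasToJson, value)
assert same_object is value

error_message: Optional[str] = None
try:
    same_object.to_json()
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

This is consistent with the error reported above: the default value reaches a code path that assumes a `to_json` method is available.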

samhita-alla commented 1 year ago

@pingsutw should this work?

github-actions[bot] commented 4 months ago

Hello 👋, this issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will engage on it to decide if it is still applicable. Thank you for your contribution and understanding! 🙏

sumana-2705 commented 1 month ago

Hello @davidmirror-ops,
I'd like to work on this issue. Could you please assign it to me?

davidmirror-ops commented 1 month ago

@sumana-2705 Thanks, looking forward to your contributions!

sumana-2705 commented 1 month ago

@samhita-alla @davidmirror-ops

Does the example page refer only to the .py file containing the Pydantic integration example, or should I include any additional files as well?

samhita-alla commented 1 month ago

@sumana-2705 you'll need to add two pages, similar to the integration examples in the docs, like this one: https://docs.flyte.org/en/latest/flytesnacks/examples/ollama_plugin/index.html

sumana-2705 commented 3 weeks ago

Hello @samhita-alla, @davidmirror-ops,

I have opened a pull request for this issue. Could you please review it and share any feedback or suggested changes? Thank you.