Open samhita-alla opened 1 year ago
Hi. Can I work on this issue?
Yup, assigned to you, thanks!
Hello @pingsutw, please assign this issue to me; I have worked on this kind of problem in the past and have a lot of experience with it.
@pingsutw I'm still working on it
I'm having trouble getting the plugin to behave as described. @pingsutw perhaps you can offer some guidance?
I'm using Python 3.9.11 on macOS. These are the Python dependencies I've installed:
adlfs==2023.9.0
aiobotocore==2.5.4
aiohttp==3.8.5
aioitertools==0.11.0
aiosignal==1.3.1
annotated-types==0.5.0
appnope==0.1.3
arrow==1.3.0
asttokens==2.4.0
async-timeout==4.0.3
attrs==23.1.0
azure-core==1.29.4
azure-datalake-store==0.0.53
azure-identity==1.14.0
azure-storage-blob==12.18.2
backcall==0.2.0
binaryornot==0.4.4
botocore==1.31.17
cachetools==5.3.1
certifi==2023.7.22
cffi==1.16.0
chardet==5.2.0
charset-normalizer==3.3.0
click==8.1.7
cloudpickle==2.2.1
comm==0.1.4
contourpy==1.1.1
cookiecutter==2.4.0
croniter==1.4.1
cryptography==41.0.4
cycler==0.12.0
dacite==1.8.1
dataclasses-json==0.5.9
decorator==5.1.1
Deprecated==1.2.14
diskcache==5.6.3
docker==6.1.3
docker-image-py==0.1.12
docstring-parser==0.15
exceptiongroup==1.1.3
executing==2.0.0
flyteidl==1.5.21
flytekit==1.9.1
flytekitplugins-deck-standard==1.9.1
flytekitplugins-pydantic==1.10.0b0
fonttools==4.43.0
frozenlist==1.4.0
fsspec==2023.9.2
gcsfs==2023.9.2
gitdb==4.0.10
GitPython==3.1.37
google-api-core==2.12.0
google-auth==2.23.2
google-auth-oauthlib==1.1.0
google-cloud-core==2.3.3
google-cloud-storage==2.11.0
google-crc32c==1.5.0
google-resumable-media==2.6.0
googleapis-common-protos==1.60.0
grpcio==1.53.0
grpcio-status==1.53.0
htmlmin==0.1.12
idna==3.4
ImageHash==4.3.1
importlib-metadata==6.8.0
importlib-resources==6.1.0
ipython==8.16.1
ipywidgets==8.1.1
isodate==0.6.1
jaraco.classes==3.3.0
jedi==0.19.1
Jinja2==3.1.2
jmespath==1.0.1
joblib==1.3.2
jsonpickle==3.0.2
jupyterlab-widgets==3.0.9
keyring==24.2.0
kiwisolver==1.4.5
kubernetes==28.1.0
Markdown==3.4.4
markdown-it-py==3.0.0
MarkupSafe==2.1.3
marshmallow==3.20.1
marshmallow-enum==1.5.1
marshmallow-jsonschema==0.13.0
matplotlib==3.8.0
matplotlib-inline==0.1.6
mdurl==0.1.2
more-itertools==10.1.0
msal==1.24.1
msal-extensions==1.0.0
multidict==6.0.4
multimethod==1.10
mypy-extensions==1.0.0
natsort==8.4.0
networkx==3.1
numpy==1.23.5
oauthlib==3.2.2
packaging==23.2
pandas==1.5.3
parso==0.8.3
patsy==0.5.3
pexpect==4.8.0
phik==0.12.3
pickleshare==0.7.5
Pillow==10.0.1
plotly==5.17.0
portalocker==2.8.2
prompt-toolkit==3.0.39
protobuf==4.24.3
protoc-gen-swagger==0.1.0
ptyprocess==0.7.0
pure-eval==0.2.2
pyarrow==10.0.1
pyasn1==0.5.0
pyasn1-modules==0.3.0
pycparser==2.21
pydantic==1.10.13
pydantic_core==2.10.1
Pygments==2.16.1
PyJWT==2.8.0
pyOpenSSL==23.2.0
pyparsing==3.1.1
python-dateutil==2.8.2
python-json-logger==2.0.7
python-slugify==8.0.1
pytimeparse==1.1.8
pytz==2023.3.post1
PyWavelets==1.4.1
PyYAML==6.0.1
regex==2023.8.8
requests==2.31.0
requests-oauthlib==1.3.1
rich==13.6.0
rich-click==1.6.1
rsa==4.9
s3fs==2023.9.2
scikit-learn==1.3.1
scipy==1.11.3
seaborn==0.12.2
six==1.16.0
smmap==5.0.1
sortedcontainers==2.4.0
stack-data==0.6.3
statsd==3.3.0
statsmodels==0.14.0
tangled-up-in-unicode==0.2.0
tenacity==8.2.3
text-unidecode==1.3
threadpoolctl==3.2.0
tqdm==4.66.1
traitlets==5.10.1
typeguard==2.13.3
types-python-dateutil==2.8.19.14
typing-inspect==0.9.0
typing_extensions==4.8.0
urllib3==1.26.17
visions==0.7.5
wcwidth==0.2.8
websocket-client==1.6.3
widgetsnbextension==4.0.9
wordcloud==1.9.2
wrapt==1.15.0
yarl==1.9.2
ydata-profiling==4.5.1
zipp==3.17.0
I also installed flytectl with:
brew install flyteorg/homebrew-tap/flytectl
I created this short script similar to the flytekit docs. I verified that I could get the example in the docs working, and then modified it to use the pydantic plugin.
# train_logistic_regression.py
from pydantic import BaseModel
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from flytekit import task, workflow


class Config(BaseModel):
    C: float = 1.0
    max_iter: int = 100


@task
def get_data() -> pd.DataFrame:
    """Get the wine dataset."""
    return load_wine(as_frame=True).frame


@task
def process_data(data: pd.DataFrame) -> pd.DataFrame:
    """Simplify the task from a 3-class to a binary classification problem."""
    return data.assign(target=lambda x: x["target"].where(x["target"] == 0, 1))


@task
def train_model(data: pd.DataFrame, config: Config) -> LogisticRegression:
    """Train a model on the wine dataset."""
    features = data.drop("target", axis="columns")
    target = data["target"]
    return LogisticRegression(**config.dict()).fit(features, target)


@workflow
def training_workflow(config: Config) -> LogisticRegression:
    """Put all of the steps together into a single workflow."""
    data = get_data()
    processed_data = process_data(data=data)
    return train_model(
        data=processed_data,
        config=config,
    )
I am running the script with the command
pyflyte run train_logistic_regression.py training_workflow --config '{"C": 0.5, "max_iter": 1000}'
I then get this error:
Invalid value for '--config': Failed to convert param <Option config>, {'C': 0.5, 'max_iter': 1000} to <class 'train_logistic_regression.Config'>
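The error says pyflyte received the `--config` value as a parsed dict but had no rule for turning it into `Config`. Below is a minimal sketch of the conversion that is missing, assuming the model can be rebuilt by passing the dict's keys as keyword arguments. A stdlib dataclass stands in for the pydantic `Config` so the sketch runs without extra dependencies, and `parse_cli_value` is a hypothetical helper, not a pyflyte API. With pydantic 1.x, `Config.parse_obj(...)` or `Config.parse_raw(...)` would perform this step and validate the field types as well.

```python
import json
from dataclasses import dataclass
from typing import Any, Type, TypeVar

T = TypeVar("T")


@dataclass
class Config:
    # Stand-in for the pydantic model in train_logistic_regression.py
    C: float = 1.0
    max_iter: int = 100


def parse_cli_value(raw: str, model: Type[T]) -> T:
    """Hypothetical helper: turn a JSON CLI string into an instance of `model`."""
    data: Any = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object for {model.__name__}, got {type(data).__name__}")
    # Unknown keys raise TypeError here (pydantic 1.x ignores extra fields by default)
    return model(**data)


config = parse_cli_value('{"C": 0.5, "max_iter": 1000}', Config)
print(config)  # Config(C=0.5, max_iter=1000)
```

Keys missing from the JSON fall back to the field defaults, so `'{}'` yields `Config()`.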
@galbwe, could you initialize config in the workflow itself, i.e. provide a default value for config?
@samhita-alla I changed the workflow definition to
@workflow
def training_workflow(config: Config = Config()) -> LogisticRegression:
    """Put all of the steps together into a single workflow."""
    data = get_data()
    processed_data = process_data(data=data)
    return train_model(
        data=processed_data,
        config=config,
    )
Then I ran:
pyflyte run train_logistic_regression.py training_workflow
Now I'm getting this error:
Failed with Unknown Exception <class 'AttributeError'> Reason: 'Config' object has no attribute 'to_json'
Then I tried adding an empty to_json method to Config just to make it happy, ran the same command, and got:
Missing option '--config'.
So it seems like --config is required even when a default value is specified.
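The behavior expected here is the usual CLI rule: an option is mandatory only when it has no default. A small stdlib `argparse` sketch of that rule follows. pyflyte actually builds click options, so this is an illustration rather than its code; `build_parser` is a hypothetical name, and the default is handed to argparse as a JSON string, which argparse passes through `type` just like a value typed on the command line.

```python
import argparse
import json
from typing import Optional


def build_parser(default_config: Optional[dict]) -> argparse.ArgumentParser:
    """Hypothetical helper: build a --config option that is required only without a default."""
    parser = argparse.ArgumentParser(prog="training_workflow")
    if default_config is None:
        # No default: the user must supply --config
        parser.add_argument("--config", required=True, type=json.loads)
    else:
        # With a default the option is optional; argparse applies `type` to string defaults too
        parser.add_argument("--config", default=json.dumps(default_config), type=json.loads)
    return parser


args = build_parser({"C": 1.0, "max_iter": 100}).parse_args([])
print(args.config)  # {'C': 1.0, 'max_iter': 100}
```

Serializing the default up front is the step that needs a JSON form of the default object, which is where a pydantic model without `to_json` gets in the way.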
Then I tried hard coding the pydantic model in the workflow and that seemed to work.
@workflow
def training_workflow() -> LogisticRegression:
    """Put all of the steps together into a single workflow."""
    config = Config(C=0.1, max_iter=1000)
    data = get_data()
    processed_data = process_data(data=data)
    return train_model(
        data=processed_data,
        config=config,
    )
pyflyte run train_logistic_regression.py training_workflow
LogisticRegression(C=0.1, max_iter=1000)
I guess I was misled by this test that passes pydantic models directly to workflows. I think a Promise is being passed to the workflow instead of a model when pyflyte is called.
@galbwe, def training_workflow(config: Config = Config()) -> LogisticRegression: should work. Can you send me the full stack trace? I'm wondering if the plugin is being used.
Sure, here is the verbose output. Thanks.
% pyflyte --verbose run ./pydantic_plugin/train_logistic_regression.py training_workflow
2023-10-04 12:41:49,366938 WARNING {"asctime": "2023-10-04 12:41:49,366", "name": "flytekit", "levelname": "WARNING", "message": "Unsupported Type <class 'sklearn.linear_model._logistic.LogisticRegression'> found, Flyte will default to use PickleFile as the transport. Pickle can only be used to send objects between the exact same version of Python, and we strongly recommend to use python type that flyte support."}    type_engine.py:1141
Verbose mode on
Traceback (most recent call last):
/Users/wes/code/projects/flyteorg/flytesnacks/examples/pydantic_plugin/venv/bin/pyflyte:8 in <module>
  ❱ 8     sys.exit(main())
/Users/wes/code/projects/flyteorg/flytesnacks/examples/pydantic_plugin/venv/lib/python3.9/site-packages/click/core.py:1157 in __call__
  ❱ 1157  return self.main(*args, **kwargs)
/Users/wes/code/projects/flyteorg/flytesnacks/examples/pydantic_plugin/venv/lib/python3.9/site-packages/rich_click/rich_group.py:21 in main
  ❱ 21    rv = super().main(*args, standalone_mode=False, **kwargs)
/Users/wes/code/projects/flyteorg/flytesnacks/examples/pydantic_plugin/venv/lib/python3.9/site-packages/click/core.py:1078 in main
  ❱ 1078  rv = self.invoke(ctx)
/Users/wes/code/projects/flyteorg/flytesnacks/examples/pydantic_plugin/venv/lib/python3.9/site-packages/flytekit/clis/sdk_in_container/pyflyte.py:87 in invoke
  ❱ 87    raise e
/Users/wes/code/projects/flyteorg/flytesnacks/examples/pydantic_plugin/venv/lib/python3.9/site-packages/flytekit/clis/sdk_in_container/pyflyte.py:83 in invoke
  ❱ 83    return super().invoke(ctx)
/Users/wes/code/projects/flyteorg/flytesnacks/examples/pydantic_plugin/venv/lib/python3.9/site-packages/click/core.py:1688 in invoke
  ❱ 1688  return _process_result(sub_ctx.command.invoke(sub_ctx))
/Users/wes/code/projects/flyteorg/flytesnacks/examples/pydantic_plugin/venv/lib/python3.9/site-packages/click/core.py:1688 in invoke
  ❱ 1688  return _process_result(sub_ctx.command.invoke(sub_ctx))
/Users/wes/code/projects/flyteorg/flytesnacks/examples/pydantic_plugin/venv/lib/python3.9/site-packages/click/core.py:1682 in invoke
  ❱ 1682  cmd_name, cmd, args = self.resolve_command(ctx, args)
/Users/wes/code/projects/flyteorg/flytesnacks/examples/pydantic_plugin/venv/lib/python3.9/site-packages/click/core.py:1729 in resolve_command
  ❱ 1729  cmd = self.get_command(ctx, cmd_name)
/Users/wes/code/projects/flyteorg/flytesnacks/examples/pydantic_plugin/venv/lib/python3.9/site-packages/flytekit/clis/sdk_in_container/run.py:810 in get_command
  ❱ 810   to_click_option(ctx, flyte_ctx, input_name, literal_var, python_type, de
/Users/wes/code/projects/flyteorg/flytesnacks/examples/pydantic_plugin/venv/lib/python3.9/site-packages/flytekit/clis/sdk_in_container/run.py:478 in to_click_option
  ❱ 478   default_val = cast(DataClassJsonMixin, default_val).to_json()
AttributeError: 'Config' object has no attribute 'to_json'
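The last frame shows the cause: to build the `--config` option, pyflyte (flytekit 1.9.1 here) serializes the workflow default by casting it to `DataClassJsonMixin` and calling `.to_json()`, a method pydantic models do not have (pydantic 1.x models expose `.json()` instead). Below is a minimal sketch of a default serializer that checks what the object actually provides before serializing. `serialize_default`, `DataclassConfig`, and `PydanticLikeConfig` are hypothetical stand-ins written for illustration; this is not flytekit's code or its eventual fix.

```python
import json
from dataclasses import asdict, dataclass, is_dataclass
from typing import Any


@dataclass
class DataclassConfig:
    C: float = 1.0
    max_iter: int = 100


class PydanticLikeConfig:
    """Stand-in for a pydantic 1.x model, which offers .json() but not .to_json()."""

    def __init__(self, C: float = 1.0, max_iter: int = 100) -> None:
        self.C = C
        self.max_iter = max_iter

    def json(self) -> str:
        return json.dumps({"C": self.C, "max_iter": self.max_iter})


def serialize_default(value: Any) -> str:
    """Hypothetical helper: turn a workflow default into a JSON string for the CLI."""
    if hasattr(value, "to_json"):  # e.g. dataclasses_json.DataClassJsonMixin
        return value.to_json()
    if callable(getattr(value, "json", None)):  # e.g. a pydantic 1.x BaseModel
        return value.json()
    if is_dataclass(value) and not isinstance(value, type):  # plain dataclass instance
        return json.dumps(asdict(value))
    return json.dumps(value)  # plain JSON-compatible values


print(serialize_default(PydanticLikeConfig()))    # {"C": 1.0, "max_iter": 100}
print(serialize_default(DataclassConfig(C=0.5)))  # {"C": 0.5, "max_iter": 100}
```

Checking capabilities with `hasattr` avoids assuming every default is a `DataClassJsonMixin`, which is the assumption the `cast(...)` call in `run.py:478` makes.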
@pingsutw should this work?
Hello 👋, this issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will engage on it to decide if it is still applicable. Thank you for your contribution and understanding! 🙏
Hello @davidmirror-ops,
I want to work on this issue. Can you please assign it to me?
@sumana-2705 Thanks, looking forward to your contributions!
@samhita-alla @davidmirror-ops
Does the example page refer only to the .py file containing the Pydantic integration example, or should I include any additional files as well?
@sumana-2705 you'll need to add two pages, similar to the integration examples in the docs, like this one: https://docs.flyte.org/en/latest/flytesnacks/examples/ollama_plugin/index.html
Hello @samhita-alla, @davidmirror-ops,
I have opened a pull request for this issue. Could you please review it and provide any feedback or suggestions for changes? Thank you.
Description
The Pydantic integration currently does not have an example page under the integrations section: https://github.com/flyteorg/flytesnacks/tree/master/examples
The purpose of this task is to add (a) a page describing the plugin and how to install it (see here) and (b) an example page showing how to use it (see here).