flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0

[BUG] [flytekit] Improve handling around StructuredDataset and other lossy types in Flyte remote #2502

Open wild-endeavor opened 2 years ago

wild-endeavor commented 2 years ago

Description

If you remote.fetch a task, workflow, or launch plan where one of the inputs is a StructuredDataset, and then try to execute it, flytekit will try to "guess" the Python interface for that input, and the type it comes up with is the Python/flytekit StructuredDataset class. This is correct, but when we go to create the execution, we need to translate the input value (a pd.DataFrame or whatever instance was passed) into a StructuredDataset Literal. Since flytekit thinks the type annotation is the Python StructuredDataset class, it looks that class up in its table of formats/encoders and fails with a KeyError, because StructuredDataset is not itself a real dataframe type.

An example stack trace:

Traceback (most recent call last):
  File "/Users/nielsbantilan/miniconda3/envs/unionml/bin/unionml", line 33, in <module>
    sys.exit(load_entry_point('unionml', 'console_scripts', 'unionml')())
  File "/Users/nielsbantilan/miniconda3/envs/unionml/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/nielsbantilan/miniconda3/envs/unionml/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/nielsbantilan/miniconda3/envs/unionml/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/nielsbantilan/miniconda3/envs/unionml/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/nielsbantilan/miniconda3/envs/unionml/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/nielsbantilan/miniconda3/envs/unionml/lib/python3.9/site-packages/typer/main.py", line 500, in wrapper
    return callback(**use_params)  # type: ignore
  File "/Users/nielsbantilan/git/unionml/unionml/cli.py", line 99, in predict
    predictions = model.remote_predict(app_version, model_version, wait=True, **prediction_inputs)
  File "/Users/nielsbantilan/git/unionml/unionml/model.py", line 535, in remote_predict
    execution = self._remote.execute(
  File "/Users/nielsbantilan/miniconda3/envs/unionml/lib/python3.9/site-packages/flytekit/remote/remote.py", line 796, in execute
    return self.execute_remote_wf(
  File "/Users/nielsbantilan/miniconda3/envs/unionml/lib/python3.9/site-packages/flytekit/remote/remote.py", line 889, in execute_remote_wf
    return self.execute_remote_task_lp(
  File "/Users/nielsbantilan/miniconda3/envs/unionml/lib/python3.9/site-packages/flytekit/remote/remote.py", line 862, in execute_remote_task_lp
    return self._execute(
  File "/Users/nielsbantilan/miniconda3/envs/unionml/lib/python3.9/site-packages/flytekit/remote/remote.py", line 658, in _execute
    lit = TypeEngine.to_literal(ctx, v, hint, variable.type)
  File "/Users/nielsbantilan/miniconda3/envs/unionml/lib/python3.9/site-packages/flytekit/core/type_engine.py", line 696, in to_literal
    lv = transformer.to_literal(ctx, python_val, python_type, expected)
  File "/Users/nielsbantilan/miniconda3/envs/unionml/lib/python3.9/site-packages/flytekit/types/structured/structured_dataset.py", line 486, in to_literal
    fmt = self.DEFAULT_FORMATS[python_type]
KeyError: <class 'flytekit.types.structured.structured_dataset.StructuredDataset'>
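
To make the failure mode concrete, here is a minimal, self-contained sketch that does not import flytekit. Everything in it is a stand-in: the StructuredDataset and FakeDataFrame classes, the plain-dict DEFAULT_FORMATS, and the naive_format and resolve_format helpers are all hypothetical names chosen to mirror the last frame of the trace. The second helper shows one possible direction for a friendlier lookup (fall back to the runtime type of the value, and raise a descriptive error otherwise). It is an assumption about what an improvement could look like, not the fix adopted by the maintainers.

```python
class StructuredDataset:
    """Stand-in for the wrapper class (not the flytekit implementation)."""

    def __init__(self, dataframe=None):
        self.dataframe = dataframe


class FakeDataFrame:
    """Stand-in for a concrete dataframe type such as pd.DataFrame."""


# Registry keyed by concrete dataframe type, mirroring the lookup in the trace.
DEFAULT_FORMATS = {FakeDataFrame: "parquet"}


def naive_format(python_type):
    # Reproduces the failure: the guessed annotation is the wrapper class,
    # which is never a key in the registry, so this raises KeyError.
    return DEFAULT_FORMATS[python_type]


def resolve_format(python_val, python_type):
    # Hypothetical friendlier lookup: when the annotation is the wrapper,
    # use the runtime type of the value (or of the wrapped dataframe).
    if python_type is StructuredDataset:
        if isinstance(python_val, StructuredDataset):
            python_type = type(python_val.dataframe)
        else:
            python_type = type(python_val)
    try:
        return DEFAULT_FORMATS[python_type]
    except KeyError:
        known = ", ".join(sorted(t.__name__ for t in DEFAULT_FORMATS))
        raise TypeError(
            f"No default format registered for {python_type.__name__}; "
            f"known dataframe types: {known}"
        ) from None


if __name__ == "__main__":
    try:
        naive_format(StructuredDataset)
    except KeyError as exc:
        print("naive lookup failed with KeyError:", exc)
    # Both a raw dataframe and a wrapped one resolve to the same format.
    print(resolve_format(FakeDataFrame(), StructuredDataset))
    print(resolve_format(StructuredDataset(dataframe=FakeDataFrame()), StructuredDataset))
```

In this sketch a value of an unregistered type still fails, but with a TypeError that names the offending type and lists the registered ones instead of a bare KeyError on the wrapper class.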

We need to improve the erroring/experience around this. Potential things include:

ggydush-fn commented 2 years ago

Also seeing this issue when using remote.execute if the input contains a pd.DataFrame object. It's resolved by wrapping the dataframe in a StructuredDataset.
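
A rough sketch of that workaround, assuming a FlyteRemote instance and an already fetched entity are passed in. The helper name execute_with_wrapped_dataframes is hypothetical, and the imports live inside the function only so the helper can be defined without flytekit or pandas installed.

```python
def execute_with_wrapped_dataframes(remote, entity, inputs):
    """Wrap any pd.DataFrame input in a StructuredDataset, then execute.

    `remote` is assumed to be a flytekit FlyteRemote and `entity` a task,
    workflow, or launch plan previously fetched with it.
    """
    # Imported lazily so defining this helper does not require the dependencies.
    import pandas as pd
    from flytekit.types.structured import StructuredDataset

    wrapped = {
        name: StructuredDataset(dataframe=value)
        if isinstance(value, pd.DataFrame)
        else value
        for name, value in inputs.items()
    }
    return remote.execute(entity, inputs=wrapped, wait=True)
```

Only top-level pd.DataFrame values are wrapped here; dataframes nested inside lists or dicts would need their own handling.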

github-actions[bot] commented 1 year ago

Hello 👋, This issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will close the issue if we detect no activity in the next 7 days. Thank you for your contribution and understanding! 🙏

github-actions[bot] commented 1 year ago

Hello 👋, This issue has been inactive for over 9 months and hasn't received any updates since it was marked as stale. We'll be closing this issue for now, but if you believe this issue is still relevant, please feel free to reopen it. Thank you for your contribution and understanding! 🙏

github-actions[bot] commented 1 month ago

Hello 👋, this issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will engage on it to decide if it is still applicable. Thank you for your contribution and understanding! 🙏