Galileo-Galilei / kedro-pandera

A kedro plugin to use pandera in your kedro projects
https://kedro-pandera.readthedocs.io/en/latest/
Apache License 2.0

Validating output data fails on a MemoryDataset #69

Closed: michal-mmm closed this issue 1 month ago

michal-mmm commented 1 month ago

Description

The latest change, which validates output data in the after_node_run() hook, fails when a node output is a MemoryDataset: catalog._datasets doesn't contain MemoryDataset outputs, so looking them up by name raises a KeyError.
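A minimal sketch of the failure mode, assuming (as the report states) that outputs not declared in the catalog, such as the MemoryDataset outputs X_train, X_test, y_train and y_test, are absent from catalog._datasets. To avoid depending on Kedro internals, StandInCatalog, StandInDataset, unguarded_metadata and guarded_metadata are hypothetical names used only for illustration. The guarded variant is one possible workaround, not necessarily what #70 implemented.

```python
from typing import Any, Dict, Optional


class StandInDataset:
    """Hypothetical dataset exposing an optional ``metadata`` attribute."""

    def __init__(self, metadata: Optional[Dict[str, Any]] = None) -> None:
        self.metadata = metadata


class StandInCatalog:
    """Hypothetical stand-in for a DataCatalog: ``_datasets`` holds only the
    entries that were registered, so an undeclared output is missing."""

    def __init__(self, datasets: Dict[str, StandInDataset]) -> None:
        self._datasets = datasets


def unguarded_metadata(catalog: StandInCatalog, name: str) -> Optional[Dict[str, Any]]:
    # Same lookup as the failing line in the traceback: raises KeyError
    # when `name` is not a key of `catalog._datasets`.
    return getattr(catalog._datasets[name], "metadata", None)


def guarded_metadata(catalog: StandInCatalog, name: str) -> Optional[Dict[str, Any]]:
    # Hypothetical workaround: an unregistered output is treated as having
    # no metadata (getattr on None falls back to the default), so the caller
    # can skip validation for it instead of crashing.
    dataset = catalog._datasets.get(name)
    return getattr(dataset, "metadata", None)


if __name__ == "__main__":
    # Only the node input is declared; X_train is an undeclared output.
    catalog = StandInCatalog(
        {"model_input_table": StandInDataset({"schema": "placeholder"})}
    )
    try:
        unguarded_metadata(catalog, "X_train")
    except KeyError as err:
        print(f"KeyError: {err}")  # KeyError: 'X_train'
    print(guarded_metadata(catalog, "X_train"))  # None
    print(guarded_metadata(catalog, "model_input_table"))  # {'schema': 'placeholder'}
```

In this sketch, skipping unregistered outputs is a natural default: an output with no catalog entry also has no catalog metadata in which a schema could have been declared.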

Steps to Reproduce

  1. Create a project from the spaceflights-pandas starter.
  2. Install kedro-pandera from the main branch.
  3. Run the pipeline (kedro run).

Expected Result

No error

Actual Result

INFO     Running node: split_data_node:                                         node.py:361
         split_data([model_input_table;params:model_options]) ->
         [X_train;X_test;y_train;y_test]
WARNING  There are 3 nodes that have not run.                                   runner.py:214
         You can resume the pipeline run from the nearest nodes with persisted inputs by
         adding the following argument to your previous command:
           --from-nodes "split_data_node"
Traceback (most recent call last):
  File "/Desktop/Projects/new-spaceflights/.venv/bin/kedro", line 8, in <module>
    sys.exit(main())
  File "/Desktop/Projects/new-spaceflights/.venv/lib/python3.10/site-packages/kedro/framework/cli/cli.py", line 233, in main
  File "/Desktop/Projects/new-spaceflights/.venv/lib/python3.10/site-packages/pluggy/_manager.py", line 477, in <lambda>
    lambda: oldcall(hook_name, hook_impls, caller_kwargs, firstresult)
  File "/Desktop/Projects/new-spaceflights/.venv/lib/python3.10/site-packages/pluggy/_callers.py", line 139, in _multicall
    raise exception.with_traceback(exception.__traceback__)
  File "/Desktop/Projects/new-spaceflights/.venv/lib/python3.10/site-packages/pluggy/_callers.py", line 103, in _multicall
    res = hook_impl.function(*args)
  File "/Desktop/Projects/new-spaceflights/.venv/lib/python3.10/site-packages/kedro_pandera/framework/hooks/pandera_hook.py", line 95, in after_node_run
    self._validate_datasets(node, catalog, outputs)
  File "/Desktop/Projects/new-spaceflights/.venv/lib/python3.10/site-packages/kedro_pandera/framework/hooks/pandera_hook.py", line 58, in _validate_datasets
    metadata = getattr(catalog._datasets[name], "metadata", None)
KeyError: 'X_train'

Your Environment

Python 3.10.13, kedro 0.19.6

Galileo-Galilei commented 1 month ago

Closed by #70