Galileo-Galilei / kedro-pandera

A kedro plugin to use pandera in your kedro projects
https://kedro-pandera.readthedocs.io/en/latest/
Apache License 2.0
33 stars 4 forks source link

Add resolve_dataframe_model resolver #32

Closed noklam closed 11 months ago

noklam commented 12 months ago

Description

Why was this PR created? Support the class-basedDataFrameModel schema which is supported by pandera.

Development notes

What have you changed, and how has this been tested?

Questions:

How did you test your change? Did you have a local repository have everything there? I test it with an example repo currently: https://github.com/noklam/kedro-pandera-iris

Checklist

Notice

noklam commented 11 months ago

i have some questions but are not related to the PR directly.

  1. kedro pandera infer -d example_iris_data # current. Should we make dataset a must instead of options? kedro pandera infer example_iris_data should work.

  2. How did you test your change? Did you have a local repository have everything there? I test it with an example repo currently: https://github.com/noklam/kedro-pandera-iris

codecov[bot] commented 11 months ago

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.67% :tada:

Comparison is base (2f5d4bc) 88.54% compared to head (ebe3654) 89.21%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #32 +/- ## ========================================== + Coverage 88.54% 89.21% +0.67% ========================================== Files 5 5 Lines 96 102 +6 ========================================== + Hits 85 91 +6 Misses 11 11 ``` | [Files Changed](https://app.codecov.io/gh/Galileo-Galilei/kedro-pandera/pull/32?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Yolan+Honor%C3%A9-Roug%C3%A9) | Coverage Δ | | |---|---|---| | [kedro\_pandera/framework/hooks/pandera\_hook.py](https://app.codecov.io/gh/Galileo-Galilei/kedro-pandera/pull/32?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Yolan+Honor%C3%A9-Roug%C3%A9#diff-a2Vkcm9fcGFuZGVyYS9mcmFtZXdvcmsvaG9va3MvcGFuZGVyYV9ob29rLnB5) | `57.69% <ø> (ø)` | | | [kedro\_pandera/framework/config/resolvers.py](https://app.codecov.io/gh/Galileo-Galilei/kedro-pandera/pull/32?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Yolan+Honor%C3%A9-Roug%C3%A9#diff-a2Vkcm9fcGFuZGVyYS9mcmFtZXdvcmsvY29uZmlnL3Jlc29sdmVycy5weQ==) | `100.00% <100.00%> (ø)` | |

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Galileo-Galilei commented 11 months ago

kedro pandera infer -d example_iris_data # current. Should we make dataset a must instead of options? kedro pandera infer example_iris_data should work.

Good catch, I think you are right this should not be an option. That said I guess one day we could infer all the datasets of an entire pipeline and use a -p option, I am not sure if we will create another command. Not a big deal for now, feel free to change it if you want.

How did you test your change? Did you have a local repository have everything there? I test it with an example repo currently: https://github.com/noklam/kedro-pandera-iris

Yes. Testing is a pain in the a** because we need to mock an entire kedro repo with a context if we want to have things realistic. I guess a better test would be to check that the mock has called the importlib module function with assert_call_once method. We need to figure out better tests later but I am fine not bothering too much for now since the syntax and even the behaviour are not very weel defined for now.

Galileo-Galilei commented 11 months ago

I have create some documentation. I don't know where is the best place to put it since I think you want to keep the tutorial simple and default with the YAML base config.

I think this is a basic question that will be often raised, so it fits well in the tutorial.

I removed black and ruff as I run into issues in CI. pre-commit isn't agreeing with the Github Action. IMO we can just use pre-commit both locally and CI.

Agreed until we find a way to fix it.

noklam commented 11 months ago

I think this is a basic question that will be often raised, so it fits well in the tutorial.

Do you think the current structure works? It will be a separate document, so the 01_getting_started will just go through the DataFrameSchema approach, the DataFrameModel will be an optional one, maybe I can add a link in the 01_getting_started .

Galileo-Galilei commented 11 months ago

Yes I think it's fine to have them in two pages, but we should add an admonition (like a tip) and /or a link to the other page.

noklam commented 11 months ago

I have updated the documentation, I got to admit I am no expert of RTD and .rst file. Feel free to move stuff around, otherwise this is ready for merge.