Add resolve_dataframe_model resolver

noklam commented 12 months ago

Description

Why was this PR created? Support the class-basedDataFrameModel schema which is supported by pandera.

Development notes

What have you changed, and how has this been tested?

Add resolve_dataframe_model

Questions:

I have create some documentation. I don't know where is the best place to put it since I think you want to keep the tutorial simple and default with the YAML base config.
I added a very simple test. Mocking a module is hard and I cannot think of a better way. The test itself isn't doing much. Do you think I should add more tests?
I removed black and ruff as I run into issues in CI. pre-commit isn't agreeing with the Github Action. IMO we can just use pre-commit both locally and CI.

How did you test your change? Did you have a local repository have everything there? I test it with an example repo currently: https://github.com/noklam/kedro-pandera-iris

Checklist

[ ] Read the contributing guidelines
[ ] Open this PR as a 'Draft Pull Request' if it is work-in-progress
[ ] Update the documentation to reflect the code changes
[ ] Add a description of this change and add your name to the list of supporting contributions in the CHANGELOG.md file. Please respect Keep a Changelog guidelines.
[ ] Add tests to cover your changes

Notice

[ ] I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":
I submit this contribution under the Apache 2.0 license and represent that I am entitled to do so on behalf of myself, my employer, or relevant third parties, as applicable.
I certify that (a) this contribution is my original creation and / or (b) to the extent it is not my original creation, I am authorised to submit this contribution on behalf of the original creator(s) or their licensees.
I certify that the use of this contribution as authorised by the Apache 2.0 license does not violate the intellectual property rights of anyone else.

noklam commented 11 months ago

i have some questions but are not related to the PR directly.

kedro pandera infer -d example_iris_data # current. Should we make dataset a must instead of options? kedro pandera infer example_iris_data should work.
How did you test your change? Did you have a local repository have everything there? I test it with an example repo currently: https://github.com/noklam/kedro-pandera-iris

codecov[bot] commented 11 months ago

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.67% :tada:

Comparison is base (2f5d4bc) 88.54% compared to head (ebe3654) 89.21%.

Additional details and impacted files

```diff @@ Coverage Diff @@ ## main #32 +/- ## ========================================== + Coverage 88.54% 89.21% +0.67% ========================================== Files 5 5 Lines 96 102 +6 ========================================== + Hits 85 91 +6 Misses 11 11 ``` | [Files Changed](https://app.codecov.io/gh/Galileo-Galilei/kedro-pandera/pull/32?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Yolan+Honor%C3%A9-Roug%C3%A9) | Coverage Δ | | |---|---|---| | [kedro\_pandera/framework/hooks/pandera\_hook.py](https://app.codecov.io/gh/Galileo-Galilei/kedro-pandera/pull/32?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Yolan+Honor%C3%A9-Roug%C3%A9#diff-a2Vkcm9fcGFuZGVyYS9mcmFtZXdvcmsvaG9va3MvcGFuZGVyYV9ob29rLnB5) | `57.69% <ø> (ø)` | | | [kedro\_pandera/framework/config/resolvers.py](https://app.codecov.io/gh/Galileo-Galilei/kedro-pandera/pull/32?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Yolan+Honor%C3%A9-Roug%C3%A9#diff-a2Vkcm9fcGFuZGVyYS9mcmFtZXdvcmsvY29uZmlnL3Jlc29sdmVycy5weQ==) | `100.00% <100.00%> (ø)` | |

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Galileo-Galilei commented 11 months ago

kedro pandera infer -d example_iris_data # current. Should we make dataset a must instead of options? kedro pandera infer example_iris_data should work.

Good catch, I think you are right this should not be an option. That said I guess one day we could infer all the datasets of an entire pipeline and use a -p option, I am not sure if we will create another command. Not a big deal for now, feel free to change it if you want.

How did you test your change? Did you have a local repository have everything there? I test it with an example repo currently: https://github.com/noklam/kedro-pandera-iris

Yes. Testing is a pain in the a** because we need to mock an entire kedro repo with a context if we want to have things realistic. I guess a better test would be to check that the mock has called the importlib module function with assert_call_once method. We need to figure out better tests later but I am fine not bothering too much for now since the syntax and even the behaviour are not very weel defined for now.

Galileo-Galilei commented 11 months ago

I have create some documentation. I don't know where is the best place to put it since I think you want to keep the tutorial simple and default with the YAML base config.

I think this is a basic question that will be often raised, so it fits well in the tutorial.

I removed black and ruff as I run into issues in CI. pre-commit isn't agreeing with the Github Action. IMO we can just use pre-commit both locally and CI.

Agreed until we find a way to fix it.

noklam commented 11 months ago

I think this is a basic question that will be often raised, so it fits well in the tutorial.

Do you think the current structure works? It will be a separate document, so the 01_getting_started will just go through the DataFrameSchema approach, the DataFrameModel will be an optional one, maybe I can add a link in the 01_getting_started .

Galileo-Galilei commented 11 months ago

Yes I think it's fine to have them in two pages, but we should add an admonition (like a tip) and /or a link to the other page.

noklam commented 11 months ago

I have updated the documentation, I got to admit I am no expert of RTD and .rst file. Feel free to move stuff around, otherwise this is ready for merge.

Galileo-Galilei / kedro-pandera