Closed noklam closed 11 months ago
I have some questions, though they are not directly related to the PR.
kedro pandera infer -d example_iris_data is the current syntax. Should we make the dataset a required argument instead of an option? kedro pandera infer example_iris_data should work.
How did you test your change? Did you have a local repository with everything set up? I currently test with an example repo: https://github.com/noklam/kedro-pandera-iris
Patch coverage: 100.00% and project coverage change: +0.67% :tada:
Comparison is base (2f5d4bc) 88.54% compared to head (ebe3654) 89.21%.
kedro pandera infer -d example_iris_data is the current syntax. Should we make the dataset a required argument instead of an option? kedro pandera infer example_iris_data should work.
Good catch, I think you are right: this should not be an option. That said, one day we might infer the datasets of an entire pipeline with a -p option, or we might create another command for that; I am not sure yet. Not a big deal for now, feel free to change it if you want.
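To make the option-versus-argument point concrete, here is a minimal sketch of a command that takes the dataset name as a required positional argument. It uses the standard-library argparse purely as a stand-in; the plugin's real CLI code is not shown in this thread, and the build_parser function and the help text are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser, not the plugin's actual CLI implementation."""
    parser = argparse.ArgumentParser(prog="kedro pandera infer")
    # A positional argument is required by default, so
    # `kedro pandera infer example_iris_data` works without `-d`.
    parser.add_argument(
        "dataset",
        help="Name of the catalog dataset whose schema should be inferred",
    )
    return parser


# Parse an explicit argument list instead of sys.argv for the demonstration.
args = build_parser().parse_args(["example_iris_data"])
print(args.dataset)
```

With this shape, calling the command without a dataset name fails at parse time, which is the behaviour a required argument is meant to give.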
How did you test your change? Did you have a local repository with everything set up? I currently test with an example repo: https://github.com/noklam/kedro-pandera-iris
Yes. Testing is a pain in the a** because we need to mock an entire Kedro repo with a context if we want things to be realistic. I guess a better test would be to check that the mock has called the importlib module function, using the assert_called_once method. We need to figure out better tests later, but I am fine with not bothering too much for now, since the syntax and even the behaviour are not very well defined yet.
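The mock-based test suggested above could look roughly like the following sketch. The helper load_schema_object and the path my_project.schemas.IrisModel are hypothetical stand-ins, since the real resolver's code is not shown in this thread; only importlib.import_module and the unittest.mock calls are real standard-library APIs.

```python
import importlib
from unittest import mock


def load_schema_object(dotted_path: str):
    """Hypothetical helper: import `pkg.module.Name` and return `Name`."""
    module_path, _, attr_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


def test_load_schema_object_imports_module_once():
    fake_module = mock.Mock()
    # Replace importlib.import_module so no real project is needed on disk.
    with mock.patch.object(
        importlib, "import_module", return_value=fake_module
    ) as mocked_import:
        result = load_schema_object("my_project.schemas.IrisModel")

    # The module part of the path is imported exactly once...
    mocked_import.assert_called_once_with("my_project.schemas")
    # ...and the attribute is looked up on whatever the import returned.
    assert result is fake_module.IrisModel


test_load_schema_object_imports_module_once()
```

This checks only that the import is triggered with the expected module path; it does not replace an integration test against a real Kedro project.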
I have created some documentation. I don't know where the best place to put it is, since I think you want to keep the tutorial simple and default to the YAML-based config.
I think this is a basic question that will be often raised, so it fits well in the tutorial.
I removed black and ruff as I ran into issues in CI. pre-commit isn't agreeing with the GitHub Action. IMO we can just use pre-commit both locally and in CI.
Agreed until we find a way to fix it.
I think this is a basic question that will be often raised, so it fits well in the tutorial.
Do you think the current structure works? It will be a separate document, so 01_getting_started will just go through the DataFrameSchema approach and the DataFrameModel will be an optional one. Maybe I can add a link in 01_getting_started.
Yes, I think it's fine to have them on two pages, but we should add an admonition (like a tip) and/or a link to the other page.
I have updated the documentation. I have to admit I am no expert in RTD and .rst files. Feel free to move stuff around; otherwise this is ready to merge.
Description
Why was this PR created?
Support the class-based DataFrameModel schema, which is supported by pandera.
Development notes
What have you changed, and how has this been tested?
resolve_dataframe_model
Questions:
I removed black and ruff as I ran into issues in CI. pre-commit isn't agreeing with the GitHub Action. IMO we can just use pre-commit both locally and in CI.
How did you test your change? Did you have a local repository with everything set up? I currently test with an example repo: https://github.com/noklam/kedro-pandera-iris
Checklist
CHANGELOG.md file. Please respect Keep a Changelog guidelines.
Notice
[ ] I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":
I submit this contribution under the Apache 2.0 license and represent that I am entitled to do so on behalf of myself, my employer, or relevant third parties, as applicable.
I certify that (a) this contribution is my original creation and / or (b) to the extent it is not my original creation, I am authorised to submit this contribution on behalf of the original creator(s) or their licensees.
I certify that the use of this contribution as authorised by the Apache 2.0 license does not violate the intellectual property rights of anyone else.