Open dagsir[bot] opened 2 years ago
This was a question about whether dagster-pandera supports column names containing spaces (it does, through pandera's `DataFrameSchema`). The solution here is to improve the docs by linking the API doc from the guide and by emphasizing that either schema-defining approach is supported.
Summary
Update the Pandera guide to:
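- Emphasize that both schema-defining approaches are supported (`SchemaModel` and `DataFrameSchema`)
- Show the `DataFrameSchema` approach
- Link to the Pandera docs for `DataFrameSchema`: https://pandera.readthedocs.io/en/stable/
Issue from the Dagster Slack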
This issue was generated from the slack conversation at: https://dagster.slack.com/archives/C01U954MEER/p1654667850385989?thread_ts=1654667850.385989&cid=C01U954MEER
Conversation excerpt
U03G3ND6C03: Hi all, I'm looking to use pandera to validate my SDA's. I'm looking to validate my raw data assets, which are straight dumps of the source data. However, there are spaces in the raw data field names, and I'm looking to use the dagster-pandera API which looks like below. Is there a way to overcome the spaces, preferably without changing the raw column names?
U015C9U9RLK: <@U018K0G2Y85> issue dagster-pandera doesn’t handle spaces in col names
U01GTMVMGQH: Hi Barry, dagster-pandera supports either of pandera’s formats for defining a dataframe schema: the `SchemaModel` approach (which is illustrated in your snippet) and the `pa.DataFrameSchema` approach. For columns with spaces, you should use the `pa.DataFrameSchema` approach. See Pandera docs for more on the `DataFrameSchema` object.
U03G3ND6C03: Ok sweet! So am I able to pass in the `member_schema` to my asset like so? It should work for either format of the schema?
Message from the maintainers:
Do you care about this too? Give it a :thumbsup:. We factor engagement into prioritization.