Open thomasaarholt opened 4 months ago
Hey @thomasaarholt, I would prefer `pt.Field(allow_missing=True)`, because if you say `c: int = None` it's not entirely clear what is happening, and it's also unclear how one could still add a `pt.Field` with specific settings on that field.
A kwarg in pt.Field seems most clear and flexible to me.
And also I like the idea of allowing specific columns to be missing! :)
+1 for pt.Field(allow_missing=True)
+1 for `allow_missing`. A related feature to consider is validating the column dependencies of `derived_from` and `constraints`. We can inspect which columns are required to compute a derivation or constraint using `expr.meta.root_names()` and check that `allow_missing` is `False` for them. Perhaps we could insert these checks into the `Model.validate_schema` method.
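A rough, dependency-free sketch of the check described above. `ColumnSpec` and `find_dependency_errors` are hypothetical names invented for illustration, not part of patito; in a real model the `depends_on` lists would presumably be filled from `expr.meta.root_names()` rather than written by hand.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ColumnSpec:
    """Hypothetical stand-in for a model field (not patito's ColumnInfo)."""

    allow_missing: bool = False
    # Root columns read by this column's derived_from expression or constraint.
    # Assumption: in practice these would come from expr.meta.root_names().
    depends_on: list[str] = field(default_factory=list)


def find_dependency_errors(schema: dict[str, ColumnSpec]) -> list[str]:
    """Report dependencies that are unknown or are allowed to be missing."""
    errors = []
    for name, spec in schema.items():
        for root in spec.depends_on:
            if root not in schema:
                errors.append(f"{name}: depends on unknown column {root!r}")
            elif schema[root].allow_missing:
                errors.append(f"{name}: depends on {root!r}, which may be missing")
    return errors


schema = {
    "a": ColumnSpec(),
    "c": ColumnSpec(allow_missing=True),
    "a_plus_c": ColumnSpec(depends_on=["a", "c"]),
}
errors = find_dependency_errors(schema)
print(errors)
```

Here `a_plus_c` is flagged because it reads `c`, which is marked `allow_missing=True`. Whether that should be a hard error or only a warning is a design choice this sketch does not settle.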
I may be interested in giving this a go - are we happy to pursue `pt.Field(allow_missing=True)`?
@dsgibbons yeah I think there is a consensus on allow_missing! Go ahead : )
Currently, a type specification of `Optional[int]` means that a column must be of integer type but may contain nulls. We currently don't support a syntax to specify that a column is allowed to be missing.
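To make the distinction concrete, here is a minimal sketch in plain Python. It does not use patito or polars: a "frame" is just a dict of column name to list of values, and `check_int_column` with its `nullable` and `allow_missing` parameters is a hypothetical helper invented for this illustration.

```python
from __future__ import annotations


def check_int_column(
    frame: dict[str, list],
    name: str,
    *,
    nullable: bool = False,
    allow_missing: bool = False,
) -> list[str]:
    """Return error strings for one integer column of a dict-of-lists frame."""
    if name not in frame:
        # A missing column is a different condition from a column with nulls.
        return [] if allow_missing else [f"{name}: column is missing"]
    errors = []
    values = frame[name]
    if any(v is not None and not isinstance(v, int) for v in values):
        errors.append(f"{name}: expected integer values")
    if not nullable and any(v is None for v in values):
        errors.append(f"{name}: nulls are not allowed")
    return errors


frame = {"a": [1, 2, None]}

print(check_int_column(frame, "a", nullable=True))       # nulls tolerated
print(check_int_column(frame, "a"))                      # nulls rejected
print(check_int_column(frame, "c", nullable=True))       # nullable, but absent
print(check_int_column(frame, "c", allow_missing=True))  # absence tolerated
```

The third call shows the gap described above: marking a column as nullable says nothing about whether the column may be absent altogether.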
One current workaround is to specify `Foo.validate(df, allow_missing_columns=True)`, where `allow_missing_columns` is passed on to `_find_errors` as a kwarg (we should add this as an explicit parameter).

The following example contains a suggestion for how we could allow missing columns (see `c`). It is one that @JakobGM came up with last year.

An alternative would be to use `pt.Field`/`ColumnInfo`, and do something like the following, which I might like better, just because it will pass type checks.

I am very open to ideas here. Does anyone have a suggestion? Tagging a few possibly-interested parties, @brendancooley, @dsgibbons, @ion-elgreco