sd2k closed this issue 4 years ago
Instead of getting rid of the type information, I think we could just parameterize it, e.g. add a `dagster_type` and a `parameter_name` parameter to `ge_validation_solid_factory`. Re: `batch_kwargs`, I'm not sure what the differences are on the GE side, but I would guess the right place to plop `extra_batch_kwargs` args would be as a parameter to the solid factory, not in config.
Hmm, yes, it looks like my diff was pretty far from what I intended :smile: - I did include the `dagster_type` (as `input_dagster_type`) as a parameter to `ge_validation_solid_factory`, but forgot to add it to the `InputDefinition`. And roger that re. the `extra_batch_kwargs`.
Currently the solid returned by `ge_validation_solid_factory` only works for pandas DataFrames. It should be fairly easy to add a few extra parameters to the factory so that the input type can be customised. It would also be good if the `batch_kwargs` could be customised, since GE's SparkDataFrame often needs additional options passing (e.g. `reader_options`).

Happy to submit a PR for review if work hasn't already been done here. The diff I had in mind was something like:
I think this would work for Spark DataFrames, but probably not for other datasources which don't expect a `dataset` batch_kwarg; those would need to be handled differently somehow.
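
For anyone skimming the thread, here is a rough, library-free sketch of the shape being discussed: a factory that takes the input type, the input name, and extra batch kwargs as arguments rather than reading them from config. It is not the actual dagster-ge code or the diff mentioned above. `validation_step_factory`, `FakeSparkDataFrame`, and the `header` reader option are all made-up names for illustration, and the `isinstance` check only stands in for what the `InputDefinition` type would do.

```python
from typing import Any, Callable, Dict, Optional


def validation_step_factory(
    name: str,
    input_type: type = object,
    parameter_name: str = "dataset",
    extra_batch_kwargs: Optional[Dict[str, Any]] = None,
) -> Callable[[Any], Dict[str, Any]]:
    """Hypothetical stand-in for ge_validation_solid_factory (not Dagster's API)."""
    # Copy so later mutation of the caller's dict does not leak into the step.
    extras = dict(extra_batch_kwargs or {})

    def step(value: Any) -> Dict[str, Any]:
        # Stands in for the type check an input definition would perform.
        if not isinstance(value, input_type):
            raise TypeError(
                f"{name}: input {parameter_name!r} must be "
                f"{input_type.__name__}, got {type(value).__name__}"
            )
        # Extras go first so they can never override the "dataset" key.
        return {**extras, "dataset": value}

    step.__name__ = name
    return step


class FakeSparkDataFrame:
    """Placeholder type standing in for a Spark DataFrame."""


# Assumed usage: a Spark-flavoured step with extra reader options.
spark_step = validation_step_factory(
    "validate_events",
    input_type=FakeSparkDataFrame,
    parameter_name="events_df",
    extra_batch_kwargs={"reader_options": {"header": True}},
)
batch_kwargs = spark_step(FakeSparkDataFrame())
```

Keeping the extras as a factory argument means they are fixed when the step is defined, which matches the suggestion above to keep them out of config. Datasources that do not take a `dataset` batch kwarg would still need a different code path.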