Open Nitnelav opened 1 year ago
snakemake to see if it might be good to switch to a pipeline tool with a large user base. Would be interesting to see if there is an integration that can check the format.
schemas.create_persons(additional = "income").validate(df_persons), with some standard attributes that need to be there plus optional ones if needed
O_o snakemake looks quite interesting indeed! Joining a broader "pipeline" community would make a lot of sense.
Regarding the 2nd point, I think I would prefer defining everything inside the script, but I see how that might lead to a certain amount of code duplication (if the df_persons structure doesn't change much across many scripts, for example).
FYI, I'm using pandera right now in another pipeline, and I find it very verbose if you want to validate the whole dataframe at every stage... I'll have a better opinion in a few weeks
I think it would be a good idea to use Pandera to describe and check the input dataframes of a given stage at runtime.
It has the benefit of:
I don't think it can or should be imposed in every existing stage, but it can be strongly encouraged by the community.
For example: