Closed: WesIngwersen closed this issue 4 years ago
I think it makes sense to do these checks within the modules that produce them when possible. Since most of the sub-modules can be run individually (they contain an `if __name__ == "__main__"` block), having that output be as correct as possible would be ideal. That said, there's also a case to be made for an actual post-processing module that would look at the various main dataframes before they are turned into dictionaries. Once the data make it into a dictionary, I think implementing fixes will be a much bigger pain.
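To make that concrete, here is a minimal sketch of the shared-checks idea: a common function that each sub-module could call on its main dataframe (at the end of its `__main__` run, or just before conversion to a dictionary). The function and column names are illustrative assumptions, not taken from the repo.

```python
import pandas as pd

def check_output(df, required_cols, value_col):
    """Basic sanity checks on a module's main dataframe.

    Verifies required columns exist and that the value column has no
    NaNs or negatives; returns the dataframe so calls can be chained.
    """
    missing = set(required_cols) - set(df.columns)
    assert not missing, f"missing columns: {missing}"
    assert not df[value_col].isna().any(), f"NaN values in {value_col}"
    assert (df[value_col] >= 0).all(), f"negative values in {value_col}"
    return df
```

A sub-module would then end with something like `check_output(generation_df, ["plant_id", "net_generation_mwh"], "net_generation_mwh")` before handing the dataframe off, so errors surface at the earliest point in model building.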
From a manual-fixes standpoint, I think a YAML file with sections corresponding to the different .py files would be pretty easy to follow. The manual data could then be formatted to match the output dataframe of that module.
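One possible shape for that YAML-driven approach, sketched below: one top-level section per module, each listing rows shaped like that module's output dataframe, keyed on an identifier column. The section name, column names, and key (`plant_id`) are hypothetical; the dict literal stands in for what `yaml.safe_load()` would return from the file.

```python
import pandas as pd

# Stand-in for yaml.safe_load(open("manual_fixes.yaml")):
# one top-level section per .py module, rows matching that
# module's output dataframe (names here are illustrative).
MANUAL_FIXES = {
    "generation": [
        {"plant_id": 1001, "fuel_category": "GAS", "net_generation_mwh": 52000.0},
    ],
}

def apply_manual_fixes(df, module_name, key="plant_id", fixes=MANUAL_FIXES):
    """Overwrite rows of df with manually supplied values, matched on `key`."""
    for row in fixes.get(module_name, []):
        mask = df[key] == row[key]
        for col, val in row.items():
            if col != key:
                df.loc[mask, col] = val
    return df
```

Because each section mirrors its module's dataframe, the same small function can serve every module, which fits the idea of putting common functions in one shared place.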
OK, let's move forward with performing these fixes in the actual files where the data are created, and put the common functions in one shared place (a new or existing .py).
Closing for now, as we're handling this post-processing in the modules where the data are created.
This is initially for discussion. @jump2conclusionsmatt @bl-young Does it make sense to create a single script to "manage" this post-processing, with functions called from main.py after creating single or total model components? It would perform some checks and apply manually supplied data when replacement values are present (in whatever text-based form is best).
Or perhaps these checks should be called from the various scripts that create these components (like generator.py), at the earliest point in model building, to avoid having to replace various dataframes and dictionaries after the fact?