CodeForPhilly / chime

COVID-19 Hospital Impact Model for Epidemics
https://codeforphilly.github.io/chime/
MIT License
205 stars 150 forks source link

["bug"] Refactor validators #503

Closed jlubken closed 4 years ago

jlubken commented 4 years ago

Refactor validators so there is less duplication with cli args.

Validators should add a cast so that int and floats are not conflated.

Validators should be descriptive so that they may be used to populate the argument parser directly.

mangalap123 commented 4 years ago

I started looking into this. I have question about this. Do you want to keep one of the below VALIDATORS = {....} and ARGS = (...) in parameters.py ??

BrianThomasRoss commented 4 years ago

I believe they both would be kept - I believe (although am open to being corrected) that by descriptive he means that the Validators should produce an object post validation and that those objects would/could populate the argument parser

inputs ---> validate ---> args

jlubken commented 4 years ago

The idea behind validators.py is to have an object per parameter that not only has a function to validate an input, but also describes the constraints, produce ValueErrors, and supply help text. The duplication here is between the min and max values of the ARGS tuple, and the min and max values in the validators. HELP is used by both.

A weakness in the validators is that they don't enforce the type like the ARGS do or know their own name for generating a ValueError, and they should. They should return the cast value to the appropriate type when called with an input value (cast int to float when a float is required). There is also circularity/complexity in the attempt to make validators reusable and a submodule. This adds cognitive overhead when data science simply wants to add another parameter to the model. See the definitions in validators/__init__.py There is some missing nested structure in the validators where needed, there is a validator for dispositions, but none for the dispositions array.

A weakness in the ARGS and of the config file is that they have no nested structure where that structure is needed. See how --hospitalized-days and --hospitalized-rate are combined to produce the desired item for the dispositions array. This discrepancy made simply replacing ARGS with a more complex VALIDATORS tuple of tuples break.

At this point, I'd simply replace the --arg style configuration file with yaml:

current_hospitalized: 83
dispositions:
    hospitalized:
        days: 7
        rate: 0.025
    icu:
        days: 10
        rate: 0.0175
...

and remove all command line arguments except for --parameters (also available as an environment variable PARAMETERS.

The critical missing piece here though, is that the cli must emit good messages for bad values and the config files values must be cast to the appropriate types. The validators must be improved to do this.

The parameters object is actually part of the data science model as well as part of "model" in MVC, a way to ensure that garbage wasn't being passed to the SimSirModel. It is also being used to provide the streamlit and webapps a single object that should be used to help make the input controls. So it is worthwhile to add the optional stepper increment (for streamlit) and anything dash needs.

To clarify:

  1. create initial parameters object using --parameters arg or PARAMETERS env from config file, applying validators
  2. create overridden parameters object with new values from the app, applying validators
  3. run model with overridden parameters

Step 2 is not needed for the cli.