UDST / orca_test

Data assertions for the Orca task orchestrator
BSD 3-Clause "New" or "Revised" License
0 stars 3 forks source link

Set defaults in test specs #12

Open semcogli opened 7 years ago

semcogli commented 7 years ago

I have converted my old test spreadsheet to ORCA test specs and it looks pretty verbose. A lot of repetitive items such as "numeric=True" and "registered = True" in ColumnSpec. Anyway to make them default so the spec list or YAML could come up much simpler and cleaner?

smmaurer commented 7 years ago

I think this might already be implemented, if i'm understanding correctly.

The more specific characteristics build on the more general ones -- for example, if you assert that a column has a minimum value, Orca_test will automatically check that it's numeric, can be generated, and is registered. So in the spec, you only need to list the characteristics that are independent of each other. This isn't in the documentation yet, but i'll add it.

Does this address the issue, or is there more we could be doing?

semcogli commented 7 years ago

Sorry for the late response. I looked at the code and indeed that's the way it is meant to be. But this test chaining only addresses part of my problem. For example, UrbanSim 1 requires no missing values in columns. If I imply the same logic in UrbanSim2, I need to have no missing test explicit to every data column, right? Same if I have most of my columns are numeric, why not just say "numeric=False" in the few remaining cases? It would be better to have a default setting for majority of the cases to simply the configs.

Thanks.

smmaurer commented 7 years ago

I see what you mean. It does seem useful to be able to define context-specific defaults.

Some thoughts about implementation:

One approach would be to write a set() method that would allow users to change an object's properties. This would come in handy in a variety of situations, and would look something like this:

col_defaults = ColumnSpec(numeric=True, missing=False)
t = TableSpec('buildings',
        col_defaults.set(column_name='building_id'),
        col_defaults.set(column_name='residential_price', min=0))

(The method would actually return a new object, to avoid problems with references.)

Or alternatively, we could modify the constructors to accept an existing spec that would serve as a base for the new object.

col_defaults = ColumnSpec(numeric=True, missing=False)
t = TableSpec('buildings',
        ColumnSpec('building_id', base=col_defaults),
        ColumnSpec('residential_price', min=0, base=col_defaults))

I can see both versions of the syntax being useful: the second seems clearer for setting default values, but the first is clearer if you want to modify an existing object. The functionality would be the same, so behind the scenes one could just point to the other.