ccao-data / model-res-avm

Automated valuation model for all class 200 residential properties in Cook County (except vacant land and condos)
GNU Affero General Public License v3.0

Refactor `export` stage to use a config dict representing the workbook structure #221

Open jeancochrane opened 4 months ago

jeancochrane commented 4 months ago

When you add or remove a column from the desk review workbook template used by the export pipeline stage (`misc/desk_review_template.xlsx`), you currently have to adjust all the individual references to integer and/or letter column positions for columns that come after your new column(s) in `pipeline/export.R`, as well as tweak the code to add any necessary formatting to the column. This is annoying and error-prone.

I think a better design would be to refactor the pipeline stage to use a centralized data structure, like a dictionary of dictionaries, that holds all of the position and formatting metadata for every column, so that adding or removing a column only requires updating that one data structure. I'm imagining something like:

workbook_schema <- list(
  pin_detail_sheet = list(
    pin = list(formula = TRUE),
    ...
    land_rate_sf = list(style = style_comma),
    ...
  ),
  card_detail_sheet = list( ... ),
  comparables_sheet = list( ... )
)

Using this kind of data structure, the position of each column could be determined with a which() call and the column's name, and the code could be updated to iterate the list of columns and apply formatting appropriately.
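
To make the lookup idea concrete, here is a minimal base R sketch of how positions could be derived from the schema. Everything in it is an assumption for illustration: the helpers `col_index()`, `col_letter()`, and `format_plan()` are hypothetical names, `township_code` is a made-up placeholder column, and styles are plain strings standing in for whatever style objects `pipeline/export.R` actually uses.

```r
# Hypothetical schema: the order of entries defines the column order.
# Styles are plain strings here, standing in for real style objects.
workbook_schema <- list(
  pin_detail_sheet = list(
    pin = list(formula = TRUE),
    township_code = list(), # placeholder column, not from the real template
    land_rate_sf = list(style = "comma")
  )
)

# Return the 1-based position of a column within a sheet via which()
col_index <- function(schema, sheet, column) {
  idx <- which(names(schema[[sheet]]) == column)
  if (length(idx) != 1) {
    stop("Column '", column, "' not found exactly once in sheet '", sheet, "'")
  }
  idx
}

# Convert a 1-based column index to a spreadsheet letter (A..Z, AA, AB, ...)
col_letter <- function(idx) {
  out <- ""
  while (idx > 0) {
    rem <- (idx - 1) %% 26
    out <- paste0(LETTERS[rem + 1], out)
    idx <- (idx - 1) %/% 26
  }
  out
}

# Iterate over a sheet's columns and collect position and formatting
# metadata into one data frame that the export code could loop over
format_plan <- function(schema, sheet) {
  cols <- schema[[sheet]]
  do.call(rbind, lapply(seq_along(cols), function(i) {
    style <- cols[[i]]$style
    data.frame(
      column = names(cols)[i],
      index = i,
      letter = col_letter(i),
      style = if (is.null(style)) NA_character_ else style,
      formula = isTRUE(cols[[i]]$formula),
      stringsAsFactors = FALSE
    )
  }))
}

plan <- format_plan(workbook_schema, "pin_detail_sheet")
```

With this shape, inserting a column means adding one entry to `workbook_schema`; the indices and letters of every later column shift automatically, and the formatting loop would only need to read `plan` row by row.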