ccao-data / model-res-avm

Automated valuation model for all class 200 residential properties in Cook County (except vacant land and condos)
GNU Affero General Public License v3.0

Record doc numbers along with PINs when calculating comps #246

Closed jeancochrane closed 2 weeks ago

jeancochrane commented 2 weeks ago

This PR updates the comps calculation step in the interpret stage to save comp document numbers along with PINs. With document numbers, we can update downstream code that consumes these comps to uniquely tie each comp to a specific sale, instead of having to infer the sale based only on the comp's PIN.
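To illustrate why the document number matters, here is a minimal sketch in base R. The `sales` and `comps` data frames, their column names, and every value in them are made up for illustration; they are not the pipeline's actual tables. The sketch only shows that a PIN can match more than one sale, while a document number picks out exactly one.

```r
# Hypothetical sales table: the same PIN sold twice, so PIN alone is ambiguous
sales <- data.frame(
  pin = c("01011000040000", "01011000040000", "01011000050000"),
  doc_no = c("2100000001", "2300000002", "2200000003"),
  sale_price = c(250000, 310000, 275000),
  stringsAsFactors = FALSE
)

# Hypothetical comp record in long form: one comp for one target property
comps <- data.frame(
  pin = "01011000060000",
  comp_pin = "01011000040000",
  comp_doc_no = "2300000002",
  stringsAsFactors = FALSE
)

# Joining on PIN alone matches both sales of the comp property
by_pin <- merge(comps, sales, by.x = "comp_pin", by.y = "pin")

# Joining on document number matches exactly one sale
by_doc <- merge(comps, sales, by.x = "comp_doc_no", by.y = "doc_no")

nrow(by_pin) # 2
nrow(by_doc) # 1
```

With only the PIN, downstream code would need some extra rule (for example, picking the most recent sale) to guess which sale the comp refers to; with the document number no such rule is needed.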

I tested this locally on a subset of training/assessment data. The comps aren't great due to the limited size of the data, but it ran reasonably fast and confirmed that this processing code works. I'm happy to share the output of my test, or to kick off a remote comps run if that would make things easier to review. In the meantime, here's a quick peek at the output schema:

> final_comps <- cbind(comps[[1]], comps[[2]])
> final_comps %>% names
 [1] "pin"            "card"           "comp_pin_1"     "comp_pin_2"     "comp_pin_3"     "comp_pin_4"     "comp_pin_5"     "comp_pin_6"    
 [9] "comp_pin_7"     "comp_pin_8"     "comp_pin_9"     "comp_pin_10"    "comp_pin_11"    "comp_pin_12"    "comp_pin_13"    "comp_pin_14"   
[17] "comp_pin_15"    "comp_pin_16"    "comp_pin_17"    "comp_pin_18"    "comp_pin_19"    "comp_pin_20"    "comp_doc_no_1"  "comp_doc_no_2" 
[25] "comp_doc_no_3"  "comp_doc_no_4"  "comp_doc_no_5"  "comp_doc_no_6"  "comp_doc_no_7"  "comp_doc_no_8"  "comp_doc_no_9"  "comp_doc_no_10"
[33] "comp_doc_no_11" "comp_doc_no_12" "comp_doc_no_13" "comp_doc_no_14" "comp_doc_no_15" "comp_doc_no_16" "comp_doc_no_17" "comp_doc_no_18"
[41] "comp_doc_no_19" "comp_doc_no_20" "comp_score_1"   "comp_score_2"   "comp_score_3"   "comp_score_4"   "comp_score_5"   "comp_score_6"  
[49] "comp_score_7"   "comp_score_8"   "comp_score_9"   "comp_score_10"  "comp_score_11"  "comp_score_12"  "comp_score_13"  "comp_score_14" 
[57] "comp_score_15"  "comp_score_16"  "comp_score_17"  "comp_score_18"  "comp_score_19"  "comp_score_20" 
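Downstream code that wants one row per comp could reshape this wide schema into long form. The following is a rough sketch in base R, assuming the column naming shown above and using a made-up two-comp data frame in place of the real 20-comp output; `long_comps` and `comp_num` are hypothetical names, and the numeric suffix is treated as a plain index.

```r
# Hypothetical wide comps output with 2 comps per property instead of 20
final_comps <- data.frame(
  pin = c("A", "B"),
  card = c(1, 1),
  comp_pin_1 = c("C", "E"),
  comp_pin_2 = c("D", "F"),
  comp_doc_no_1 = c("d1", "d3"),
  comp_doc_no_2 = c("d2", "d4"),
  comp_score_1 = c(0.9, 0.8),
  comp_score_2 = c(0.7, 0.6),
  stringsAsFactors = FALSE
)

num_comps <- 2

# Build one block of rows per comp index, then stack the blocks
long_comps <- do.call(rbind, lapply(seq_len(num_comps), function(i) {
  data.frame(
    pin = final_comps$pin,
    card = final_comps$card,
    comp_num = i,
    comp_pin = final_comps[[paste0("comp_pin_", i)]],
    comp_doc_no = final_comps[[paste0("comp_doc_no_", i)]],
    comp_score = final_comps[[paste0("comp_score_", i)]],
    stringsAsFactors = FALSE
  )
}))

nrow(long_comps) # 4: two properties times two comps each
```

In long form, each row carries its own `comp_doc_no`, so it can be joined directly to a sales table keyed by document number.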

Closes https://github.com/ccao-data/pinval/issues/7.

jeancochrane commented 2 weeks ago

@dfsnow I finally got a run to complete, with run ID 2024-06-18-calm-nathan. Want to take a look before I merge?

dfsnow commented 2 weeks ago

@jeancochrane I took a look and everything looks good. Let's merge it!