ccao-data / model-sales-val

Heuristics for detecting outlier and non-arms-length sales
MIT License

Improve sales validation heuristics #19

Open dfsnow opened 1 year ago

dfsnow commented 1 year ago

Outline

So far, work in this repository has focused on functionalizing the sales validation code and building infrastructure to support it. Now we need to revisit and improve the sales validation heuristics themselves. This will involve a lot of EDA to discover sales that are obviously non-arms-length, as well as the heuristic/statistical methods to flag them. We can also confer with Valuation analysts to develop heuristics.

Suggestions

I would start by looking at the count by outlier type of the current output. For outlier types that have a very low count, we should investigate why. It could be that there are genuinely very few non-arms-length sales of that type, but it is much more likely that we simply need to adjust the thresholds associated with the heuristic. Altering the regex for family/institutional flagging would be another easy place to start.
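As a starting point, the two checks above could be sketched roughly as follows. This is a minimal illustration in plain Python, not the repository's actual code: the field names (`buyer_name`, `outlier_type`), the outlier labels, the sample rows, and the regex are all hypothetical placeholders.

```python
import re
from collections import Counter

# Hypothetical flagged sales; field names and outlier labels are
# illustrative assumptions, not the repository's actual schema.
flagged_sales = [
    {"buyer_name": "JANE DOE", "outlier_type": "Not outlier"},
    {"buyer_name": "FIRST NATIONAL BANK", "outlier_type": "High price"},
    {"buyer_name": "DOE FAMILY TRUST", "outlier_type": "Not outlier"},
    {"buyer_name": "ACME HOLDINGS LLC", "outlier_type": "Low price"},
    {"buyer_name": "JOHN ROE", "outlier_type": "Not outlier"},
]


def count_by_outlier_type(sales):
    """Return (outlier_type, count) pairs, rarest types first."""
    counts = Counter(sale["outlier_type"] for sale in sales)
    return sorted(counts.items(), key=lambda item: item[1])


# Illustrative pattern only; a real one needs review against actual names.
INSTITUTION_PATTERN = re.compile(
    r"\b(?:BANK|TRUST|LLC|INC|CORP)\b", re.IGNORECASE
)


def is_institutional(name):
    """Return True if the name matches the institutional pattern."""
    return INSTITUTION_PATTERN.search(name) is not None


type_counts = count_by_outlier_type(flagged_sales)
institutional_flags = [is_institutional(s["buyer_name"]) for s in flagged_sales]
```

Sorting rarest-first puts the outlier types that most need investigation at the top. Keeping the regex in one named constant should make it easier to iterate on and test against real buyer and seller names.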

Some other suggestions:

We can also add brand new heuristics if we find any that are both appropriate and powerful enough.

Damonamajor commented 1 year ago

@dfsnow @wagnerlmichael @ccao-jardine

Chart & Descriptions

Flag SD Chart.xlsx

Attached is an Excel file with two charts. It demonstrates the impact that reducing the standard deviation thresholds would have on our sales validation process. In particular, it sheds light on the hypothesis that the current heuristics are too conservative.

The first table counts the flags for each type under different heuristics. The second table (the one I recommend reviewing) documents the difference from the original.

The columns are grouped in four categories:

Rows are ordered by decreasing count for our current heuristics.
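For anyone who wants to reproduce this kind of comparison, here is a rough sketch of the idea: count the flags at several standard deviation cutoffs and report each count's difference from a baseline. The sample values, the baseline cutoff of 2.0, and the function names are assumptions for illustration only, not the values or code behind the attached file.

```python
from statistics import mean, stdev

# Hypothetical sale prices (in thousands) for a single comparison group.
prices = [100, 102, 98, 101, 99, 103, 97, 100, 120, 80, 150]


def count_flags(values, n_sd):
    """Count values more than n_sd sample standard deviations from the mean."""
    mu = mean(values)
    sd = stdev(values)
    return sum(1 for v in values if abs(v - mu) > n_sd * sd)


def difference_from_baseline(values, baseline_sd, candidate_sds):
    """Map each candidate cutoff to its flag count minus the baseline count."""
    baseline = count_flags(values, baseline_sd)
    return {
        n_sd: count_flags(values, n_sd) - baseline for n_sd in candidate_sds
    }


# The baseline of 2.0 is an assumed placeholder, not the current setting.
diffs = difference_from_baseline(prices, baseline_sd=2.0,
                                 candidate_sds=[2.0, 1.5, 1.0])
```

Because a lower cutoff can only flag the same sales or more, each difference is zero or positive whenever the candidate cutoff is at or below the baseline.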

Takeaways

dfsnow commented 1 year ago

This is excellent, thanks @Damonamajor. Please attach any other findings to this issue.

We're putting this on hold for the time being in order to get an export ready for sending to iasWorld. It's possible we will revisit this issue before the end of the year or early next year.

Damonamajor commented 1 year ago

Below is a link to OneDrive with two maps and one Excel file. The included README provides a brief description and sums up my takeaways.

Concluding Thoughts

@wagnerlmichael @ccao-jardine @dfsnow

dfsnow commented 1 year ago

Adding that we should include Question 9 from the PTAX-203 output to the possible PTAX flags.

dfsnow commented 7 months ago

@Damonamajor @wagnerlmichael Backburner for now, but I'd like to revisit this later in the year before modeling for 2025.