Open dfsnow opened 1 year ago
@dfsnow @wagnerlmichael @ccao-jardine
Attached is an Excel file with two tables. They show the impact that reducing the standard deviation thresholds would have on our sales validation process. In particular, they shed light on the hypothesis that the current heuristics are too conservative.
The first table counts the flags for each type under different heuristics. The second table, which I recommend reviewing, documents the difference from the original.
The columns are grouped in four categories:
Rows are ordered by decreasing count for our current heuristics.
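For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of the idea, not the actual pipeline code. It assumes flags come from a z-score on log sale price; the flag names, the 2.0 baseline, and the synthetic prices are all hypothetical placeholders rather than values from our output.

```python
import math
import random
import statistics
from collections import Counter


def flag_counts(log_prices, n_std):
    """Count sales flagged as price outliers at a given standard deviation cutoff.

    Hypothetical rule: a sale is flagged when its log price is more than
    `n_std` sample standard deviations from the mean.
    """
    mean = statistics.fmean(log_prices)
    sd = statistics.stdev(log_prices)
    counts = Counter()
    for x in log_prices:
        z = (x - mean) / sd
        if z > n_std:
            counts["High price"] += 1
        elif z < -n_std:
            counts["Low price"] += 1
    return counts


# Synthetic data only; swap in real sale prices to get meaningful numbers.
random.seed(0)
log_prices = [random.gauss(math.log(250_000), 0.5) for _ in range(5_000)]

baseline = flag_counts(log_prices, 2.0)  # assumed "current" cutoff
for n_std in (2.0, 1.75, 1.5):
    counts = flag_counts(log_prices, n_std)
    # Difference from the baseline, mirroring the second table
    diff = {k: counts[k] - baseline[k] for k in ("High price", "Low price")}
    print(n_std, dict(counts), diff)
```

Lowering the cutoff can only add flags under this rule, so the difference column shows how many additional sales each candidate threshold would catch.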
This is excellent, thanks @Damonamajor. Please attach any other findings to this issue.
We're putting this on hold for the time being in order to get an export ready for sending to iasWorld. It's possible we will revisit this issue before the end of the year or early next year.
Below is a OneDrive link with two maps and one Excel file. The included README provides a brief description and sums up my takeaways.
@wagnerlmichael @ccao-jardine @dfsnow
Adding that we should include Question 9 from the PTAX-203 output to the possible PTAX flags.
@Damonamajor @wagnerlmichael Backburner for now, but I'd like to revisit this later in the year before modeling for 2025.
Outline
So far, work in this repository has focused on functionalizing the sales validation code and building infrastructure to support it. Now, we need to revisit and improve the sales validation heuristics themselves. This will involve a lot of EDA to try to discover sales that are obviously non-arms-length, as well as the heuristic/statistical methods to flag them. We can also confer with Valuation analysts to develop heuristics.
Suggestions
I would start by looking at the count by outlier type of the current output. For outlier types that have a very low count, we should investigate why. It could be that there are genuinely very few non-arms-length sales of that type, but it is much more likely that we simply need to adjust the thresholds associated with the heuristic. Altering the regex for family/institutional flagging would be another easy place to start.
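As a rough illustration of the two starting points above, the sketch below counts flags by outlier type and applies a name-based regex. The patterns, flag labels, and example names are hypothetical placeholders, not the regex or labels currently in the repository.

```python
import re
from collections import Counter

# Hypothetical patterns; the real ones should come from the existing code and EDA.
FAMILY_PATTERN = re.compile(r"\b(trust|estate of|family)\b", re.IGNORECASE)
INSTITUTION_PATTERN = re.compile(r"\b(llc|inc|corp|bank|holdings)\b", re.IGNORECASE)


def name_flag(buyer, seller):
    """Return a hypothetical outlier type based on buyer and seller names."""
    text = f"{buyer} {seller}"
    if FAMILY_PATTERN.search(text):
        return "Family sale"
    if INSTITUTION_PATTERN.search(text):
        return "Non-person sale"
    return "Not outlier"


# Made-up (buyer, seller) pairs for illustration only.
sales = [
    ("SMITH FAMILY TRUST", "JOHN SMITH"),
    ("ACME HOLDINGS LLC", "JANE DOE"),
    ("JANE DOE", "JOHN ROE"),
]

# Count by outlier type, most common first, to spot types with very low counts.
counts_by_type = Counter(name_flag(buyer, seller) for buyer, seller in sales)
for outlier_type, n in counts_by_type.most_common():
    print(f"{outlier_type}: {n} ({n / len(sales):.0%})")
```

Rerunning the count after each regex change gives a quick read on whether the edit is catching more sales of that type.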
Some other suggestions:
We can also add brand new heuristics if we find any that are both appropriate and powerful enough.