Code4HR / open-health-inspection-api

The API for the Open Health Inspection app
https://ohi-api.code4hr.org
GNU General Public License v2.0

Calculate a score for each establishment #18

Closed waldoj closed 10 years ago

waldoj commented 10 years ago

Virginia is unusual—though not extraordinarily so—in that it doesn't provide scores for establishments. There is just the list of minor and critical violations. The fact that no scores are provided is a real UX obstacle for people interfacing with the data. Reading the inspection report for a single restaurant tells people nothing about what is normal, how that restaurant compares to other restaurants, or how that inspection compares to earlier or later inspections of the same establishment. Also, Yelp cannot do anything with restaurant inspection data that doesn't include a score.

It seems to me that the solution is probably to calculate a score for each inspection. It's crucial that the metric for this score be simple and defensible. Perhaps there are guidelines within the regulations that could be relied on to establish a scale (e.g., "more than 6 critical violations must result in the establishment being closed until those violations are corrected"). Or perhaps it's best to simply rank all establishments in order of the number of violations. Or maybe the worst establishment is ranked at 0 and the best one at 100, and the rest are plotted between the two. Surely there are many other approaches that are worth considering. Scores should probably be calculated relative to other establishments in the same health department.

Ideally, this would be something that could be applied to other places where scores are not provided, solving the problem on a scale larger than within Virginia.

If there's any appetite for this proposed feature, I'd be glad to dedicate some of @opendata's resources towards helping to make it happen.

(I confess that I don't know whether this issue belongs in the API or within [the scraper](https://github.com/c4hrva/open-health-inspection-scraper).)

ttavenner commented 10 years ago

This is something that has been on the to-do list since the beginning, so there is certainly interest. I'd say the main roadblock preventing it from happening has been a lack of time in the face of more pressing needs. So we would welcome assistance in getting things moving.

In my research on the subject I was not able to find any specific guidelines regarding when a restaurant must be shut down. Virginia's guidelines in general leave much interpretation up to the inspector, so a basic numeric score may be the way to go.

waldoj commented 10 years ago

Do you have a preference as to whether this belongs within the API or within the scraper? I'm inclined to think the API (which is why I opened the issue here), but you're the expert!

ttavenner commented 10 years ago

Conversation is fine here. If any code is developed it might need its own repository.

waldoj commented 10 years ago

I just spent a while replicating your own research, looking at the relevant bits of state regulations, and inevitably coming to the same conclusion that you did—that there are no objective criteria discoverable in inspection reports for when a restaurant has its permit yanked.

So that leaves percentiles.

There are two levels of granularity that could be used: per establishment and per inspection. Perhaps the per-inspection level of granularity is the building block for per-establishment rankings.

Creating a percentile means assigning a numeric value to each type of violation (minor and critical), and perhaps weighting them based on whether they were corrected immediately and whether they were repeat violations. So imagine that a minor violation is assigned a value of 1.0, and a critical violation is assigned a value of 3.0. Perhaps immediate corrections have a multiplier of 0.75 and repeat violations have a multiplier of 1.5.

Looking at the April 7 inspection of a restaurant near my home, I see that there was one critical violation that was immediately corrected. So that's 3.0 ✕ 0.75 = 2.25.

Looking at the prior inspection of the same establishment, I see nine violations. In order:

So that's a score of 14.625 for that inspection.
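
Expressed as code, that weighting might look something like this; every number in it is just the illustrative value from above, not a recommendation:

```python
# A per-inspection score using the illustrative values above; all of these
# numbers are placeholders pending discussion, not settled choices.
MINOR = 1.0
CRITICAL = 3.0
CORRECTED = 0.75   # multiplier for violations corrected on the spot
REPEAT = 1.5       # multiplier for repeat violations

def violation_value(critical=False, corrected=False, repeat=False):
    """Weighted value of a single violation."""
    value = CRITICAL if critical else MINOR
    if corrected:
        value *= CORRECTED
    if repeat:
        value *= REPEAT
    return value

def inspection_score(violations):
    """Sum of weighted violation values for one inspection."""
    return sum(violation_value(**v) for v in violations)

# The April 7 example: one critical violation, corrected immediately.
print(inspection_score([{'critical': True, 'corrected': True}]))  # 3.0 * 0.75 = 2.25
```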

Presumably, these per-inspection scores can be aggregated to create a per-restaurant rating. I suspect that the value of those scores should decay over time, when combined to rate a restaurant. That is, a restaurant that received a score of 30 a year ago, a score of 15 six months ago, and a perfect score of 0 today should probably not just have their score set to 30 + 15 + 0. Instead, that 30 should decay to irrelevance over some period of time, so that the score forgets past transgressions.
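
One way to express that decay, sketched here with a hypothetical one-year half-life (the rate itself is exactly the kind of value that needs discussion):

```python
def decayed_vendor_score(inspections, half_life_days=365):
    """Combine per-inspection scores so old inspections fade to irrelevance.

    `inspections` is a list of (score, age_in_days) pairs; the one-year
    half-life is a placeholder for whatever decay rate we settle on.
    """
    return sum(score * 0.5 ** (age_days / half_life_days)
               for score, age_days in inspections)

# The example above: 30 a year ago, 15 six months ago, 0 today.
# A raw sum would be 45; with decay it comes to roughly 25.6.
print(decayed_vendor_score([(30, 365), (15, 182), (0, 0)]))
```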

Then we aggregate these restaurant scores into scores based on percentiles. Those would simply be the ranking of the restaurant's score relative to other restaurants (in the same health district?). The worst restaurant is in the bottom 1%; the best one is the one that comes closest to a perfect score. This ought to result in a normal distribution, with 68.2% of the results within one standard deviation of the mean. (It ought to, but it almost certainly will not, because of the behavior of inspectors.) Translating this into grades would be quite unfair (60% of restaurants would receive a failing grade), but translating it into, say, a star-based system seems quite reasonable: the restaurants in the 40–60% range might be ranked at ★★★☆☆, for instance.
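
The percentile-to-stars translation could be as simple as quintiles (the bucket boundaries here are arbitrary and up for debate):

```python
def stars(percentile):
    """Map a 0-100 percentile (higher = cleaner) to a 1-5 star rating.

    Straight quintiles; the boundaries are arbitrary placeholders.
    """
    return min(int(percentile // 20) + 1, 5)

assert stars(50) == 3   # the 40-60% range lands at three stars
assert stars(5) == 1
assert stars(99.9) == 5
```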

The thing that I want to avoid with this is judgment calls, because judgment calls are weak points. That said, I don't think there's any avoiding the need to assign numeric values to violations. We'll need to have a discussion about what those values, multipliers, and decay rates should be.

waldoj commented 10 years ago

@MatthewEierman, I just want to tag you in so that you're aware that this discussion is happening. I appreciate that you have your own product based on a far more sophisticated version of the very crude thing that we're doing here, so I understand that you might not want to actually participate.

ttavenner commented 10 years ago

I have a basic scoring system defined on a per-inspection basis. It uses a 0–100 scale, starting at 100 and removing points based on the number of violations and whether they are critical, repeat, etc. It also awards a small number of points for correcting a violation during the inspection. I'm still tweaking the values to get a reasonable distribution. What it could use is a good benchmark. I'd love to tie the score to something like restaurant closings, regional health, or even Yelp scores.

waldoj commented 10 years ago

Sounds interesting! When it's at a point where folks can give it some poking and prodding, let me know.

ttavenner commented 10 years ago

The scores have been calculated and are in the database. The API is returning the scores in results, but I haven't yet implemented a method to search by score. Blaine is working on exposing the scores in the app this weekend. The scores should show up in the bulk file when it updates on Sunday. Here is the scoring algorithm at present:

A score is calculated for each inspection. The value starts at 100 and is then reduced based on the following criteria:

The total score for each inspection is stored in the database, then a total score for the vendor is calculated based on a weighted average of the most recent inspections:
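
Purely as a sketch of that shape, the deduction values and weights below are placeholder stand-ins, not the actual numbers used:

```python
def inspection_score(violations):
    """Start at 100 and deduct per violation.

    The deduction sizes and multipliers here are placeholders; they are
    not the real values from the scoring script.
    """
    score = 100.0
    for v in violations:
        deduction = 3.0 if v.get('critical') else 1.0
        if v.get('repeat'):
            deduction *= 1.5
        if v.get('corrected'):
            deduction -= 0.5  # small award for correcting during the inspection
        score -= deduction
    return max(score, 0.0)

def vendor_score(scores_newest_first, weights=(0.5, 0.3, 0.2)):
    """Weighted average of the most recent inspection scores.

    The weights (and how many inspections count) are placeholders too.
    """
    pairs = list(zip(scores_newest_first, weights))
    return sum(s * w for s, w in pairs) / sum(w for _, w in pairs)

print(vendor_score([88.0, 72.5, 95.0]))  # 84.75
```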

MatthewEierman commented 10 years ago

We have our own inspection scoring algorithms at HDScores that I won't get into, but it's commonly accepted by food safety professionals and academics that the accuracy of inspection data degrades at about 2% per month.
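
Taken at face value, that rate suggests a simple weighting for older inspections (whether it compounds monthly is my assumption):

```python
# If inspection data loses ~2% of its accuracy per month (compounding
# assumed), an inspection's weight after n months would be roughly:
def accuracy_weight(months):
    return 0.98 ** months

print(round(accuracy_weight(12), 3))  # ~0.785 after one year
print(round(accuracy_weight(24), 3))  # ~0.616 after two years
```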

MatthewEierman commented 10 years ago

Also, if you dig deep into the Virginia inspection data, some of western Virginia uses a different version of the FDA Food Code (2005 versus 2009) than the rest of the state as the base federal standard. There are only a few changes between the two, but the rules are still different.

waldoj commented 10 years ago

@ttavenner, to get a score of 0, wouldn't that require 20 repeat, critical violations, uncorrected during the inspection? You've seen far, far more inspection scores than I, but that seems implausible, reducing the functional range to something rather smaller than 100 points. Maybe, when the bulk files update on Sunday, we can graph the distribution of all of the scores, so that we can look at those scoring numbers in light of that?

> the accuracy of inspection data degrades at about 2% per month. ... some of western Virginia uses a different version of the FDA Food Code (2005 versus 2009) than the rest of the state

Those are both really helpful pieces of information, Matthew—thanks so much!

ttavenner commented 10 years ago

I've actually got histograms from the process of developing the score.

Here is one for vendor scores: [Histogram of Vendor Scores]
Here is one for inspection scores: [Histogram of Inspection Scores]
And here is one for the number of violations: [Histogram of Violations]

It's true that getting a low score is pretty uncommon. There are only three vendors in the database with an overall score lower than 30. According to the database there are 436 vendors that have had at least one inspection with >= 20 violations. It's possible we could tweak the scoring or the degradation to shift the scores one way or the other. I'm not sure what the best benchmark is. I'd just hate to punish a restaurant by making them the baseline without a sound statistical reason.
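
For anyone who wants to reproduce these once the bulk file updates, a quick sketch, assuming the bulk file is a JSON array of vendors with a `score` field (the filename and field name are guesses at the actual layout):

```python
import json
import matplotlib.pyplot as plt

# Assumes a local JSON dump of vendors, each with a numeric 'score' field;
# both the filename and the field name are guesses, not the real bulk format.
with open('vendors.json') as f:
    vendors = json.load(f)

scores = [v['score'] for v in vendors if v.get('score') is not None]

plt.hist(scores, bins=20)
plt.xlabel('Vendor score (0-100)')
plt.ylabel('Number of vendors')
plt.title('Distribution of vendor scores')
plt.show()
```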

ttavenner commented 10 years ago

Happy to keep this conversation going within this issue and continue making tweaks, but for now I have added the source for calculating the scores to the scraper repo, since the script will be run every week after the scraper runs.

ttavenner commented 10 years ago

Marking as closed due to open-health-inspection-scraper/commit/5d0dfb5

waldoj commented 10 years ago

I encourage you strongly to either reopen this ticket or open a new one, and foster a wide-ranging discussion about this with a lot of participants. To put it simply, you need to inoculate yourself against losing a lawsuit. A restaurant that gets a score of 61% and is angry about it (e.g., Hadeed v. Yelp) is going to sue CFA and they're going to sue you. To the extent that you put together this scoring mechanism yourself, you run the risk of being held liable. (It also increases the likelihood that you've made some kind of mistake, failed to factor in something important, etc. That is, legitimate grounds for legal liability.) Also, the more people who contribute, the greater the number of defendants, and the less likely it is that you'll be left hung out to dry.

Even if you've gotten this perfect, get yourself some statisticians, food industry experts, programmers, etc. to review your code and your logic, and have them do so in public, here on GitHub.

ttavenner commented 10 years ago

I'd prefer any further discussion about the technical aspects of the scoring to happen at the scraper repo rather than here, since that is where the code lives. I also don't consider closing this ticket an end to the discussion, merely an acknowledgement that the initial issue ("calculate a score for each establishment") has been completed and that there hasn't been any further follow-up to my previous comment from 10 days ago.

As to liability, that is certainly a concern, and I'd welcome any comments on the legal aspects of designing our own scoring system. We have provided a simple statement on the app and app repo to confirm that the scores are not provided by the Virginia Department of Health and are offered purely for informational purposes, but I'm sure there are other best practices we could apply. As to the referenced Hadeed v. Yelp case, I don't think it's applicable here. Hadeed's concern was that reviews falsely represented its service, based on allegations that the reviewers were not actual customers. The entire case was based on bringing hidden information (i.e., the identity of the users) to light. In this case there is no hidden information: the data the scores are based on, as well as the scores themselves, are all open for public inspection. The data is pulled directly from the Department of Health website without alteration. These issue comments also serve as a public record of the intent and methods behind the scores.

As to furthering a wide public discussion, I am and have always been open to that, but I don't think GitHub issues are the place to do it. These issues exist to address technical aspects of the code, not the moral, ethical, or legal aspects. Nor should the wider public be expected to sign up for a GitHub account to participate. We have attempted to open this discussion up as widely as possible, providing a feedback mechanism in the app and holding conversations with other groups working on similar issues, local health inspectors, and even a few restaurant owners. We have had local news articles published on the app, provided press releases, and publicized it on Twitter. We are doing what we can to keep this dialogue as open as possible and will continue to welcome any questions, comments, concerns, opinions, etc.

waldoj commented 10 years ago

Hey, whatever you like—I'm just making suggestions here. :)