ProjectSidewalk / SidewalkWebpage

Project Sidewalk web page
http://projectsidewalk.org
MIT License

Increase audit priority for streets with no labels #2075

Open misaugstad opened 4 years ago

misaugstad commented 4 years ago

One thing we worry about when thinking about the quality of our data is the prevalence of false negatives (the number of missed labels); false positives are not as big of a deal, because human and/or computer validation can rule them out easily enough. As such, I think we should slightly increase the audit priority for streets that have no labels associated with them.

What we do right now is score users based on their accuracy (from validations) and labeling frequency. Then we determine audit priority for streets entirely from how many times they were audited (and whether or not they were audited by high-accuracy users). I think this is really good and gets us most of the way there. However, high quality auditors can still make mistakes. It's also possible that GSV routed them in a weird way that made it hard for them to label or confused them. Maybe an auditor who isn't actually that good slipped through our system and was marked as "high quality" even though they were skipping some streets. Maybe there is a generally good auditor who doesn't remember to place "No Sidewalk" labels, so there are no labels for some streets.

Anyway, I think there are a lot of reasons for a street to show up with no labels, and false negatives are something we really want to avoid. Which makes me think that we should slightly adjust the algorithm to add some more weight to streets with no labels (assuming this doesn't take TOO much time).

misaugstad commented 3 years ago

This should probably be generalized a bit: compute a "labels per meter" stat for each street and weight streets with fewer labels per meter higher.
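
A rough sketch of what a "labels per meter" weighting could look like. Everything here is hypothetical and not taken from the actual codebase: the `Street` fields, the `sparse_label_boost` name, and the `max_boost` and `saturation_density` values are placeholders picked only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Street:
    """Hypothetical street record; field names are assumptions."""
    street_id: int
    length_m: float
    label_count: int


def labels_per_meter(street: Street) -> float:
    """Label density for a street; 0.0 for degenerate (non-positive) lengths."""
    if street.length_m <= 0:
        return 0.0
    return street.label_count / street.length_m


def sparse_label_boost(street: Street,
                       max_boost: float = 0.2,
                       saturation_density: float = 0.05) -> float:
    """Extra priority for streets with few labels per meter.

    The boost falls linearly from max_boost (no labels) to 0 once the
    density reaches saturation_density. Both defaults are arbitrary.
    """
    if street.length_m <= 0:
        return 0.0  # nothing sensible to compute for a zero-length segment
    density = labels_per_meter(street)
    shortfall = max(0.0, 1.0 - density / saturation_density)
    return max_boost * shortfall
```

The returned value would be added on top of whatever priority the existing audit-count logic produces, presumably capped at the maximum priority.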

jonfroehlich commented 3 years ago

Just came here to say that you are essentially replying to yourself nearly one year later! :D

Love these long-era self-communications.

misaugstad commented 1 year ago

There are a few edge cases to take into account here.

  1. Because of both the structure of OSM data and the way we split our road network up by neighborhood, there can sometimes be very short street segments where neither endpoint is located at an intersection. We shouldn't be surprised if there are few to no labels on such segments. And it would make the Explore experience worse if we increased priority for those streets and started jumping users to tiny pieces of road in the middle of a street just because there are no labels there.
  2. On the flip side, there are some very long roads that have high quality sidewalks, and therefore you would not expect very many labels to be there. If we try to prioritize streets on labeling frequency, these streets would have their priority increased too much, and users would be routed down very long roads unnecessarily. This just increases users' workload without any benefits to data quality.

We can probably deal with the first point by just excluding streets that are very short.

We'll have to think about a formula that doesn't cause an issue with point 2. Maybe the whole goal of this issue is to deal with streets that have almost no labels, and so we just increase priority for streets with fewer than 4 (that num is arbitrary) labels, excluding very short streets.
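
As a sketch, the simple threshold version might look like the following. The function name and the 20 m minimum length are made up for illustration; the label threshold of 4 is the arbitrary number mentioned above.

```python
def needs_priority_boost(length_m: float,
                         label_count: int,
                         min_length_m: float = 20.0,
                         label_threshold: int = 4) -> bool:
    """Return True if a street is a candidate for increased audit priority.

    min_length_m is a hypothetical cutoff for "very short" streets.
    label_threshold is the arbitrary "fewer than 4 labels" from the comment.
    """
    if length_m < min_length_m:
        # Point 1: tiny mid-block segments are expected to have few labels.
        return False
    # Point 2: a fixed count (not labels per meter) means a long road with
    # good sidewalks stops qualifying once it has a handful of labels.
    return label_count < label_threshold
```

For example, `needs_priority_boost(150.0, 2)` is true, while a 5 m segment with no labels and a 2 km road with 4 labels are both left alone.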

misaugstad commented 2 months ago

And I think that streets should only have their priority increased if only one "high quality" user has audited the street. Once a second high quality user has audited it, we don't have to increase priority anymore. OR we can increase priority less drastically. I was imagining that if one high quality user has audited the street but there are very few labels, we set priority to 1, treating it as if it hasn't been audited before. Once a second set of eyes agrees that there shouldn't be labels on the street, we don't need to keep showing it to people.
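
Putting that together, a minimal sketch of the rule could be something like this. The function and parameter names, the 20 m length cutoff, and the label threshold of 4 are all assumptions for illustration, not the real implementation; `base_priority` stands in for whatever the current audit-count logic returns.

```python
def adjusted_priority(base_priority: float,
                      length_m: float,
                      label_count: int,
                      high_quality_audits: int,
                      min_length_m: float = 20.0,
                      label_threshold: int = 4) -> float:
    """Reset priority to 1 for sparsely labeled streets seen by one good auditor.

    base_priority: priority from the existing algorithm (1 = never audited).
    high_quality_audits: number of distinct high quality users who audited it.
    min_length_m and label_threshold are hypothetical, arbitrary defaults.
    """
    is_sparse = length_m >= min_length_m and label_count < label_threshold
    if is_sparse and high_quality_audits == 1:
        # Treat the street as if it had not been audited yet.
        return 1.0
    # Zero audits: base priority is already high. Two or more: a second
    # set of eyes has agreed, so leave the priority alone.
    return base_priority
```

The "increase priority less drastically" variant would return something between `base_priority` and 1 instead of a hard reset.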