Impacts and Handling of False Negatives

galenweld commented 5 years ago

I've spent some time sifting through a few dozen panos to try to get a qualitative handle on the false negative challenge. It has been discussed a bit in #8 and #6, but I think it's a significant enough concern to warrant its own issue here.

On the whole, missing labels are much more common for features that are at distant intersections or further from the camera, while the nearby features are better labeled.

However, there is a good bit of variability by feature type. For curb ramps, we seem to do a decent job on nearby features, and almost entirely lack labels for ones down the block or further away. On the other hand, in many places we seem to have two labels for the same curb ramp, presumably because multiple users have labeled it. These duplicates should be easy enough to remove with some preprocessing.
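As a rough sketch of what that preprocessing could look like: greedily merge labels of the same type that fall within some pixel radius of each other. Everything here is an assumption for illustration, not what the pipeline actually does: the `(label_type, x, y)` tuple format, the `dedupe_labels` name, and the 50-pixel default radius are all hypothetical, and the sketch ignores horizontal wraparound at the pano seam.

```python
from math import hypot


def dedupe_labels(labels, radius=50.0):
    """Greedily merge same-type labels that lie within `radius` pixels.

    labels: list of (label_type, x, y) tuples in pano pixel coordinates
    (a hypothetical format). Returns one averaged label per cluster.
    """
    clusters = []  # each entry: [label_type, sum_x, sum_y, count]
    for label_type, x, y in labels:
        for cluster in clusters:
            # Compare against the running centroid of the cluster.
            cx = cluster[1] / cluster[3]
            cy = cluster[2] / cluster[3]
            if cluster[0] == label_type and hypot(x - cx, y - cy) <= radius:
                cluster[1] += x
                cluster[2] += y
                cluster[3] += 1
                break
        else:
            # No nearby cluster of the same type, so start a new one.
            clusters.append([label_type, x, y, 1])
    return [(t, sx / n, sy / n) for t, sx, sy, n in clusters]


# Toy data: two users labeled the same ramp, plus one distant ramp.
labels = [
    ("curb_ramp", 100, 200),
    ("curb_ramp", 110, 210),
    ("curb_ramp", 900, 220),
    ("obstruction", 105, 205),
]
deduped = dedupe_labels(labels)
```

With the toy data, the two nearby curb ramp labels collapse into one averaged label, while the distant ramp and the obstruction are left alone. The right radius probably depends on how far the feature is from the camera, so a fixed value is only a starting point.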

Missing ramps are more difficult, as we're trying to label something that's not there. For wide crosswalks, if there's a curb ramp at the edge of the crosswalk instead of in the middle, the system often labels a missing ramp in the middle of the crosswalk, as it seems to have learned that ramps should be in the middle. The 'correct' behavior isn't well-defined here.

For obstructions, similarly, there's variability in where to place the label for large obstructions. I noticed one instance where the system labeled a post box at the top of the box, which was marked as incorrect because the human had labeled the bottom of the box.

[Image: obstruction]

For surface problems, there's a similar, if not more challenging, concern. In many cases, surface problem labels are used to mark missing sidewalks. In this case, is a single label adequate for an entire section of missing sidewalk? In an ideal world, would we like our system to label the entire section of missing sidewalk as a surface problem?

[Image: missing_sidewalk]

In the end, I'm curious if it's worth the time to go through and 'grade' a set of predictions by hand, to see how much the manual grading differs from the automatic grading using crowd-sourced labels. We could recompute precision and recall as well.
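To make the comparison concrete, here is a minimal sketch of how precision and recall could be recomputed against two different reference sets. It assumes, hypothetically, that a prediction counts as correct when it lies within a pixel radius of an unmatched reference label of the same type; the `precision_recall` name, the tuple format, the radius, and the toy data are all made up for illustration and are not our actual evaluation code.

```python
from math import hypot


def precision_recall(predictions, truths, radius=50.0):
    """Match each prediction to the nearest unmatched same-type truth label.

    predictions, truths: lists of (label_type, x, y) tuples (hypothetical
    format). Returns (precision, recall).
    """
    unmatched = list(truths)
    true_pos = 0
    for p_type, px, py in predictions:
        best_idx, best_dist = None, radius
        for i, (t_type, tx, ty) in enumerate(unmatched):
            dist = hypot(px - tx, py - ty)
            if t_type == p_type and dist <= best_dist:
                best_idx, best_dist = i, dist
        if best_idx is not None:
            unmatched.pop(best_idx)  # each truth label matches at most once
            true_pos += 1
    false_pos = len(predictions) - true_pos
    false_neg = len(unmatched)
    precision = true_pos / (true_pos + false_pos) if predictions else 0.0
    recall = true_pos / (true_pos + false_neg) if truths else 0.0
    return precision, recall


# Toy example: the crowd labels miss a distant curb ramp that a manual
# grader would have caught.
predictions = [("curb_ramp", 100, 100), ("curb_ramp", 500, 500),
               ("obstruction", 300, 300)]
crowd = [("curb_ramp", 105, 100)]
manual = [("curb_ramp", 105, 100), ("curb_ramp", 510, 495),
          ("surface_problem", 700, 700)]

crowd_p, crowd_r = precision_recall(predictions, crowd)
manual_p, manual_r = precision_recall(predictions, manual)
```

In this toy case precision rises from 1/3 against the crowd labels to 2/3 against the manual labels, because a correct prediction on the distant ramp is no longer counted as a false positive. Recall moves the other way (1.0 to 2/3), since the manual set also contains a feature the predictions missed.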

jonfroehlich commented 5 years ago

@galenweld, now that some time has passed, what are your thoughts on this? Is there something actionable here that you, Devesh, or another intern could attack?

galenweld commented 5 years ago

I think this was pretty darn well addressed by the ground truth dataset that @infrared0 and I produced, so I'm going to go ahead and close it. The primary point I believe I was trying to make was that the labels we were evaluating our performance against were imperfect, and that's why we decided to make a ground truth dataset.

I do mention the challenges of the missing curb ramp label type in particular, and I still believe that's a really interesting question. I'm also excited about the prospect of the Siamese net (#10) improving on it, which is why I had been proposing that in the first place: it's the method used in the David Jacobs "Seeing What is Not There" paper, which focuses just on missing curb ramps.

ProjectSidewalk / sidewalk-cv-assets19

Impacts and Handling of False Negatives #11