I've been thinking about this and one challenge, as you said, is that we have false negatives in our dataset. And sometimes this is by design--that is, a user labels some features in one pano and then takes a step to label the rest of the features. So, while these features are visible in both panos, the labels are split--and this is fine for the purpose of utilizing the labels to geo-locate problems but not great if we need each pano to be exhaustively labeled (i.e., all features visible are labeled correctly).
One idea moving forward might be for you to create a rapid labeling/validation tool of some sort to improve our label dataset (though this would take time both to create the tool and to go through the panos--so not sure we want to open that Pandora's box).
I completely agree. In the long term, I think the false negatives are going to be one of the biggest challenges we face. A labeling tool would definitely help, but I think it will be extremely labor intensive to go back and label the missing features, as I suspect there are an enormous number of them. A simpler strategy may just be to limit ourselves to a certain range of depths (rough sketch below). I'm working on the depth heatmap right now and will send it out shortly.
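To make the depth-limiting idea concrete, something like the following. The `depth_map` here is a hypothetical 2D array of per-pixel depth in meters, aligned with the pano; the real GSV depth data would need its own decoding step:

```python
import numpy as np

def within_depth_range(label_xy, depth_map, max_depth_m=20.0):
    # label_xy: (x, y) pixel position of a label in the pano.
    # depth_map: hypothetical 2D array of per-pixel depth in meters,
    # aligned pixel-for-pixel with the pano.
    x, y = label_xy
    d = depth_map[int(y), int(x)]
    return bool(np.isfinite(d)) and d <= max_depth_m

# Keep only labels close enough that their features are clearly visible:
# labels = [xy for xy in labels if within_depth_range(xy, depth_map)]
```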
Agreed.
@galenweld, does your new training approach sort of address this--if so, we should close this ticket. If not, can you describe more about how "hard negative mining" would work?
Hi Jon, The new dataset doesn't really address this issue.
The motivation for hard negative mining is this: training examples of negative crops (i.e., crops containing no curb ramps, missing curb ramps, etc.) are most valuable to the system when they look similar to positives. For example, the system might not need very many examples of blue sky to learn that sky is not a curb ramp, but might need lots of examples of driveways to learn that driveways aren't curb cuts, since driveways look far more similar to curb cuts than blue sky does.
One strategy to deal with this is to have the system focus more on (i.e., by iterating more over, or by training on more examples of) the negative examples that it gets wrong the most - that is what is meant by hard negative mining.
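To sketch what one round of mining might look like - assuming a crop classifier where class 0 is the null/background class, and a dataset of null crops that yields bare image tensors (both stand-ins for whatever our real pipeline does):

```python
import torch
from torch.utils.data import DataLoader

def mine_hard_negatives(model, null_crops, device, top_k=1000, null_class=0):
    # Score every null crop with the current model. The "hard" negatives are
    # the crops the model is most confident are NOT null, i.e. where it
    # wrongly thinks it sees a curb ramp or other feature.
    model.eval()
    loader = DataLoader(null_crops, batch_size=64)
    scores = []
    with torch.no_grad():
        for images in loader:
            probs = torch.softmax(model(images.to(device)), dim=1)
            scores.append(1.0 - probs[:, null_class].cpu())
    scores = torch.cat(scores)
    # Indices of the hardest crops, to be oversampled in the next round.
    return torch.topk(scores, k=min(top_k, len(scores))).indices.tolist()
```

The returned indices would then be fed back into training, e.g. by weighting those crops more heavily in the sampler.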
However, given the noisiness of the Project Sidewalk data (with a good number of missing labels), I suspect that this approach will not improve our performance without a significant investment of labor cleaning up the dataset. If the dataset has missing labels (which we know it does), then training on the negative examples it gets wrong will likely just mean training on the crops around curb ramps (and other features) that were missed in the labeling process.
Does that make sense? If not, please don't hesitate to ask for clarification.
As such, I'm inclined to think we have bigger fish to fry - so this issue is a low priority.
Ok!
@galenweld, is this still something worth exploring? Perhaps this summer?
Definitely, and this conversation should be incorporated into our "online learning" discussion about continually improving the model based on updated labels as they are placed by crowd workers!
I had an excellent meeting with Joe Redmon yesterday, and we discussed several possible improvements to the system. This is one.
Hard negative mining capitalizes on the idea that the most useful negative examples to the CV object detection system are the ones that are quite similar to the positives - the "hard" examples. In our context, for example, this means we don't need many examples of plain road surface in our negative category, but we do want lots of other sidewalk-looking stuff.
To implement this, one technique would be to supplement (or replace) our randomly generated null crops with the negative crops that the system consistently gets wrong.
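Roughly, something like this (the crop lists and the 50/50 mix are illustrative placeholders, not our actual pipeline):

```python
import random

def build_null_crop_set(random_crops, hard_crops, size=10000, hard_fraction=0.5):
    # Mix mined hard negatives into the null-class training pool rather
    # than relying on random background crops alone.
    n_hard = min(int(size * hard_fraction), len(hard_crops))
    n_rand = min(size - n_hard, len(random_crops))
    return random.sample(hard_crops, n_hard) + random.sample(random_crops, n_rand)
```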
The challenge here is that our ground truth data is missing a decent number of labels. That issue should be investigated more in and of itself (#7), but the problem for mining is this: if a curb ramp is missing its ground truth label, the system will detect it correctly, but since there's no label there, the detection will be marked as wrong and the crop added to the hard negative set to be trained on some more - essentially training the system to ignore that curb ramp, and undermining our performance.