ProjectSidewalk / sidewalk-quality-analysis

An analysis of Project Sidewalk user quality based on interaction logs

Intersection proximity seems inaccurate for a number of labels #24

Open nch0w opened 5 years ago

nch0w commented 5 years ago

For example, label #17088 is at lat: 47.6554679870605, lng: -122.32576751709. The middleness I get from intersection-proximity is 90.00%, but the label is at an intersection:

Screen Shot 2019-07-09 at 12 55 19 PM

jonfroehlich commented 5 years ago

Can you provide a more quantitative assessment? Analyze, say, 100 different examples with screenshots.

nch0w commented 5 years ago

There is a bug with label-intersection-proximity. Sometimes it can classify a point that is close to the intersection as having a high middleness, or it can classify a point that is far from an intersection as having a low middleness, because the street segments used to compute intersection proximity don’t always correspond to streets. I tested 100 random labels, and out of these 77 were correct, 4 were classified as closer to an intersection than they actually were, and 19 were classified as farther from an intersection than they actually were.

Here is an example. The point (red dot) has a high middleness even though it is close to the intersection (two red dots) and the street extends further to the left.

Screen Shot 2019-07-09 at 5 52 16 PM

And this has a low middleness even though it is relatively far from the intersection.

Screen Shot 2019-07-09 at 6 04 38 PM

I assessed this by comparing the middleness of a label to whether or not the label seemed close to an intersection on the dashboard.

jonfroehlich commented 5 years ago

Capturing our Slack discussion about this. The tl;dr is that this is a known limitation in our approach related to how we incorporate OSM data and segment streets. CC @misaugstad @tongning.

@nchowder, when you get a chance, can you add in the full zip file of your analysis to this thread?

Jon Froehlich 5:56 PM @Neil Chowdhury thanks so much for investigating this and for posting a summary of your findings. Can you provide more details on what you tested, how you evaluated it, and the results with supporting data? Also, if there is indeed a bug, this affects any project that uses our deep learning model, since it relies on this input. We also need to see whether the Washington DC data is affected, as that’s the core of our assets paper.

Neil Chowdhury 5:58 PM I have to leave soon but here are 100 screenshots of points + the endpoints of the street segments they identified.

Jon Froehlich 5:58 PM Please post more information to the relevant GitHub issue ASAP.

Neil Chowdhury 6:07 PM Everything is at https://github.com/ProjectSidewalk/sidewalk-quality-analysis/issues/24

Anthony Li 6:13 PM Hey @Neil Chowdhury, thanks for bringing this up. Unfortunately I think this is effectively expected behavior for the approach we are taking to calculate middleness, which is to assume that OSM segments correspond to blocks. This is not always true, but I believe we OK'd this approach in the interest of time.

Jon Froehlich 6:16 PM Hmm, I don’t remember discussing this. Can you point me to the discussion thread to refresh my memory? Sounds like a potential summer task to improve this. Perhaps @Neil Chowdhury can take this on? @antli, did you ever perform any systematic analysis of your street code performance that you could share?

Anthony Li 6:20 PM This was in our initial email thread in Feb. We decided to go with the street segment approach knowing that it wouldn't be perfect, but I don't think we did any formal performance analysis. In my testing I just hand-selected some points and checked that the results produced seemed reasonable

Mikey Saugstad 8:43 PM Note that this will be worse in Seattle and Newberg than in DC, because in the newer cities we split street segments at neighborhood boundaries. In DC that didn't happen, so a street could span multiple neighborhoods.
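
The two failure modes discussed above can be sketched in a few lines. This is only an illustration, not the code in label-intersection-proximity: it assumes middleness is the distance from the label to the nearest segment endpoint, expressed as a percentage of half the segment length (0% at an endpoint, 100% at the midpoint). The function name `middleness_pct` and all positions (meters along a straight street) are hypothetical.

```python
def middleness_pct(position_m, seg_start_m, seg_end_m):
    """Assumed definition: 0% at a segment endpoint, 100% at the midpoint."""
    half_len = (seg_end_m - seg_start_m) / 2.0
    nearest = min(position_m - seg_start_m, seg_end_m - position_m)
    return 100.0 * nearest / half_len

# Case 1: a 100 m block with real intersections at 0 m and 100 m.
# If the segment is split at 50 m (e.g. at a neighborhood boundary),
# a mid-block label at 45 m looks close to an "intersection".
true_block = middleness_pct(45, 0, 100)    # 90.0
split_segment = middleness_pct(45, 0, 50)  # 20.0

# Case 2: one segment spans two 100 m blocks (0 m to 200 m), with a real
# intersection at 100 m that is not a segment endpoint. A label 5 m from
# that intersection looks like it is in the middle of the street.
spanning = middleness_pct(95, 0, 200)      # 95.0
per_block = middleness_pct(95, 0, 100)     # 10.0

print(true_block, split_segment, spanning, per_block)
```

Under these assumptions, the same label gets a very different middleness depending only on where the segment endpoints fall, which is consistent with both kinds of misclassification reported in this issue.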

nch0w commented 5 years ago

Zip file: https://drive.google.com/open?id=1rOC8sl0Dk5rk86LmwFoLhxXPZRC0CRJ_ So what should we do about the issue? I think it's limiting the accuracy of the label classifier.

jonfroehlich commented 5 years ago

I think you should try to solve it :-) Do you have access to the original code from Anthony?

nch0w commented 5 years ago

I think it's just https://github.com/tongning/label-intersection-proximity

jonfroehlich commented 5 years ago

Yep.

-- Jon Froehlich Associate Professor Paul G. Allen School of Computer Science & Engineering University of Washington http://makeabilitylab.io @jonfroehlich https://twitter.com/jonfroehlich - Twitter Help make sidewalks more accessible: http://projectsidewalk.io

nch0w commented 5 years ago

TODO: Create ground truth set to test intersection proximity algorithm
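
A minimal sketch of how such a ground truth set could be scored, using the same three outcome categories as the 100-label check above (correct, classified as closer than it really is, classified as farther than it really is). Everything here is an assumption for illustration: the `evaluate` function, the 25% threshold for "near an intersection", and the four sample records are made up and are not real Project Sidewalk data.

```python
from collections import Counter

def evaluate(records, near_threshold_pct=25.0):
    """Tally agreement between middleness and hand-labeled ground truth.

    records: iterable of (middleness_pct, truly_near_intersection) pairs.
    near_threshold_pct: hypothetical cutoff below which a label is
    treated as "near an intersection".
    """
    counts = Counter()
    for middleness_pct, truly_near in records:
        predicted_near = middleness_pct <= near_threshold_pct
        if predicted_near == truly_near:
            counts["correct"] += 1
        elif predicted_near:
            counts["too_close"] += 1  # classified closer than it really is
        else:
            counts["too_far"] += 1    # classified farther than it really is
    return counts

# Placeholder records, not real labels.
sample = [(90.0, True), (10.0, True), (20.0, False), (80.0, False)]
result = evaluate(sample)
print(dict(result))
```

Keeping the two error directions separate matters because the check above found them to be unequal (4 too close versus 19 too far).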

nch0w commented 5 years ago

There are some points where middleness is just not a reliable metric, even though absolute distance is. For example:

Screen Shot 2019-07-16 at 10 33 46 AM

The middleness is 40%, but the distance to the intersection is 8 meters.

jonfroehlich commented 5 years ago

Can you explain what you mean? You're basically saying that for really short streets, the % can be deceiving?

(I guess all metrics have positives/negatives and the hope is that they mostly work).

nch0w commented 5 years ago

Yes. If you saw that the middleness was 40%, you might think that the label is far from an intersection even though it is very close.
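
To make the short-street effect concrete, here is a small worked example. It assumes middleness is the distance to the nearest segment endpoint as a percentage of half the segment length, which may not match the exact definition in label-intersection-proximity. The 40 m and 200 m segment lengths are hypothetical, chosen so that the first case reproduces the 40% / 8 m numbers above.

```python
def distance_to_endpoint_m(middleness_pct, segment_length_m):
    """Invert the assumed definition: middleness = 100 * d / (length / 2)."""
    return middleness_pct * segment_length_m / 200.0

short_street = distance_to_endpoint_m(40.0, 40.0)   # 8.0 m
long_street = distance_to_endpoint_m(40.0, 200.0)   # 40.0 m

print(short_street, long_street)
```

The same 40% middleness corresponds to 8 m on the short segment and 40 m on the long one, so a percentage alone cannot say whether a label is physically close to an intersection. Reporting the absolute distance alongside the percentage would avoid that ambiguity.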

jonfroehlich commented 5 years ago

Right but I think generally it's still a useful metric :)

jonfroehlich commented 5 years ago

Where did we leave this? Can you write up a report describing your findings and, if you have one, a solution?