ProjectSidewalk / sidewalk-quality-analysis

An analysis of Project Sidewalk user quality based on interaction logs

Analyze population density as a predictor of user accuracy #33

Open jonfroehlich opened 4 years ago

jonfroehlich commented 4 years ago

We can get population density information for Seattle from Raymond Fok (or, at least, he can point us in the right direction).

This relates to https://github.com/ProjectSidewalk/sidewalk-quality-analysis/issues/22

daotyl000 commented 4 years ago

Here is the ArcGIS map of the labels from all users with an accuracy lower than 65%. The "population" scale on the side is really population density, measured in people per square mile. The Industrial District does not have any population density data, which I assume means no one lives in that region of Seattle. Generally, it appears that the lower the population density, the lower the accuracy, which I interpret as there being fewer sidewalks because fewer people live there.

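[image: Screen Shot 2019-08-01 at 2 55 00 PM] https://user-images.githubusercontent.com/28814007/62330575-e781f700-b46d-11e9-949b-0b207297e329.png
[image: Screen Shot 2019-08-01 at 2 55 12 PM] https://user-images.githubusercontent.com/28814007/62330576-e81a8d80-b46d-11e9-8ca6-29ce6ca06345.png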

jonfroehlich commented 4 years ago

Can you do a quantitative analysis rather than a qualitative analysis?

-- Jon Froehlich Associate Professor Paul G. Allen School of Computer Science & Engineering University of Washington http://makeabilitylab.io @jonfroehlich https://twitter.com/jonfroehlich - Twitter Help make sidewalks more accessible: http://projectsidewalk.io

daotyl000 commented 4 years ago

Here is my analysis: I estimated the percentages by zooming into each individual neighborhood. I changed the color scale to use 4 distinct color groups instead of a gradient. The bottom three groups differ by about 5000 people per square mile.

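[image: Screen Shot 2019-08-01 at 4 32 04 PM] https://user-images.githubusercontent.com/28814007/62333812-ece53e80-b479-11e9-95c1-71a00a9ef5c8.png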

My analysis shows that as the population density increases, so does user accuracy.

The white region consists of only the Industrial District, and its labels are mainly incorrect.

Tannish Region:
Rainier Beach: about 80% correct
Harbor Island: 50% correct
Fauntleroy: 50% correct
South Park: 50% correct
Briarcliff: 50% correct
Mid-Beacon Hill: 20% correct
South Delridge: 10% correct

Neighborhoods: 7
Average: 44.29%

Light Blue Region:
Pinehurst: 90% correct
Gatewood: 75% correct
Sunset Hill: 75% correct
West Queen Anne: 75% correct
Harrison/Denny-Blaine: 70% correct
North Beach/Blue Ridge: 50% correct
Wedgwood: 30% correct
Alki: 10% correct

Neighborhoods: 8
Average: 59.375%

Blue Region:
Loyal Heights: 90% correct
Roxhill: 90% correct
East Queen Anne: 90% correct
University District: 70% correct
North Queen Anne: 70% correct
South Lake Union: 60% correct
Maple Leaf: 50% correct
Fremont: 50% correct
Olympic Hills: 20% correct

Neighborhoods: 9
Average: 65.56%

Dark Blue Region:
Whittier Heights: 90% correct
Mann: 90% correct
Broadway: 80% correct

Neighborhoods: 3
Average: 86.67%
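
As a quick check, the four group averages above can be recomputed from the per-neighborhood estimates. This is only a minimal sketch: the numbers are the eyeballed percentages listed in this comment, and the dictionary name `estimates` is introduced here for illustration.

```python
from statistics import mean

# Per-neighborhood accuracy estimates (percent correct), copied from the
# lists in this comment. They are visual estimates, not computed values.
estimates = {
    "Tannish": [80, 50, 50, 50, 50, 20, 10],
    "Light Blue": [90, 75, 75, 75, 70, 50, 30, 10],
    "Blue": [90, 90, 90, 70, 70, 60, 50, 50, 20],
    "Dark Blue": [90, 90, 80],
}

# Unweighted mean per color group: every neighborhood counts equally,
# regardless of how many labels it contains.
group_means = {region: mean(values) for region, values in estimates.items()}

for region, values in estimates.items():
    print(f"{region}: n={len(values)}, mean={group_means[region]:.2f}%")
```

Note that these are unweighted means over neighborhoods, so a neighborhood with a handful of labels counts as much as one with hundreds.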
jonfroehlich commented 4 years ago

Thanks. I originally intended for you to do this programmatically. If we think that there is some merit to this, then you could programmatically calculate population density as a predictor of user accuracy. Before doing so, perhaps you should think about the best way to do this and then propose a plan.
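
One possible starting point for such a plan, offered as a rough sketch and not as the project's method, is a Spearman rank correlation between density bin and estimated accuracy. The sketch below reuses the hand-estimated neighborhood percentages from this thread and assumes the color groups are ordered from lowest to highest density as tannish, light blue, blue, dark blue. The helper names (`average_ranks`, `pearson`, `spearman`) are introduced here for illustration. A fuller analysis would presumably replace the bins with actual density values joined to each label's location.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the full run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# ASSUMPTION: bins are ordered 1 = tannish (lowest density) .. 4 = dark blue.
bins = [1] * 7 + [2] * 8 + [3] * 9 + [4] * 3
# Hand-estimated percent correct per neighborhood, in the order listed above.
accuracy = [
    80, 50, 50, 50, 50, 20, 10,
    90, 75, 75, 75, 70, 50, 30, 10,
    90, 90, 90, 70, 70, 60, 50, 50, 20,
    90, 90, 80,
]

rho = spearman(bins, accuracy)
print(f"Spearman rho = {rho:.2f} over {len(bins)} neighborhoods")
```

A rank correlation is used here because the density groups are ordinal bins, not measured densities, and the accuracy values are coarse estimates.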

nch0w commented 4 years ago

Here are some plots of population density vs. label accuracy. In general, the higher the population density, the higher the accuracy.

Screenshot from 2019-08-02 13-00-49

Screenshot from 2019-08-02 13-01-02

daotyl000 commented 4 years ago

I'm not sure how reliable the obstacle vs. population density graph is, because it looks like that one outlier is the only reason the trend line isn't straight.
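
One way to check that concern would be to refit the trend line with each point left out in turn and see how much the slope moves. The sketch below is only an illustration: the `density` and `accuracy` values are made-up placeholders (a flat trend plus one high-density outlier), not the data behind the plots, and the helper names are introduced here.

```python
from statistics import mean


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def leave_one_out_slopes(xs, ys):
    """Slope of the fit with each point removed in turn."""
    return [
        slope(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        for i in range(len(xs))
    ]


# PLACEHOLDER values, NOT the plotted data: four points with identical
# accuracy plus a single high-density outlier with higher accuracy.
density = [2000.0, 4000.0, 6000.0, 8000.0, 30000.0]
accuracy = [0.60, 0.60, 0.60, 0.60, 0.90]

full = slope(density, accuracy)
loo = leave_one_out_slopes(density, accuracy)

print(f"slope with all points: {full:.2e}")
for x, s in zip(density, loo):
    print(f"slope without density={x:.0f}: {s:.2e}")
```

If dropping a single point changes the slope far more than dropping any other point, the trend is resting mostly on that one observation.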