ProjectSidewalk / sidewalk-quality-analysis

An analysis of Project Sidewalk user quality based on interaction logs

Classifying label accuracy using position #18

Open · nch0w opened this issue 5 years ago

nch0w commented 5 years ago

Labels were classified as valid or invalid using a RandomForestClassifier with just five features: pitch, canvasX, canvasY, label type, and intersection proximity. Before training, I undersampled so there were equal numbers of invalid and valid labels.

I kept only predictions with a confidence above 80%, so the classifier makes predictions for just 17% of the labels. Of the labels it classifies as correct, 74% are actually correct; of the labels it classifies as incorrect, 83% are actually incorrect.

When the confidence cutoff is lowered to 70%, the classifier makes predictions for 45% of the labels: 71% of the labels it classifies as correct are correct, and 75% of the labels it classifies as incorrect are incorrect.
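
For reference, a minimal sketch of this pipeline (the DataFrame layout, column names, and input file below are placeholders, not the repo's actual schema):

```python
# Sketch only: column names and the input file are assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

FEATURES = ["pitch", "canvas_x", "canvas_y", "label_type_id", "dist_to_intersection"]

labels = pd.read_csv("labels.csv")  # one row per label; boolean 'validated' column

# Undersample the majority class so valid/invalid counts are equal.
n = labels["validated"].value_counts().min()
balanced = (labels.groupby("validated", group_keys=False)
                  .apply(lambda g: g.sample(n, random_state=0)))

train, test = train_test_split(balanced, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(train[FEATURES], train["validated"])

# Only keep predictions where the model is >80% confident either way.
proba = clf.predict_proba(test[FEATURES])[:, 1]  # P(label is valid)
confident = (proba > 0.8) | (proba < 0.2)
actual = test["validated"].to_numpy()[confident]
pred_valid = proba[confident] > 0.5

print("coverage:", confident.mean())                             # fraction of labels predicted
print("precision on 'valid':  ", actual[pred_valid].mean())      # predicted-valid that are valid
print("precision on 'invalid':", (~actual[~pred_valid]).mean())  # predicted-invalid that are invalid
```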

jonfroehlich commented 5 years ago

Hi @nchowder, thanks for adding this. I changed the title to try and provide more context on what you're doing. Is this title accurate?

In conversation, I also asked you to explain why you're doing this analysis (user group classification) and what it may be useful for... (At a high level, it's interesting to think about 'user behavior' signatures that let us correctly infer a user's group, but I'm struggling to see what value this has.)

nch0w commented 5 years ago

This is separate from the user group classification. It's just about predicting whether labels are correct based on their position.

jonfroehlich commented 5 years ago

Can you properly update the title then?

Also, I'm surprised with the performance here given that we're not using very many features.

nch0w commented 5 years ago

I think it performs so well because we're only making predictions for a small proportion of the data (17%), i.e., only the obviously correct or incorrect labels.

nch0w commented 5 years ago

Here is some additional analysis. I will need to investigate this further.

"Probability of predicting correct" represents confidence that the label is valid.

[Figures: fig3, fig4]

nch0w commented 5 years ago

Qualitatively, we are a lot better at predicting the correctness of Obstacle and SurfaceProblem labels than CurbRamp and NoCurbRamp labels.

[Four screenshots: label-position-analysis plots, one per label type]

nch0w commented 5 years ago

With absolute counts:

[Two screenshots: the same plots with absolute counts]

There is something wrong with the model. I will update the plots soon.

jonfroehlich commented 5 years ago

Sounds like it's cheating but let's discuss today (if possible).

On Mon, Jul 8, 2019 at 11:30 AM, Neil Chowdhury wrote:

> We can classify whether or not a label is correct based on its proximity to other incorrect or correct labels (using K Nearest Neighbors). I'm not sure if this is cheating.
>
> [Image: Screenshot_2019-07-08 label-position-analysis-neil]


nch0w commented 5 years ago

Here are the radius classifier predictions. Valid labels are red and invalid labels are blue. Predictions are marked with x's: a red 'x' means the model predicts the label is valid, and a blue 'x' means it predicts the label is invalid.

[Figure: radius_classifier_predictions]
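
For concreteness, a sketch of this kind of radius/nearest-neighbor classifier in scikit-learn (the radius value, the synthetic data, and the use of raw lat/lng coordinates are all assumptions; the actual implementation may differ):

```python
# Sketch only: synthetic data; radius and raw lat/lng coordinates are assumptions.
import numpy as np
from sklearn.neighbors import RadiusNeighborsClassifier
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
positions = rng.uniform([38.88, -77.05], [38.92, -77.00], size=(500, 2))  # fake lat/lng
validated = rng.integers(0, 2, size=500)                                  # fake 0/1 outcomes

# Classify each label by a majority vote of the valid/invalid labels within the radius.
clf = RadiusNeighborsClassifier(radius=0.0005, outlier_label="most_frequent")

# Cross-validated predictions keep each label out of its own training set,
# which addresses the "is this cheating?" concern at least for self-votes.
preds = cross_val_predict(clf, positions, validated, cv=5)
print("agreement with ground truth:", (preds == validated).mean())
```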

jonfroehlich commented 5 years ago

Not sure how to interpret this. Can you summarize your findings based on this analysis?


nch0w commented 5 years ago

I forgot to mention this earlier.

Another way of predicting a user's accuracy is to predict the correctness of each of their labels, and then divide the number of predicted-correct labels by the total number of labels.

[Figure: the features used by the classifier, shown on the x-axis]

I used a random forest classifier trained on the features shown on the x-axis of the plot above. I evaluated the model with K-Fold using 5 splits, partitioning by user: the classifier was trained on the labels of the training users and tested on the labels of the held-out testing users.

The graph below shows the classifier's results. Each point represents a user during the split in which they were in the testing group; the five colors correspond to the five splits. The equation is a linear regression of the model's predicted accuracy against the user's actual accuracy, and the number to its right is the r-squared value.

[Screenshot: predicted vs. actual user accuracy for each of the five splits]
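
A sketch of that evaluation loop (the `labels` DataFrame and its columns are assumptions, and GroupKFold stands in for the user-level K-Fold described above):

```python
# Sketch only: DataFrame layout and column names are assumptions.
import pandas as pd
from scipy import stats
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold

def predicted_vs_actual_accuracy(labels, feature_cols):
    preds = pd.Series(index=labels.index, dtype=float)
    gkf = GroupKFold(n_splits=5)  # all of a user's labels stay in one fold
    for train_idx, test_idx in gkf.split(labels, groups=labels["user_id"]):
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        clf.fit(labels.iloc[train_idx][feature_cols],
                labels.iloc[train_idx]["validated"])
        preds.iloc[test_idx] = clf.predict(labels.iloc[test_idx][feature_cols])

    # Per user: predicted accuracy = (# labels predicted correct) / (# labels).
    by_user = labels.assign(pred=preds).groupby("user_id")
    predicted, actual = by_user["pred"].mean(), by_user["validated"].mean()
    fit = stats.linregress(predicted, actual)  # the regression line in the plot
    return predicted, actual, fit.rvalue ** 2  # r-squared, as shown in the plot
```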

nch0w commented 5 years ago

New plot of applying the label classifier to users (using the larger dataset from Mikey).

[Screenshot from 2019-08-08: label classifier applied to users]

jonfroehlich commented 5 years ago

Could you describe this plot, please, and summarize your findings?


nch0w commented 5 years ago

Yes, soon.

Recursive feature elimination (with cross-validation) suggests that most of the features used are useful, I think. I followed the guide here: https://scikit-learn.org/stable/auto_examples/feature_selection/plot_rfe_with_cross_validation.html#sphx-glr-auto-examples-feature-selection-plot-rfe-with-cross-validation-py

[Screenshot: recursive feature elimination results]
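
A sketch following that guide, with a toy dataset standing in for the real label feature matrix:

```python
# Sketch only: make_classification stands in for the actual label features.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=1000, n_features=10, n_informative=6,
                           random_state=0)

rfecv = RFECV(
    estimator=RandomForestClassifier(n_estimators=100, random_state=0),
    step=1,                  # eliminate one feature per iteration
    cv=StratifiedKFold(5),
    scoring="accuracy",
)
rfecv.fit(X, y)
print("optimal number of features:", rfecv.n_features_)
print("feature mask:", rfecv.support_)  # True = kept by the selector
```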

nch0w commented 5 years ago

The user-level classifier (good vs. bad user) has a precision of 87% and a recall of 86% on users with at least 27 validated labels.
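
For reference, the computation behind numbers like these might look as follows (the `user_stats` table, its column names, and the 50% good/bad accuracy cutoff are assumptions):

```python
# Sketch only: user_stats, its columns, and the good/bad cutoff are assumptions.
import pandas as pd
from sklearn.metrics import precision_score, recall_score

user_stats = pd.DataFrame({  # toy stand-in for the real per-user table
    "n_validated": [40, 12, 35, 60, 28],
    "actual_accuracy": [0.9, 0.4, 0.3, 0.8, 0.7],
    "predicted_accuracy": [0.85, 0.5, 0.35, 0.75, 0.45],
})

MIN_VALIDATED = 27  # only evaluate users with enough validated labels
GOOD_CUTOFF = 0.5   # assumed accuracy threshold separating good from bad users

eligible = user_stats[user_stats["n_validated"] >= MIN_VALIDATED]
y_true = eligible["actual_accuracy"] >= GOOD_CUTOFF
y_pred = eligible["predicted_accuracy"] >= GOOD_CUTOFF

print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
```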