ProjectSidewalk / sidewalk-quality-analysis

An analysis of Project Sidewalk user quality based on interaction logs

Regression to predict accuracy #52

Open nch0w opened 5 years ago

nch0w commented 5 years ago

I started some analysis of using a regression to predict accuracy. I just used a BaggingRegressor with the same parameters as the BalancedBaggingClassifier we used to classify users. I used the same kinds of classifiers for the two RFEs (recursive feature elimination steps).
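For reference, a minimal sketch of this kind of setup, assuming scikit-learn. The synthetic data, feature count, hyperparameters, and the choice of a random forest inside the RFE step are all placeholders for illustration, not the values or estimators used in the notebook.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the per-user interaction features (hypothetical).
rng = np.random.default_rng(0)
n_users, n_features = 60, 6
X = rng.normal(size=(n_users, n_features))

# Hypothetical target: per-user accuracy in [0, 1], driven by two features.
noise = rng.normal(scale=0.05, size=n_users)
y = np.clip(0.7 + 0.1 * X[:, 0] - 0.05 * X[:, 1] + noise, 0.0, 1.0)

pipeline = Pipeline([
    # RFE needs an estimator that exposes feature_importances_ or coef_;
    # using a random forest here is an assumption.
    ("rfe", RFE(RandomForestRegressor(n_estimators=20, random_state=0),
                n_features_to_select=3)),
    # Bagged decision trees (the default base estimator) as the regressor.
    ("bag", BaggingRegressor(n_estimators=20, random_state=0)),
])

# Out-of-fold predictions: each user's accuracy is predicted by a model
# that was fit without that user.
predicted = cross_val_predict(pipeline, X, y, cv=5)
```

Putting the RFE step inside the pipeline means feature selection is redone within each fold, so the held-out users do not influence which features are kept.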

I'm not sure what qualifies as a "good" regression in our case; maybe @jonfroehlich can chime in here. I put the R value on each of the plots, which show predicted accuracy vs. actual accuracy.
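One possible way to put the R value in context is to report an error metric next to it and compare against a trivial baseline that always predicts the mean accuracy. R only measures linear association, so predictions can correlate well and still be offset or compressed. The sketch below uses made-up numbers purely for illustration; they are not results from the notebook.

```python
from math import sqrt

def pearson_r(actual, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / sqrt(var_a * var_p)

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Made-up per-user accuracies, for illustration only.
actual = [0.55, 0.62, 0.70, 0.78, 0.85, 0.91]
predicted = [0.60, 0.66, 0.69, 0.74, 0.80, 0.84]

r = pearson_r(actual, predicted)
model_mae = mean_absolute_error(actual, predicted)

# Trivial baseline: predict the mean actual accuracy for every user.
baseline = [sum(actual) / len(actual)] * len(actual)
baseline_mae = mean_absolute_error(actual, baseline)

print(f"R = {r:.3f}, model MAE = {model_mae:.3f}, baseline MAE = {baseline_mae:.3f}")
```

If the model's error is not clearly below the baseline's, a high-looking R is probably not enough to call the regression useful.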

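The 55 users with >45 validated labels: [image: predicted vs. actual accuracy plot] https://user-images.githubusercontent.com/17211794/63824368-b29d7e80-c90b-11e9-8124-b06d67bd1027.png

Users with >25 validated labels: [image: predicted vs. actual accuracy plot] https://user-images.githubusercontent.com/17211794/63824360-b03b2480-c90b-11e9-8c5a-56f0d1517e95.png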

jonfroehlich commented 5 years ago

Oh, this is neat. Does the regression value also come with a confidence?


-- Jon Froehlich, Associate Professor, Paul G. Allen School of Computer Science & Engineering, University of Washington. http://makeabilitylab.io. Twitter: @jonfroehlich (https://twitter.com/jonfroehlich). Help make sidewalks more accessible: http://projectsidewalk.io

nch0w commented 5 years ago

No, it doesn't. Only the random forest/bagging classifiers had that (in the form of predicting probabilities for each class).
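That said, a rough uncertainty signal can be pulled out of a bagging ensemble by looking at how much the individual estimators disagree. The sketch below assumes scikit-learn and uses synthetic placeholder data; `predict_with_spread` is a hypothetical helper, and the spread it returns is a heuristic, not a calibrated confidence interval.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor

# Synthetic stand-in data (hypothetical features and accuracies).
rng = np.random.default_rng(0)
X = rng.normal(size=(55, 4))
y = np.clip(0.7 + 0.1 * X[:, 0] + rng.normal(scale=0.05, size=55), 0.0, 1.0)

model = BaggingRegressor(n_estimators=50, random_state=0).fit(X, y)

def predict_with_spread(model, X):
    """Return the ensemble mean and the std. dev. across base estimators."""
    per_estimator = np.stack([
        # Each base estimator only sees the feature subset it was fit on.
        est.predict(X[:, feats])
        for est, feats in zip(model.estimators_, model.estimators_features_)
    ])
    return per_estimator.mean(axis=0), per_estimator.std(axis=0)

mean, spread = predict_with_spread(model, X[:5])
for m, s in zip(mean, spread):
    print(f"predicted accuracy {m:.2f} (spread across estimators {s:.2f})")
```

A large spread for a user suggests the bootstrap models disagree about that user, which is loosely analogous to a low class probability from the classifier.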

jonfroehlich commented 5 years ago

Would love to see how well this performs when we get more user data...
