ProjectSidewalk / sidewalk-quality-analysis

An analysis of Project Sidewalk user quality based on interaction logs

DC interaction data analysis #47

Closed daotyl000 closed 4 years ago

daotyl000 commented 5 years ago

Investigate the user DC data and see whether we can build a machine learning model from the DC interaction data. There are 132 users who have done missions on streets that were also labeled by researchers, so the researchers' labels on those streets are considered ground truth.

daotyl000 commented 5 years ago

Out of the 132 users, only 80 have interaction data that I was able to use to calculate features (the other 52 would produce zeros for most features).
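A minimal sketch of how users with mostly-zero features could be filtered out before modeling. The feature names, the user IDs, and the 50% cutoff for "most features" are all assumptions made for illustration; the thread does not say how the 52 users were actually excluded.

```python
def has_usable_features(features, max_zero_fraction=0.5):
    """Return True if at most max_zero_fraction of the feature values are zero.

    The 0.5 default is an assumed stand-in for "most features".
    """
    values = list(features.values())
    if not values:
        return False
    zero_count = sum(1 for v in values if v == 0)
    return zero_count / len(values) <= max_zero_fraction


# Made-up feature rows keyed by hypothetical user IDs.
users = {
    "user_a": {"avg_pitch": -12.5, "heading_range": 140.0, "avg_zoom": 1.2},
    "user_b": {"avg_pitch": 0, "heading_range": 0, "avg_zoom": 0},
    "user_c": {"avg_pitch": -3.0, "heading_range": 0, "avg_zoom": 1.0},
}

# Keep only the users whose feature rows are not dominated by zeros.
usable = {uid: f for uid, f in users.items() if has_usable_features(f)}
```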

When we graphed the features we were able to create from the DC data, most of their correlations were below 0.1, and our classifier reached at most 64% accuracy when trying to determine whether a user had a precision score of at least 65%. None of the features, when measured against precision, had a strong correlation.
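For reference, the two quantities discussed above can be sketched as follows, assuming the correlations are Pearson correlations (the thread does not say which coefficient was used). The function names are hypothetical, and the 0.65 threshold mirrors the 65% precision cutoff mentioned in the comment.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant sequence; this sketch returns 0.0.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def label_by_precision(precisions, threshold=0.65):
    """Binary target: 1 if a user's precision is at least the threshold, else 0."""
    return [1 if p >= threshold else 0 for p in precisions]
```

With per-user lists in hand, `pearson(feature_values, precisions)` would give one point on a correlation graph, and `label_by_precision(precisions)` would give the labels the classifier is trained to predict.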

When measuring features against F1 score, both heading features and average pitch were the only features with a moderate correlation (0.3), but the model was less accurate (at most ~57%).

We are unsure why the model for precision was more accurate when the correlation graphs for F1 score were more promising.
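One check that might help when comparing the two accuracy figures, offered as an assumption and not something verified against this data: accuracy on a thresholded target depends on class balance, so each model's accuracy is easier to interpret next to the accuracy of always predicting the majority class. A minimal sketch with a hypothetical helper name:

```python
def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most common class in a list of 0/1 labels."""
    if not labels:
        raise ValueError("labels must not be empty")
    positives = sum(labels)
    negatives = len(labels) - positives
    return max(positives, negatives) / len(labels)
```

If, hypothetically, the precision-based labels were more imbalanced than the F1-based labels, the precision model could score higher accuracy without its features being more informative.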

jonfroehlich commented 4 years ago

Closing this because we're not going to use DC data going forward (infrastructure has just changed too much since then).