jonfroehlich opened this issue 5 years ago
This is probably one of the highest priority analyses I'd like to see completed.
So I did an initial analysis of this. It runs an SVM on features extracted from the first n panos visited by each user, with recursive feature elimination. Users who had visited fewer than n panos were excluded from the analysis.
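A minimal sketch of this setup, assuming each user is summarized by a fixed-length feature vector aggregated over their first n panos (the random features and labels below are synthetic stand-ins for the real log-derived features, which are project-specific):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(0)
n_users, n_features = 200, 10
X = rng.normal(size=(n_users, n_features))     # stand-in for per-user features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # stand-in "good"/"bad" labels

# A linear kernel lets RFE rank features by coefficient magnitude.
svm = SVC(kernel="linear")
selector = RFE(svm, n_features_to_select=5)

# Cross-validated predictions so precision/recall are estimated out-of-sample.
y_pred = cross_val_predict(selector, X, y, cv=5)
print("precision:", round(precision_score(y, y_pred), 2))
print("recall:", round(recall_score(y, y_pred), 2))
```

Wrapping the RFE selector in `cross_val_predict` keeps the feature elimination inside each fold, which avoids leaking the held-out users into the feature-selection step.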
It looks like recall decreases as the number of panos increases, but precision stays about the same.
*The first point is n=5 panos.
This is the distribution of how many users have seen a given number of panos:

- 10+ panos: 342 users
- 25+ panos: 264 users
- 50+ panos: 188 users
- 100+ panos: 91 users
- 200+ panos: 47 users
- 300+ panos: 35 users
- 400+ panos: 33 users
- 500+ panos: 32 users
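For reference, counts like the ones above are just threshold tallies over per-user pano counts. A sketch with synthetic counts (the real numbers come from the interaction logs; the geometric distribution here is only an illustrative stand-in):

```python
import numpy as np

rng = np.random.default_rng(1)
# Fake per-user pano counts; replace with counts derived from the logs.
pano_counts = rng.geometric(p=0.02, size=500)

for threshold in [10, 25, 50, 100, 200, 300, 400, 500]:
    n_users = int((pano_counts >= threshold).sum())
    print(f"{threshold}+ panos: {n_users} users")
```

Because each threshold subsumes the next, the counts are necessarily non-increasing, which matches the shape of the distribution above.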
A rather large but interesting and important analysis is: how much interaction log data do we need per user to accurately infer whether they are a "good" or "bad" user?
Roughly, the way to do this is to graph prediction accuracy as a function of the amount of data: how well does our model predict user quality after the tutorial, after one mission, after two missions, after three, and so on.
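The loop for that curve can be sketched as follows: refit the classifier on features built from only the first n units of data and record cross-validated accuracy for each n. Here `features_from_first_n` is a hypothetical helper and the per-pano feature array is synthetic; in practice both would come from the real interaction logs.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_users, max_panos = 150, 50
# Fake per-pano observations: each pano contributes a noisy 3-feature vector.
pano_feats = rng.normal(size=(n_users, max_panos, 3))
# Stand-in label: a user is "good" if their average first-feature signal is positive.
y = (pano_feats[:, :, 0].mean(axis=1) > 0).astype(int)

def features_from_first_n(n):
    # Aggregate (mean) each user's features over their first n panos only.
    return pano_feats[:, :n, :].mean(axis=1)

for n in [5, 10, 25, 50]:
    X = features_from_first_n(n)
    acc = cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean()
    print(f"n={n:3d} panos -> accuracy {acc:.2f}")
```

Plotting accuracy (or precision/recall, as in the analysis above) against n gives the curve in question; users with fewer than n panos would be filtered out at each step, so the sample size shrinks as n grows, which is worth noting on the plot.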