Open misaugstad opened 6 years ago
Just want to add the rationale behind the user agreement analysis here: high agreement is meant as a proxy for quality of data (since we know that higher agreement means higher precision). We can also use the characteristics of the streets with high/low agreement to characterize the difficulty of certain types of routes.
Here is the algorithm that @manaswis and I came up with, as written by @manaswis on Slack:
I do have an idea for how we can simplify the algorithm while still feeling confident in the results, though. Instead of counting the number of users who placed a label on that street, I think we can just count the number of users who audited that street, but only because we are looking at the set of "good" users (i.e., those with a high labeling frequency). Looking at users who had placed a label on that particular street was mostly meant to find users who actually audited the street; I think this is already covered when we restrict ourselves to our set of "good" users.
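To make the simplified counting step concrete, here is a rough sketch in Python. It only illustrates the "count good users who audited the street" part, not the full agreement algorithm. The names `auditors_per_street`, `label_frequency`, and `min_frequency` are made up for illustration, as is the input shape (a list of `(user_id, street_id)` audit records); the actual threshold and data layout would come from our database.

```python
from collections import defaultdict


def auditors_per_street(audits, label_frequency, min_frequency):
    """Count the distinct "good" users who audited each street.

    audits: iterable of (user_id, street_id) pairs, one per audit record
        (hypothetical shape, not the real schema).
    label_frequency: dict mapping user_id -> labeling frequency
        (hypothetical; whatever measure we use to define "good" users).
    min_frequency: hypothetical cutoff above which a user counts as "good".
    """
    # Restrict to "good" users up front, so we no longer need to check
    # whether each user placed a label on the specific street.
    good_users = {
        user for user, freq in label_frequency.items() if freq >= min_frequency
    }

    # Use sets so a user who audited the same street twice is counted once.
    users_by_street = defaultdict(set)
    for user_id, street_id in audits:
        if user_id in good_users:
            users_by_street[street_id].add(user_id)

    return {street: len(users) for street, users in users_by_street.items()}
```

With this approach, a street audited by three users where only two pass the frequency cutoff would get a count of 2, regardless of whether those two placed any labels on it.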
@manaswis how do you feel about that modification and its justification?