ProjectSidewalk / SidewalkWebpage

Project Sidewalk web page
http://projectsidewalk.org
MIT License

Decide how users are graduated into using the new Validate UI #3585

Open misaugstad opened 5 months ago

misaugstad commented 5 months ago
Brief description of problem/feature

With the new Validate UI that lets users edit tags and severity, there is some concern about making it available to all users (I'm making it admin-only for now while I take my trip to Europe). The issue is that with this tool you actually get to edit the tags and severity (wiki- or OSM-style) rather than just voting on whether or not you agree with the label.

If this is available to everyone, there are some situations that we likely wouldn't be happy with. For example, if an Admin adds a label on the Explore page and carefully sets the severity and tags, we don't necessarily want a totally untrained user to jump in and mess with the tags that we set. The same goes if an Admin takes the time to edit the tags/severity using this new UI: we don't want those changes undone so easily.

With sites like OSM or Wikipedia, there are way more users making edits and they have some tools for finding/detecting problematic data that we do not. And @yeisenberg has mentioned that this could have implications for gov't trust in the data that we provide.

Potential solution(s)

Some options:

  1. We could simply let everyone access this tool, and treat our data like OSM or Wikipedia treats theirs. The idea being that over time, the data will trend towards being more accurate.
  2. We could let anyone use the tool, but use our estimates of users' expertise (i.e. the quality of their data) to inform which labels we serve to which users for validation. For example, maybe we wouldn't show labels from Admins to new users who are contributing for the first time. Maybe when choosing which labels to serve for validation, we're more likely to show labels placed by users with 50% accuracy than labels placed by users with 90% accuracy. We could also expand our statistics on the quality of users' data and use those: maybe we calculate the agreement between their validations and overlapping validations from admins! (There's a rough sketch of this idea after this list.)

    I feel good about this idea as a practical way to keep the quality of our data high over time. The downside is the lack of hard rules, which means a lack of explainability when people ask about the quality of our data. It would be hard to say much more than "we have an algorithm", and that's not all that reassuring.

  3. We could let users "unlock" this tool. This could be through a minimum amount of data put into the system (labels and/or validations), a minimum level of accuracy, and/or any other metrics we want to add (could be some of the same ideas outlined in the previous option, like high agreement in validations overlapping with Admins). I think this is the idea that we've talked about the most, so I'm sort of expecting us to go down this route (as much as I like route number 2 :wink:); there's a rough sketch of what the check could look like after this list. Some stuff that would need to be figured out:
    1. What are our requirements to unlock the new tool?
    2. Do we tell users that they can unlock it and how? If so, how do we want to advertise that to users throughout the site? And how do we show them their progress towards unlocking it? Do we tie this in with our system of badges, and add new ones?
    3. What do we do if users meet the thresholds to unlock the tool, but their performance later dips below the threshold? What if it dips way below?
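
To make option 2 a bit more concrete, here's a minimal sketch of the serving idea. The field names, weights, and the simple weighted draw are all placeholders for illustration, not our actual serving logic:

```scala
import scala.util.Random

object WeightedLabelServing {
  // Minimal stand-in for a label; these fields are assumptions, not the real schema.
  case class Label(labelId: Int, labelerAccuracy: Double, placedByAdmin: Boolean)

  /** Pick the next label to serve for validation, favoring labels from low-accuracy labelers. */
  def pickLabelToValidate(candidates: Seq[Label],
                          validatorIsNew: Boolean,
                          rng: Random = new Random()): Option[Label] = {
    // Brand-new validators never get served Admin-placed labels.
    val pool = if (validatorIsNew) candidates.filterNot(_.placedByAdmin) else candidates
    if (pool.isEmpty) None
    else {
      // Lower labeler accuracy -> higher weight, with a small floor so no label is excluded entirely.
      val weights = pool.map(l => 1.0 - l.labelerAccuracy + 0.05)
      var r = rng.nextDouble() * weights.sum
      pool.zip(weights)
        .find { case (_, w) => r -= w; r <= 0 }
        .map(_._1)
        .orElse(pool.lastOption) // fallback guards against floating-point edge cases
    }
  }
}
```

The small floor on the weight keeps labels from high-accuracy users from being excluded entirely, so they still get occasional review.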
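And for option 3, a rough sketch of what an "unlock" check could look like. Every field name and threshold here is invented for illustration and would be replaced by whatever requirements we settle on:

```scala
object UnlockCheck {
  // Per-user stats we already track or could compute; names are placeholders.
  case class UserStats(labels: Int, validations: Int, accuracy: Double, adminAgreement: Double)

  // One possible set of "unlock" requirements; all of these numbers are made up.
  case class UnlockCriteria(minLabels: Int = 500,
                            minValidations: Int = 100,
                            minAccuracy: Double = 0.80,
                            minAdminAgreement: Double = 0.75)

  def canUseNewValidateUi(u: UserStats, c: UnlockCriteria = UnlockCriteria()): Boolean =
    u.labels >= c.minLabels &&
      u.validations >= c.minValidations &&
      u.accuracy >= c.minAccuracy &&
      u.adminAgreement >= c.minAdminAgreement
}
```

Keeping the criteria in one config-like object would also make it easy to tighten or loosen the bar later without touching the check itself.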
misaugstad commented 2 months ago

@jonfroehlich and I discussed this a bit over Slack. Here are my takeaways:

I'm thinking we could just start super strict, requiring all of the following:

Maybe I can start by aiming for metrics that grant access to roughly our top 5-10% of users by performance? I can fiddle with the specific numbers to figure out how best to hit that target.
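
As a rough illustration of that tuning (not the actual metric, which is still TBD), one way to pick a cutoff is to read it straight off the observed distribution so that roughly the top 5-10% of users clear it:

```scala
object ThresholdTuning {
  /** Accuracy cutoff such that roughly the top `targetShare` of users (e.g. 0.05-0.10) qualify. */
  def cutoffForTopShare(accuracies: Seq[Double], targetShare: Double): Double = {
    require(accuracies.nonEmpty && targetShare > 0 && targetShare <= 1)
    val sortedDesc = accuracies.sorted(Ordering[Double].reverse)
    // Index of the last user we still want to include, e.g. top 10% of 1,000 users -> index 99.
    val lastIncluded = math.max(0, math.ceil(accuracies.size * targetShare).toInt - 1)
    sortedDesc(lastIncluded)
  }

  def main(args: Array[String]): Unit = {
    val sample = Seq(0.95, 0.91, 0.88, 0.84, 0.80, 0.77, 0.72, 0.66, 0.60, 0.51)
    println(cutoffForTopShare(sample, 0.10)) // 0.95: only the top user in this sample would qualify
  }
}
```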

jonfroehlich commented 2 months ago

I don't think we have to be too strict... do we? If we make it too strict, not enough folks will be able to do anything with it. And we can always show an image + metadata multiple times to different users to protect quality.

misaugstad commented 2 months ago

I was thinking that since we don't know where the bar should be, we'd want to start with the bar too high and adjust it down as we become confident about where it belongs, rather than starting too low and wishing we'd set it higher (given that we don't want to take the UI away after we've made it available).

> And we can always show an image + metadata multiple times to different users to protect quality.

This is true! But even now we have more labels than validations (1 million labels, 700k validations), and validations will take longer than before with this UI.

But maybe I was being a bit too strict; we do ultimately have to trust our users at some point! Perhaps we establish a minimum for miles and labels, and then take the top 50% of performers (within that group) in terms of accuracy & labeling frequency?
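
A minimal sketch of that two-stage idea, with invented field names and minimums:

```scala
object GraduationSelection {
  // Invented per-user summary for illustration only.
  case class UserPerf(userId: String, miles: Double, labels: Int, accuracy: Double, labelsPerMile: Double)

  def selectGraduates(users: Seq[UserPerf], minMiles: Double = 5.0, minLabels: Int = 200): Seq[UserPerf] = {
    // Stage 1: hard minimums on miles explored and labels placed.
    val eligible = users.filter(u => u.miles >= minMiles && u.labels >= minLabels)
    // Stage 2: within that group, keep the top 50%, ranked by accuracy and then labeling frequency.
    val ranked = eligible.sortBy(u => (-u.accuracy, -u.labelsPerMile))
    ranked.take(math.ceil(ranked.size / 2.0).toInt)
  }
}
```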


One thing we talked about a bit before that we all liked was looking at the percentage agreement between a user's validations and admins' validations on the same labels. I think it may be worth factoring this in as well?
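
A rough sketch of how that agreement percentage could be computed, simplifying each validation to a binary agree/disagree on a label:

```scala
object AdminAgreement {
  // Simplified validation record; a real version would also need to handle "unsure" votes.
  case class Validation(labelId: Int, agree: Boolean)

  /** Share of the user's validations that match an admin validation on the same label (None if no overlap). */
  def agreementWithAdmins(userVals: Seq[Validation], adminVals: Seq[Validation]): Option[Double] = {
    // If multiple admins validated the same label this keeps an arbitrary one;
    // a real version might take the majority admin vote instead.
    val adminByLabel: Map[Int, Boolean] = adminVals.map(v => v.labelId -> v.agree).toMap
    val overlapping = userVals.filter(v => adminByLabel.contains(v.labelId))
    if (overlapping.isEmpty) None
    else Some(overlapping.count(v => adminByLabel(v.labelId) == v.agree).toDouble / overlapping.size)
  }
}
```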