ProjectSidewalk / SidewalkWebpage

Project Sidewalk web page
http://projectsidewalk.org
MIT License

Ideas for Reviewing/Verifying Labels (Quality Control) #535

Closed · jonfroehlich closed this issue 5 years ago

jonfroehlich commented 7 years ago

We talked about this in our UIST'14 paper but I wonder if it would be worth exploring again.

Imagine having a review data tab on Project Sidewalk that shows a grid of cropped images with their labels. The interface would allow the user to select between showing different label types. You could then quickly verify or correct mislabeled items (maybe vote up or down on whether you agree)--kind of like this Picasa interface:

[image: Picasa-style label verification grid]

We could also show a subset of this interface in between labeling tasks to break up redundancy of auditing.

A few benefits:

  1. We begin to get verifications for data.
  2. Users get to see what others consider problems, etc.
manaswisaha commented 7 years ago

I like this idea, especially showing these in between audits.

misaugstad commented 7 years ago

In issue #1076 @jonfroehlich said:

We have been discussing quality control methods. There are a ton of possibilities here, including:

  • Verification interfaces
  • Analyzing behavior of worker via interaction logs
  • Performing statistical verification of labeling activity. That is, does a labeler's mission have a label distribution similar to that of prior routes in the same (i) neighborhood or (ii) neighborhood land-use type? This seems relatively easy.

We could perform offline investigations for some of these to investigate further.

For ideas, see Quinn and Bederson's overview of human computation (link). There are lots of research papers on quality control methods for online work that we should examine for ideas as well.
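For the statistical-verification bullet above, here's a minimal sketch of what such a check could look like, assuming illustrative label-type names and a hand-picked chi-squared threshold (this is not Project Sidewalk code):

```js
// Illustrative sketch: compare a mission's label-type counts against the
// historical distribution for the same neighborhood with a chi-squared
// goodness-of-fit statistic. Label types and threshold are assumptions.

const LABEL_TYPES = ['CurbRamp', 'NoCurbRamp', 'Obstacle', 'SurfaceProblem'];

function chiSquared(missionCounts, neighborhoodCounts) {
  const total = (counts) =>
    LABEL_TYPES.reduce((sum, t) => sum + (counts[t] || 0), 0);
  const missionTotal = total(missionCounts);
  const neighborhoodTotal = total(neighborhoodCounts);
  if (missionTotal === 0 || neighborhoodTotal === 0) return 0;

  let chi2 = 0;
  for (const t of LABEL_TYPES) {
    // Expected count if the mission matched the neighborhood-wide proportions.
    const expected =
      missionTotal * ((neighborhoodCounts[t] || 0) / neighborhoodTotal);
    if (expected > 0) {
      chi2 += ((missionCounts[t] || 0) - expected) ** 2 / expected;
    }
  }
  return chi2;
}

// Chi-squared critical value for df = 3 at p = 0.05; a mission above this is
// flagged for manual review rather than auto-rejected.
const FLAG_THRESHOLD = 7.81;

// Example: a 42-label mission that is almost all Obstacle labels in a
// neighborhood that is overwhelmingly CurbRamp gets flagged.
const mission = { CurbRamp: 2, Obstacle: 40 };
const hood = { CurbRamp: 900, NoCurbRamp: 150, Obstacle: 60, SurfaceProblem: 90 };
console.log(chiSquared(mission, hood) > FLAG_THRESHOLD); // true
```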

misaugstad commented 7 years ago

Another verification tool idea that we have all discussed before is a Tinder-style app/page where it shows you one image with a label on it at a time, and you do a quick up/down vote. We have talked about this as a great mini-tool that people can use while sitting in a waiting room, on the elevator, etc.

We are currently trying to get a mobile version of the tool working (see #282), but we have also talked about this as just replacing the current auditing tool for mobile users, if the full tool turns out to be too complex for a cell phone form factor.

misaugstad commented 7 years ago

Another thing that @jonfroehlich mentioned in a meeting we had was an admin tool where we subsample some number of crowd workers and manually review their work, either by walking through their routes or by reviewing a subset of their labels (in particular, review by someone who has read through the labeling codebook; see #961).

@jonfroehlich my notes on that meeting are incomplete, and I wasn't quite sure what you meant at the time. If what I just said above reminds you of what you were talking about, could you elaborate a bit? (this meeting was ~2 weeks ago, so I would not expect you to necessarily remember)

aileenzeng commented 6 years ago

Did some preliminary mockups:

[two preliminary mockup images]

I personally like the designs that show fewer pictures at a time. I think they give users more context for what’s going on in the scene, and it might simplify any keyboard controls that we might add later.

I also want to have a feature that allows us to show common examples (picture + short description) of each label type. The labeling guide may be extensive, but I think having a quick reference users can check is important so they can be more confident that they're classifying labels correctly. We could also link to the Labeling Guide if they want more extensive documentation.

I think it would also be nice to have an 'unclear' button. As I was reviewing Obstacle in Path/Surface Problem labels, there were several that I felt unsure about. This might also help us figure out which types of scenarios are borderline and how to deal with them.

jonfroehlich commented 6 years ago

Love these mocks. Aileen and I reviewed them today in person and sketched out a rough draft of another design (which is an amalgamation of many of her great ideas). @aileenzeng, perhaps you can post my (awful) sketch when you get a chance.

aileenzeng commented 6 years ago

[photo of the whiteboard sketch]

It's definitely not awful - gets the point across nicely! 😁

misaugstad commented 6 years ago

Yes yes yes I love all of this! And this most recent mockup really does bring together a lot of my favorite parts of the earlier ones! :smile: I think this really brings together the pieces that I thought were necessary (large SV image, fairly big buttons for yes/no/unsure, documentation on what is/isn't a curb ramp, and a comment field)!

aileenzeng commented 6 years ago

Here's a more detailed version of what the audit page could look like: [mockup image]

I was thinking that the user could scroll up/down through the documentation (since we can't display that much info at once), but it's a little difficult to tell from this mock. We could maybe add a button that jumps below the audit screen so they can see the entire "what is/isn't a curb ramp" reference at once. Alternatively, it could also link to the labeling guide, although I was thinking that the info here would be a little more picture-dense.

Here's what the bottom screen portion might look like: [mockup image]

(for the bottom images, I'd get rid of the curb ramp labels for the real thing)

jonfroehlich commented 6 years ago

@aileenzeng and I talked about this in person. Just to quickly summarize:

  1. I really don't think people will ever read. That should be a guiding mantra of ours. :) So, while I like this idea of expanding the info on the bottom and the clever use of space here, I really think we need to maximize ways of presenting easy-to-digest information about correct and incorrect labels. We have to be creative about this, and I think it should be visible at all times. Perhaps something like a banner bar of positive and negative examples with extremely concise captions under each example?
  2. I don't think buttons need gradients. We are still in the flat design era.
  3. People will see verifying 50 labels and vomit with feelings of anxiety. We need to make this seem much more accomplishable--like 10 or 15 (I do love that you adopted the mission and completion style from other mission types--that consistency is super important).
  4. Oh also, I'm not sure people will ever give us comments--we are dedicating a fair bit of space to something that will be infrequently used... maybe add a comment button somewhere? (Or maybe the feedback button on the left is sufficient and we quickly onboard that?)

I'm sure @misaugstad has other amazing advice and insights (as usual). Also, he loves frontend work!!!

aileenzeng commented 6 years ago

Thanks for all the feedback! Here are two more ideas: [mockup image] (the arrows are a little funky on this one)

[second mockup image]

misaugstad commented 6 years ago

I really like that 2nd one!!

aileenzeng commented 6 years ago

From @jonfroehlich's email:

> I like the general direction of the interface so far, but I was thinking when you ask someone to verify a problem, if they say no, we should immediately ask a follow-up question to gather more data. So, for example: "Is this a surface problem?" If the user says no, then we immediately ask, "Is this a sidewalk accessibility problem?"

> So the key point here is that we can have multiple question-answering stages based on the label type and what the user responds.
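A toy sketch of that multi-stage flow: each label type maps to an ordered chain of questions, and a "no" answer advances to the next, broader question. Only the surface-problem chain comes from the email above; everything else below is illustrative:

```js
// Toy question-flow: a "no" answer triggers the next, broader question.
// Only the SurfaceProblem chain comes from the email; the rest is invented.
const QUESTION_FLOWS = {
  SurfaceProblem: [
    'Is this a surface problem?',
    'Is this a sidewalk accessibility problem?',
  ],
  CurbRamp: [
    'Is this a curb ramp?',
    'Is this a sidewalk accessibility feature?',
  ],
};

function* questionFlow(labelType) {
  for (const question of QUESTION_FLOWS[labelType] || []) {
    const answeredYes = yield question; // pause until the user answers
    if (answeredYes) return;            // "yes" ends the flow early
  }
}

// Usage: drive the generator with the user's answers.
const flow = questionFlow('SurfaceProblem');
console.log(flow.next().value);      // "Is this a surface problem?"
console.log(flow.next(false).value); // "no" -> "Is this a sidewalk accessibility problem?"
```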

aileenzeng commented 6 years ago

Here's my to-do list - I've tried to break up each task into a lot of small tasks. If there aren't a lot of indented checkboxes, that means that I haven't looked at the problem closely yet. I'll update this as I'm working.

Mission infrastructure-related tasks
  • Backend
  • Frontend

Label tasks
  • Backend
  • Frontend

Logging data

Panorama tasks: I still need to break this down a lot more, but here's the general idea for what I'm hoping to do.

Other

jonfroehlich commented 6 years ago

This looks good @aileenzeng, thanks for putting this together. One thing that is missing is a notion of timeline--when do you think you could get some of this stuff done? The $1m question, I know. ;-)

aileenzeng commented 6 years ago

I want to get the mission infrastructure/label/logging tasks done by Nov. 24th (latest), but will make a push to get it done by Nov. 18th. I'll take care of the panorama stuff next, but am still a little hesitant about giving an end date.

jonfroehlich commented 6 years ago

Ok, thanks.


jonfroehlich commented 5 years ago

I've been thinking more about this.

I also think--which we've discussed many times before--that validation is a perfect task for the mobile phone. I'm really hoping that we can utilize the ~20% of traffic that is mobile only and get them to do some fun validation work! :)

manaswisaha commented 5 years ago

I agree with all the points. Some of these points came up in the meeting this week when we were going over Aileen's implementation.

> I also think--which we've discussed many times before--that validation is a perfect task for the mobile phone. I'm really hoping that we can utilize the ~20% of traffic that is mobile only and get them to do some fun validation work! :)

I strongly agree! I feel this is a good starting point for venturing into mobile interfaces, instead of creating a full-fledged labeling interface, which is a harder task than a validation interface. And there is a lot of scope for gamification. This could easily be a great undergrad project! We should advertise this project specifically more. It's easy to understand, approachable, and not very intimidating.

jonfroehlich commented 5 years ago

> I strongly agree! I feel this is a good starting point for venturing into mobile interfaces, instead of creating a full-fledged labeling interface, which is a harder task than a validation interface. And there is a lot of scope for gamification. This could easily be a great undergrad project! We should advertise this project specifically more. It's easy to understand, approachable, and not very intimidating.

Also agree. I was thinking @aileenzeng might take this on after she finishes the web-based validation stuff... :) But we could also discuss recruiting an additional student (I just think they'd have to have a fair bit of dev experience to be able to contribute to this...).

jonfroehlich commented 5 years ago

Dumping more thoughts on this...

How do we choose what gets validated and when and by whom? I think this question is really interesting and may involve algorithms from optimization, reputation systems, etc. For example, our system should have ongoing inferences about a worker's quality, which is then strengthened or weakened by validations. I could also imagine using our CV subsystem--which Galen and Esther are currently working on--to help prioritize what gets validated.
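As one illustrative direction (names, fields, and weights below are all assumptions, not a worked-out design), a priority score could combine CV-model uncertainty, uncertainty about the labeler, and how many validations a label already has:

```js
// Illustrative priority score: validate first where we know the least.
// All field names and weights are assumptions for the sketch.
function validationPriority(label) {
  // CV-model uncertainty: peaks when the model's confidence is near 0.5.
  const cvUncertainty = 1 - Math.abs(label.cvConfidence - 0.5) * 2;

  // Labeler uncertainty: shrinks as their labels accumulate validations.
  const labelerUncertainty = 1 / (1 + label.labelerValidationCount);

  // Diminishing returns on re-validating the same label.
  const novelty = 1 / (1 + label.validationCount);

  return 0.4 * cvUncertainty + 0.4 * labelerUncertainty + 0.2 * novelty;
}

// Serve the highest-priority labels first.
function nextValidationBatch(labels, batchSize = 10) {
  return [...labels]
    .sort((a, b) => validationPriority(b) - validationPriority(a))
    .slice(0, batchSize);
}
```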

aileenzeng commented 5 years ago

Hi all,

I think I'm getting very close to finishing implementing functionality for the validation interface (hooray)! The main mid/high-priority items left on my to-dos are:

  • Checking if panos exist before sending label ids to the validation interface.
  • Returning a list of labels to be validated rather than just one label at a time.
  • Finishing up logging data (very fast - will be done last).

I was wondering what my next steps should be. Should I start drafting instructions for testing? Or should we worry about any UI polishing? For reference, the validation interface currently looks like this: [screenshot of the validation interface]

(I'm basing the design off this mockup: [mockup image])

jonfroehlich commented 5 years ago

I would focus on a full end-to-end MVP rather than polish. We don’t need any UI improvements until we actually start using this thing. So, resolving the three bullet points should take priority.

Also, how are you choosing which labels to get validated? That seems like a crucial step (but certainly one we can iterate on after MVP).


aileenzeng commented 5 years ago

It's done!

> Also, how are you choosing which labels to get validated? That seems like a crucial step (but certainly one we can iterate on after MVP).

Oops - sorry for not getting back to you on this. Right now, labels are being selected randomly. We run a check on the backend to see if the pano exists or not (if it doesn't, then we select a new random label).
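In sketch form, that selection loop might look like the following (the real check runs in the Scala backend; the function names here are hypothetical stand-ins):

```js
// Sketch: draw random labels until one has a still-existing panorama.
// getRandomLabel and panoExists are hypothetical stand-ins for backend calls.
async function selectLabelForValidation(getRandomLabel, panoExists) {
  // A retry cap keeps this from spinning forever if many panos have expired.
  for (let attempts = 0; attempts < 100; attempts++) {
    const label = await getRandomLabel();
    if (await panoExists(label.gsvPanoramaId)) {
      return label;
    }
    // Otherwise draw again; the expired pano could also be marked in gsv_data.
  }
  throw new Error('Could not find a label with a live panorama');
}
```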

jonfroehlich commented 5 years ago

What's done? You have a full end-to-end MVP working? If so, woohoo!

> Oops - sorry for not getting back to you on this. Right now, labels are being selected randomly. We run a check on the backend to see if the pano exists or not (if it doesn't, then we select a new random label).

Fine for now but see https://github.com/ProjectSidewalk/SidewalkWebpage/issues/535#issuecomment-447386308

aileenzeng commented 5 years ago

Yep! :) I’m going to work on some testing instructions next, but won’t be at a computer for the next few days.

jonfroehlich commented 5 years ago

OK, great. Looking forward to it!


aileenzeng commented 5 years ago

Here are some testing instructions (sorry for the delay!). I'm putting them in here and not in a PR because I'm not sure if it's ready for a PR yet:

Changes

[screenshot of the updated validation interface]

Testing

Part 1: Set up / initial testing

  1. Checkout 535-create-validation-interface. You might want to use a small dump for testing purposes!
  2. Delete all records in the label table.
  3. Go to http://0.0.0.0:9000/audit and place one label onto the panorama.
  4. Go to http://0.0.0.0:9000/validate. Click any of the agree/disagree/unclear buttons a few times and make sure that nothing looks weird (blank GSV Panorama, label not loading, etc...)
  5. Go to http://0.0.0.0:9000/audit and place one label onto the panorama in a different spot than the label in step 3. Then, delete the label. Check the label table to confirm that this label exists and has the delete column marked.
  6. Go back to http://0.0.0.0:9000/validate. Hit the agree/disagree/unclear buttons a few times. The only label that you should see on the screen is the label from step 3 (the label from step 5 should never appear on the screen).
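A minimal sketch of the backend rule that steps 5 and 6 exercise, in illustrative JavaScript (the field names are assumptions based on the label table described above, not the actual Scala query):

```js
// Illustrative version of the rule steps 5-6 exercise: labels marked deleted
// are never served to /validate. Field names are assumptions.
function validatableLabels(labels) {
  return labels.filter((label) => !label.deleted);
}

// validatableLabels([{ labelId: 3, deleted: false }, { labelId: 5, deleted: true }])
//   -> only labelId 3 can ever appear on /validate
```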

Part 2: General testing

  1. Go to http://0.0.0.0:9000/audit and audit a few streets / complete a mission or two. Place labels at a variety of zoom levels as well as some incorrect ones if you feel like it :)
  2. Go to http://0.0.0.0:9000/validate. When you refresh, you should have a larger variety of labels remaining. (I left in the console.log statements that show the labelIds of the labels loaded onto the screen.)
  3. Validate a few labels. Use the buttons on the bottom of the screen and try using keyboard controls (A - agree, D - disagree, U - unclear).
    • The "Is this a ___? " title should match the label type that is on the screen.
    • Each mission is 10 labels for now, so every time you validate a label, the mission progress should increase by 10%.
    • Check the label_validation table to see that these labels were validated correctly.
      • In the validation_result column, there should be 1 for agree, 2 for diesagree, and 3 for unclear.
    • Check validation_task_interaction to check that these interactions were recorded correctly.
      • There should be a ValidationButtonClick_ event if you clicked a button or a ValidationKeyboardShortcut_ event if you used keyboard shortcuts
  4. Complete a mission. You should get 10 new labels. (You can check the console in Chrome to see which labels were selected).
  5. Validate a label, then quickly refresh the mission afterwards. The mission progress should have updated before you left the page. Check that the mission progress is correct. Try this at different parts of the mission (0 labels, somewhere in the middile, 9 labels).
  6. Click the "Skip" button. The mission progress shouldn't be increased.
    • Check the validation_task_interaction table for to make sure the ModalSkip_Click action is logged.
  7. Submit a comment using the Feedback button.
    • The feedback should be logged in the validation_task_comment table in the comment column.
    • You should also check that the interactions are logged in the validation_task_interaction table. It should have ModalComment_ClickOK and ModalComment_ClickFeedback.
  8. Enter svv.panorama.getProperty("panoId") into the Chrome console to get the current panorama id. Copy the panorama id into the following query:
    SELECT * FROM gsv_data WHERE gsv_panorama_id LIKE '<panoId>';
    • The last_viewed column should show the last time this panorama was viewed, and the expired column should be marked as false.

Here is a query for checking the validation_task_interaction table that filters out LowLevelEvent_ and POV_Changed interactions:

```sql
SELECT *
FROM validation_task_interaction
WHERE validation_task_interaction.action NOT IN (
    'LowLevelEvent_mousemove',
    'LowLevelEvent_mouseover',
    'LowLevelEvent_mouseout',
    'LowLevelEvent_mouseup',
    'LowLevelEvent_mousedown',
    'LowLevelEvent_keydown',
    'LowLevelEvent_keyup',
    'LowLevelEvent_click',
    'POV_Changed'
)
ORDER BY timestamp DESC;
```

Part 3: Stress testing

  1. Hold down the A, D or U key, or try hitting multiple keys at the same time. The label should only be validated once.
  2. Use the arrow keys to navigate around the panorama. This isn't intended user behavior, but seems to be built into the GSV API (and I haven't been able to disable it yet). If you move to a different panorama, the label should disappear.
  3. Any other ways you can think to stress test the system!
misaugstad commented 5 years ago

@aileenzeng can you just add screenshots of what it looks like right now?

aileenzeng commented 5 years ago

Yep! Updated the comment.

misaugstad commented 5 years ago

This looks really great @aileenzeng !!! It works really well, it looks like you put a lot of effort into it!!

jonfroehlich commented 5 years ago

> This looks really great @aileenzeng !!! It works really well, it looks like you put a lot of effort into it!!

Wo0t! Go @aileenzeng, go!

aileenzeng commented 5 years ago

Hooray! Thanks for the support everyone.

@misaugstad Thanks for bringing up those two points - I'll start addressing those!

I also noticed that we're not directly logging information about what the user's screen looks like when they've validated a label in the label_validation table. Do we want to record information like heading, pitch, zoom there (or do we want a new table that stores that information?)

misaugstad commented 5 years ago

@aileenzeng hmmm that is tough. I was originally leaning towards keeping it as part of the label_validation table since that info is included in the label table.

But now I'm thinking that it isn't as relevant/important in validation. It might also be confused with the actual heading/pitch/zoom from the user who placed the label in the first place...

I'm honestly on the fence. @aileenzeng how about we go with whatever you think would be best (be that the best design, the easiest to implement, the easiest to work with in the future).

jonfroehlich commented 5 years ago

> I also noticed that we're not directly logging information about what the user's screen looks like when they've validated a label in the label_validation table. Do we want to record information like heading, pitch, zoom there (or do we want a new table that stores that information?)

Yes, we will need to log this and should do so comprehensively (just like we do for the auditing interface)

aileenzeng commented 5 years ago

> We probably want to add a delay so that users can't rapid-fire hit a button without actually looking at the pano. Also so that one cannot accidentally hit the button twice and "validate" the second label without meaning to.

I've added a delay that makes the user wait 850ms between validations. It doesn't look like Google StreetView has any listeners that will let us know when the panorama has loaded yet (which would be the more ideal solution to this problem).
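A minimal sketch of that rate limiting, assuming a hypothetical submitValidation stand-in (the 850ms constant and the A/D/N keys are from this thread; this is not the actual svv code):

```js
// Minimal rate-limit sketch: at most one validation per 850ms, and held-down
// keys don't auto-repeat. submitValidation is a hypothetical stand-in.
const VALIDATION_DELAY_MS = 850;
let lastValidationTime = 0;

function submitValidation(result) {
  console.log('validated:', result); // stand-in for the real submit call
}

function tryValidate(result) {
  const now = Date.now();
  if (now - lastValidationTime < VALIDATION_DELAY_MS) return; // too soon
  lastValidationTime = now;
  submitValidation(result);
}

document.addEventListener('keydown', (e) => {
  if (e.repeat) return; // ignore auto-repeat from a held key
  const keyMap = { a: 'Agree', d: 'Disagree', n: 'NotSure' };
  const result = keyMap[e.key.toLowerCase()];
  if (result) tryValidate(result);
});
```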

> You can do some weird stuff with the keyboard shortcuts if you really try to break it. Like if I hold A, tap D, then hold D, then tap N, then hold N, I have now validated 3 labels and all 3 buttons look like they have been pressed 😆 I don't think this is super important though, no one should be doing that...

This should be fixed now!

> I also noticed that we're not directly logging information about what the user's screen looks like when they've validated a label in the label_validation table. Do we want to record information like heading, pitch, zoom there (or do we want a new table that stores that information?)

> Yes, we will need to log this and should do so comprehensively (just like we do for the auditing interface)

Now the label_validation table also has the following columns:

  • canvas_x: the x-coordinate of the upper-left corner of the label's bounding box
  • canvas_y: the y-coordinate of the upper-left corner of the label's bounding box
  • heading: user heading
  • pitch: user pitch
  • zoom: user zoom
  • canvas_height: height of the GSV panorama (always 410px)
  • canvas_width: width of the GSV panorama (always 720px)
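For illustration, a hypothetical row with the new columns might look like this (values invented; the validation_result codes are from the testing instructions above):

```js
// Hypothetical example row; all values invented for illustration.
const exampleValidationRow = {
  validation_result: 1, // 1 = agree, 2 = disagree, 3 = unclear
  canvas_x: 312,        // upper-left corner of the label's bounding box
  canvas_y: 158,
  heading: 275.2,       // user's POV when they validated
  pitch: -12.5,
  zoom: 1.1,            // always 1.1, 2.1, or 3.1 per the instructions below
  canvas_height: 410,   // GSV pane is fixed at 720x410
  canvas_width: 720,
};
```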

Other changes

Additional testing instructions

  1. Restore the database.
  2. Take the onboarding tutorial, then start a mission. Then, place one label in a regular mission.
  3. Go to http://0.0.0.0:9000/validate. Validate some labels. You shouldn't ever see the tutorial labels! (You should only get the single label that you placed.)
  4. Try hitting the A/D/N keys at the same/similar time, or try holding down one key. Only one of the buttons should be gray at any time.
  5. Try clicking buttons while holding down one of the A/D/N keys. If you click a different button than the held key's, the button color should change to the one that you clicked. The other buttons should also turn gray when you hover over them.
  6. Refresh the page - check that the mission progress hasn't changed unexpectedly!
  7. In the console.log statements, you should see something like TOP: ____ LEFT: ____. Use the element selector in Chrome to select the label. Check that the TOP and LEFT values are within ±0.5 of the top and left CSS attributes for the label.
  8. Check that these values are being added to the label_validation table.
  9. When you look at the zoom columns in the label_validation or validation_task_interaction tables, the values should always be 1.1, 2.1, or 3.1.
  10. Try moving around with arrow keys. The label should disappear if you're on a different panorama from your original one.
misaugstad commented 5 years ago

@aileenzeng this all seems to be working as you describe!

misaugstad commented 5 years ago

@aileenzeng It looks like this line in your evolution file is missing a semicolon!

DROP TABLE validation_task_comment

aileenzeng commented 5 years ago

Oops - thank you! Good catch!