ProjectSidewalk / sidewalk-data-analysis

Holds all offline data analysis scripts for Project Sidewalk required for our forthcoming paper submission

Add user dropoffs graph to stats_for_paper #4

Closed misaugstad closed 6 years ago

misaugstad commented 6 years ago

This would be a line graph or bar chart that shows the percentage of users remaining on the y-axis, and the x-axis would be a series of milestones including loading the audit page, finishing the tutorial, finishing an audit task, finishing a mission, etc.

@jonfroehlich and @manaswis let's try to nail down which stats we really want... Here is an initial sequence to work off of. Feel free to delete or add to it. I'm going to start with way more than we will want, so you can pick which ones to remove. And starting at number 12, the options are no longer sequential. I would also like input on how we want to deal with that.

  1. Loaded the audit page
  2. Clicked on "Let's get started!"
  3. Correctly placed first curb ramp
  4. Correctly marked severity for first curb ramp
  5. Successfully panned (the drop off might be very amusing here :joy: )
  6. Successfully zoomed in
  7. And so on through the tutorial...
  8. Finished the tutorial
  9. Clicked "Ok" after the tutorial, which leads to first mission overlay
  10. Clicked "OK" on first mission overlay, to begin auditing
  11. Clicked "OK" on the "Let's get started!" popup, which shows up right after they just clicked Ok twice in a row... I'm sensing some redundancy here :)
  12. Placed a label
  13. Took a step
  14. Finished a mission
  15. Finished two missions
  16. Audited for at least X minutes
  17. Placed at least Y labels

manaswisaha commented 6 years ago

For the user retention curve, we should have the following milestones (x-axis): 1,8,13,14,15
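A rough sketch of how the curve for those five milestones could be computed, assuming each user has already been reduced to the set of milestones they reached. The milestone keys and the `reached` input are hypothetical names for illustration, not the actual Project Sidewalk schema. A user counts as remaining at a milestone only if they also reached every earlier one, so the curve never goes back up.

```python
# Milestones 1, 8, 13, 14, 15 from the list above; the string keys are made up.
MILESTONES = [
    "loaded_audit_page",
    "finished_tutorial",
    "took_a_step",
    "finished_one_mission",
    "finished_two_missions",
]


def retention_curve(reached, milestones=MILESTONES):
    """Return [(milestone, percent_remaining)] given a dict of user -> set of milestones."""
    # Everyone who hit the first milestone forms the 100% baseline.
    remaining = {user for user, done in reached.items() if milestones[0] in done}
    total = len(remaining)
    curve = []
    for name in milestones:
        # Keep only users who reached this milestone and all earlier ones.
        remaining = {user for user in remaining if name in reached[user]}
        pct = 100.0 * len(remaining) / total if total else 0.0
        curve.append((name, pct))
    return curve
```

The resulting pairs can be fed straight into a line or bar chart, with the milestone names on the x-axis and the percentages on the y-axis.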

For tutorial only analysis, we should have each stage on the x-axis e.g. all steps for labeling the first curb ramp is the first stage, all steps for missing curb ramp (including zooming in) is the second stage and so on. So we would have 7 stages (corresponding to the 7 labels) and the 8th stage would be to take a step and do the rest to finish the tutorial.

misaugstad commented 6 years ago

> For the user retention curve, we should have the following milestones (x-axis): 1,8,13,14,15

I'm assuming the retention curve can be across multiple sessions, right? It is just that the user completed 2 missions at some point, not that they did it in that first session, correct?

> For tutorial only analysis, we should have each stage on the x-axis e.g. all steps for labeling the first curb ramp is the first stage

Just to clarify, by 8 stages you mean 8 steps on the x-axis; you don't mean to differentiate between the different steps within each stage. Did I read that correctly?

misaugstad commented 6 years ago

@manaswis for missions completed, do we want to do anything about the fact that the length of a first mission changed partway through? We could ignore the difference when computing results (you can always talk about it in the paper, of course), only look at data from after initial missions were switched to 500 feet, or analyze the two separately.

manaswisaha commented 6 years ago

> I'm assuming the retention curve can be across multiple sessions, right? It is just that the user completed 2 missions at some point, not that they did it in that first session, correct?

I was thinking more of a new user who starts using the system from the tutorial: when do they drop off?

Points 14, 15, 16, 17 could be for returning users (i.e. they have already done the tutorial) - when do they stop working?

manaswisaha commented 6 years ago

> Just to clarify, by 8 stages you mean 8 steps on the x-axis; you don't mean to differentiate between the different steps within each stage. Did I read that correctly?

Yes. Each of the 8 stages would be on the x-axis, where each stage consists of multiple steps as part of the stage (e.g. for marking the first curb ramp, steps would be: place a label, select a severity etc.) -- the individual steps for each stage won't be on the x-axis.
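To make the stage idea concrete, here is a minimal sketch that collapses individual tutorial steps into stages and counts how many users finished each stage. It assumes every step has a unique identifier and that each user is represented by the set of step identifiers they completed; both the identifiers and the data shape are hypothetical. A stage counts as finished only when all of its steps are done and every earlier stage is finished too.

```python
def stages_completed(stages, steps_done):
    """Count how many leading stages have all of their steps in steps_done."""
    count = 0
    for _stage_name, steps in stages:
        if not all(step in steps_done for step in steps):
            break  # stop at the first unfinished stage
        count += 1
    return count


def users_per_stage(stages, users):
    """Return [(stage_name, n_users_who_finished_it)] for a dict of user -> set of steps."""
    finished = [stages_completed(stages, done) for done in users.values()]
    return [
        (name, sum(1 for n in finished if n > i))
        for i, (name, _steps) in enumerate(stages)
    ]
```

Here `stages` would be an ordered list with 8 entries, each pairing a stage name with its list of step identifiers, so only the 8 stage names end up on the x-axis.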

misaugstad commented 6 years ago

@manaswis should I do this strictly by IP address maybe, considering we don't really have a good way to connect pre-signup interactions to user id post signup? I mean, IP address is a reasonable proxy for a user, whether they are registered or not!

manaswisaha commented 6 years ago

> @manaswis for missions completed, do we want to do anything about the fact that the length of a first mission changed partway through? We could either ignore that there is a difference (in computing results, you can always talk about it in the paper of course), only look at data after initial missions were switched to 500 feet, or analyze the two separately.

Hmm, I think we should get the results for each separately, so then we could check if there were any differences between the timelines with the first mission as 500ft vs 1000ft. We should be able to talk about them separately if we do see differences; otherwise we talk about it as a general first mission (without talking about the distance covered).
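One way the separate analysis could look, as a sketch only: it assumes we can tag each user with the length of their first mission (500 or 1000 feet) and whether they finished it. How that tag is derived from the data is not covered here, and the input shape is made up for illustration.

```python
from collections import defaultdict


def completion_rate_by_first_mission_length(users):
    """users: iterable of (first_mission_feet, finished_first_mission) pairs.

    Returns a dict mapping first-mission length to the fraction who finished it.
    """
    counts = defaultdict(lambda: [0, 0])  # feet -> [finished, total]
    for feet, finished in users:
        counts[feet][1] += 1
        if finished:
            counts[feet][0] += 1
    return {feet: done / total for feet, (done, total) in counts.items()}
```

If the two rates come out close, the cohorts can be merged and reported as a general first mission; if not, they get reported separately.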

misaugstad commented 6 years ago

> Hmm, I think we should get the results for each separately

Sounds good.

> the individual steps for each stage won't be on the x-axis.

Sounds good.

> I was thinking more of a new user who starts using the system from the tutorial: when do they drop off?

I still don't know if this means only their first session or across multiple sessions :)

manaswisaha commented 6 years ago

> I still don't know if this means only their first session or across multiple sessions :)

First session as a new user when they start using the tool by going through the tutorial first.

misaugstad commented 6 years ago

Okay, and do you think it sounds good to just base this off of IP address?

manaswisaha commented 6 years ago

For anonymous users, that's the only way we have. This should be done for registered users as well (there are users who create accounts first then start using the tool).

On Thu, May 3, 2018 at 2:44 PM, Mikey Saugstad wrote:

> Okay, and do you think it sounds good to just base this off of IP address?

misaugstad commented 6 years ago

I'm saying that by ignoring registration of users, if we just do it based on IP address, shouldn't that pretty accurately cover registered users as well? Like IP address is a reasonable proxy for a user, whether they are signed in or not! And we don't have an easy way to link registered users to the auditing they did before registering anyway.
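For reference, a minimal sketch of what keying on IP address and restricting to the first session might look like. It assumes interaction logs can be flattened to `(ip, unix_seconds, action)` tuples and that a gap of more than 30 minutes between events ends a session; the tuple layout and the 30-minute cutoff are both assumptions, not something we have agreed on.

```python
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # assumed cutoff between sessions


def first_session_by_ip(events, gap=SESSION_GAP_SECONDS):
    """events: iterable of (ip, unix_seconds, action).

    Returns a dict mapping each IP to the actions in its first session, in time order.
    """
    by_ip = defaultdict(list)
    for ip, ts, action in events:
        by_ip[ip].append((ts, action))

    sessions = {}
    for ip, rows in by_ip.items():
        rows.sort(key=lambda row: row[0])  # order by timestamp
        actions = [rows[0][1]]
        for (prev_ts, _), (ts, action) in zip(rows, rows[1:]):
            if ts - prev_ts > gap:
                break  # a long pause marks the end of the first session
            actions.append(action)
        sessions[ip] = actions
    return sessions
```

The per-IP action lists can then be checked against the milestones to see how far each "user" got in their first session. Several people behind one shared IP would be merged into a single user under this approach.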

manaswisaha commented 6 years ago

Oh I see. I think it should be fine.