WorldBrain / Memex

Browser extension to curate, annotate, and discuss the most valuable content and ideas on the web. As individuals, teams and communities.
https://worldbrain.io
4.38k stars 335 forks

Analytics for Athena/Quicksight #460

Open mukeshkharita opened 5 years ago

mukeshkharita commented 5 years ago

Here we should list all the analytics we need.

  1. No. of Install/Uninstall per day/week/month
  2. Stay time of each user who uninstalled the extension
  3. Avg. stay time, counting only users who have uninstalled the extension
  4. No. of active users in the last week/month

@oliversauter You can add the remaining analytics that are needed in Athena/Quicksight. Also mention each graph's x-axis name and y-axis name. :slightly_smiling_face:

blackforestboi commented 5 years ago

Charts we need:

The most important thing to get right at this point is the distribution of (un)install times, and the drill down to the individual user's stats. Let's first get this right before we move on to the others.

Second priority are those metrics:

Ability to drill down on day/week/month/year

Explanation of terminology:

  - Fixed number: no chart, just a number representing the value
  - x: x-axis name and data type
  - y: y-axis name and data type
  - timeline: filterable by day, week and month

Apart from this thread, we should start an Airtable listing all the analytics we run, so we can be fully transparent to people and discuss potentially problematic metrics we collect. Does anybody have concerns about the type of data we aim to collect at this point?

mukeshkharita commented 5 years ago

Hey, @oliversauter Thanks for putting the events down here.

and the drill down to the individual user's stats

For this, you have to copy the id and put the value of parameters.

Distribution of install times in 1) Less than a day

You mean no. of installs that day/week/month, right @oliversauter?

How long did user stay

Only in the case of users who have uninstalled the extension?

Can we send some last events with the uninstall request, so we know what happened right before (@mukeshkharita), or are they lost because of the 20s block?

Yeah sure, we can send the events with the user id but it will be getting request so doesn't make more sense?

Second priority are those metrics:

These look quite easy; no need to think more about them, they just have to be written as direct queries.

First I'll implement on Athena then we can move ahead on Quicksight.

blackforestboi commented 5 years ago

For this, you have to copy the id and put the value of parameters.

Not sure if I understand. Can you elaborate a bit more?

You mean no. of installs that day/week/month, right @oliversauter?

No, I mean a distribution of how long it took people to uninstall, i.e. how long they stayed. The chart should categorise how many users stayed less than 1h, less than 1 day, less than 1 week, less than 1 month. For that you could use a Quicksight distribution chart: https://aws.amazon.com/data-visualization/ (see "distribution" > Histogram).
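A rough sketch of that bucketing, in Python rather than as the actual Athena query. The tuple layout `(user_id, installed_at, uninstalled_at)`, the bucket labels and the 30-day month are assumptions for illustration, and the buckets are treated as exclusive (someone who stayed 30 minutes is counted only under "< 1 hour").

```python
from collections import Counter
from datetime import datetime, timedelta

# Upper bounds per bucket, in ascending order. Labels are illustrative.
BUCKETS = [
    ("< 1 hour", timedelta(hours=1)),
    ("< 1 day", timedelta(days=1)),
    ("< 1 week", timedelta(weeks=1)),
    ("< 1 month", timedelta(days=30)),  # assumption: a month is 30 days
]
REST = "1 month or more"


def stay_bucket(installed_at, uninstalled_at):
    """Return the label of the first bucket the stay duration fits into."""
    stay = uninstalled_at - installed_at
    for label, upper_bound in BUCKETS:
        if stay < upper_bound:
            return label
    return REST


def stay_distribution(users):
    """users: iterable of (user_id, installed_at, uninstalled_at) tuples."""
    return Counter(stay_bucket(i, u) for _, i, u in users)
```

Each key of the returned counter corresponds to one bar of the histogram, and its value is the number of users in that bar.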

When this distribution is there, ideally we should be able to see a list of all the users that fall in each category. So for example a list of all users that deleted the tool within one hour. Then we should be able to "drill down" on their actions. That means seeing a list of all actions of every single user, so we understand what they did before uninstalling. The last 30 minutes are enough to be listed. Also we should be able to run some analysis on the behaviour of each user we want to "investigate" and to see some stats about them, like "how often did they perform a search or make an annotation", "how many successful searches", "how active was that user". Those are the points I listed above.
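To make the drill-down concrete, here is a minimal sketch of selecting one user's events from the 30 minutes before uninstalling and counting them by type. The `(timestamp, event_type)` layout and the event names used in the usage note are hypothetical, not the extension's real event schema.

```python
from collections import Counter
from datetime import datetime, timedelta


def events_before_uninstall(events, uninstalled_at, window=timedelta(minutes=30)):
    """Return a single user's events inside the window before uninstall.

    events: list of (timestamp, event_type) tuples for one user.
    The result is sorted oldest first.
    """
    start = uninstalled_at - window
    recent = [e for e in events if start <= e[0] <= uninstalled_at]
    return sorted(recent, key=lambda e: e[0])


def action_counts(events):
    """Count how often each event type occurs, e.g. searches or annotations."""
    return Counter(event_type for _, event_type in events)
```

Calling `action_counts(events_before_uninstall(events, uninstalled_at))` would then answer questions such as "how often did they perform a search" for that window.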

Does that make more sense?

Yeah sure, we can send the events with the user id but it will be getting request so doesn't make more sense?

I don't fully understand what you mean by that. Can you try to rephrase it? What I tried to express is that if a user uninstalls, there will certainly be some events buffered up, waiting to be sent out once the user is idle for 20s. Those last events before uninstalling are really important for us, so we know what a user did before uninstalling. Can we get those events?

First I'll implement on Athena then we can move ahead on Quicksight.

Yeah, first focus on the distribution and the list of events of each user. The others can come later.

mukeshkharita commented 5 years ago

@oliversauter Here is what I understood from these points.

Distribution of install times in: 1) Less than a day 2) Less than a week 3) Less than a month 4) the rest. (x: cohort, y: total number of users in cohort)

In this, we will have a graph of the no. of uninstalls per hour/day/week/month/year. We can drill down by date and it will tell us the no. of uninstalls in that period.

Ability to drill down from the above distribution into the single users in each cohort and being able to see analysis of all their usage pattern and their actions before uninstalling.

Question: When we drill down by the uninstall time, how can we get the ids of the users and the actions a particular user performed before uninstalling the extension? I'm not sure how we can do that on Quicksight. We have to make one graph for each query, get the id from there, and look at the actions in the other graph. On Quicksight we can set one parameter for the particular user, and then you can see all the events. https://docs.aws.amazon.com/quicksight/latest/user/parameters-in-quicksight.html

Stats about each user:

All the stats of the user will be shown in different graphs, not in a single graph.

How long did user stay (fixed number: days, hours, minutes, seconds)

For every user who has uninstalled the extension, the graph will show the stay time in days-hh:mm:ss.
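As an illustration of the days-hh:mm:ss format, a small sketch (the function name `format_stay` is made up for this example):

```python
from datetime import timedelta


def format_stay(stay):
    """Format a timedelta as days-hh:mm:ss, e.g. 2-03:04:05."""
    total_seconds = int(stay.total_seconds())
    days, remainder = divmod(total_seconds, 86400)
    hours, remainder = divmod(remainder, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"
```

Here `stay` would be the uninstall time minus the install time of the user.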

List of all actions & error messages 30 minutes before uninstallation.

Yeah, we will show every event of the particular user from the 30 minutes before uninstallation.

Important: Can we send some last events with the uninstall request, so we know what happened right before (@mukeshkharita), or are they lost because of the 20s block?

We can send the data with the uninstall request, but @poltak @ShishKabab can tell us more about whether it will be good or not to send the data with a GET request. I can change the analytics uninstall API for this.

How active was the user (Fixed number: % of days with activity as defined below (active users)). I don't know if for that you need to work on active users first, so maybe do the above work first.

Yeah, first we need to define when the user is considered to be active. Here you mean the ratio of the no. of days the user was active on the extension to the total days (current time - install time), right?
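A sketch of that ratio as a percentage, assuming "total days" means calendar days from the install date up to and including today, and that `event_times` holds the timestamps of actions that count as activity (both are assumptions, not something agreed in this thread):

```python
from datetime import datetime


def active_day_percentage(event_times, installed_at, now):
    """Percentage of calendar days since install with at least one activity."""
    # Assumption: both the install day and the current day count as full days.
    total_days = (now.date() - installed_at.date()).days + 1
    active_days = {t.date() for t in event_times if installed_at <= t <= now}
    return 100.0 * len(active_days) / total_days
```

For a user who already uninstalled, `now` could be replaced by the uninstall time.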

Active users (x: timeline, y: %) > defined by actions: search in overview/addressbar/popup (not including google integration), click on google integration results, bookmark, Memex.link, annotate, tag, add page to collection. We need daily (at least once per day), weekly (at least once per week) and monthly (at least once per month) averages here. First activity of each time frame counts as active. So if a user was active only once in 30 days, they are a monthly active user, and an active user for that week and day. If they were active once each week, they are a monthly active user, a weekly active user of each week, and a daily active user on each individual day of activity.

Here the graph will plot the no. of active users against the timeline. If the user is active in a week, then they will also be considered active in that month/year, right?
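One way to read the daily/weekly/monthly rule is sketched below: a single qualifying action puts the user into the active set for that day, for that ISO week and for that calendar month. The `(user_id, day)` input layout is an assumption, and turning the sets into percentages would still need the total number of installed users per period.

```python
from datetime import date


def active_users_by_period(events):
    """Group active users by day, ISO week and calendar month.

    events: iterable of (user_id, day) pairs, one per qualifying action.
    Returns three dicts mapping a period key to the set of active user ids.
    """
    daily, weekly, monthly = {}, {}, {}
    for user_id, day in events:
        iso = day.isocalendar()  # (ISO year, ISO week number, weekday)
        daily.setdefault(day, set()).add(user_id)
        weekly.setdefault((iso[0], iso[1]), set()).add(user_id)
        monthly.setdefault((day.year, day.month), set()).add(user_id)
    return daily, weekly, monthly
```

The length of each set is the number of active users for that period, so a user with one action in a month appears in exactly one daily, one weekly and one monthly set.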

Here I've written down everything as I understand it; let me know if I've understood anything wrong.

blackforestboi commented 5 years ago

In this, we will have a graph of the no. of uninstalls per hour/day/week/month/year. We can drill down by date and it will tell us the no. of uninstalls in that period.

No, this is a separate graph. What is needed here is a distribution. Example: 5 users have had Memex installed less than 1h, 10 people less than 1 week, 100 people less than 1 month. What we need is the ability to have a distribution chart, and when clicking on one of the bars, it should show a list of all users in that bar. It would look a bit like this then:

screen shot 2018-08-11 at 00 25 08

Yeah, first we need to define when the user is considered to be active. Here you mean the ratio of the no. of days the user was active on the extension to the total days (current time - install time), right?

Yes

If the user is active in a week, then they will also be considered active in that month/year, right?

Yes, however we don't measure years. Month is highest.

All the stats of the user will be shown in different graphs, not in a single graph.

Yes, ideally we should have something like a "profile/dashboard" page for each user, where those stats are calculated separately. In the screenshot above you would get there by clicking on the "see details" buttons. The profile may look a bit like this. But that is really just a very basic mockup; the values and charts are obviously not the ones we need. I just wanted to visualise it better for you.

screen shot 2018-08-11 at 00 26 48

I'm not sure how we can do that on Quicksight. We have to make one graph for each query, get the id from there, and look at the actions in the other graph. On Quicksight we can set one parameter for the particular user, and then you can see all the events.

I am not sure if I understand the page you linked correctly, but I think they are referring to AWS users here, not "users" in the sense we are using.

whether it will be good or not to send the data with a GET request

What do you consider "to be good", and what "not"?