metricminer

Digging up data that matters, making it dashboard-ready.

metricminer is an R package that helps you mine metrics from common places on the web through the power of their APIs.

It also helps format the data so that it can easily be used for a dashboard or other purposes. It will have an associated dashboard template and tutorials to help you fully utilize the data you retrieve with metricminer (but these are still under development!)

Apps Supported

Currently metricminer supports mining data from:

- Calendly
- GitHub
- Google Analytics
- Google Forms
- Slido (exports saved to Google Drive)
- YouTube


Data Format Options

metricminer retrieves API data for you and returns it as a tidy data frame. This means metricminer has to be selective about which metrics it returns, so that the result fits in a useful data frame that can be easily read by humans.

If you find that the data returned is not what you need, you have two options (and they can be pursued concurrently):

  1. You can set the dataformat argument to "raw" to see the original, unedited JSON-formatted data as it was returned from the API, then look through it yourself and extract the data you want (see the sketch after this list).
  2. You can post a GitHub issue explaining why the metric missing from the formatted data frame should be included. If it is possible and reasonable, we can work on including that data in the next version of metricminer.
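For example, here is a minimal sketch of option 1 using the GitHub function introduced below. It assumes you have already run authorize("github"); which functions accept the dataformat argument may vary, so check the function's help page.

# Ask for the raw, unedited JSON-style data instead of the tidied data frame
raw_metrics <- get_github_metrics(repo = "fhdsl/metricminer", dataformat = "raw")

# Inspect the structure to find the fields you want to extract
str(raw_metrics, max.level = 2)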

How to install

If you want the development version (not advised), you can install it from GitHub using the remotes package.

if (!("remotes" %in% installed.packages())) {
  install.packages("remotes")
}
remotes::install_github("fhdsl/metricminer")

Attach the library, or use the metricminer:: notation instead.

library(metricminer)
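Alternatively, if you prefer not to attach the package, you can prefix each call with the package name. A brief illustration using authorize(), which is described in the next section:

# Equivalent to calling authorize() after library(metricminer)
metricminer::authorize()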

Basic Usage

To start, you need to authorize() the package to access your data. If you run authorize(), you will be asked which app you'd like to authorize and whether you'd like to cache that authorization information. If you already know which app you'd like to authorize, for example Google, you can run authorize("google").

Then follow the instructions on the upcoming screens and select the scopes you feel comfortable sharing (you generally just need read permissions for metricminer to be able to collect data).

authorize()

If you want to clear out authorizations and caches stored by metricminer you can run:

delete_creds()

GitHub

You can retrieve metrics from a repository on GitHub by running:

authorize("github")
metrics <- get_github_metrics(repo = "fhdsl/metricminer")

Calendly

You can retrieve Calendly events information using this type of workflow:

authorize("calendly")
user <- get_calendly_user()
events <- list_calendly_events(user = user$resource$uri)

Google Analytics

You can retrieve Google Analytics data for websites like this.

First you have to retrieve your account information after you've authorized.

authorize("google")
accounts <- get_ga_user()

Then you need to retrieve the properties (i.e., usually the websites you are tracking) under that account.

properties_list <- get_ga_properties(account_id = accounts$id[1])

We just need to shave the properties/ prefix off of this string.

property_id <- gsub("properties/", "", properties_list$properties$name[1])

Now we can collect some stats.

In Google Analytics, metrics are your basic numbers (how many visits to your website, etc.).

metrics <- get_ga_stats(property_id, stats_type = "metrics")

Dimensions, on the other hand, are more like a list of individual events that have happened. For example, here's how to get a list of the people who have logged on.

dimensions <- get_ga_stats(property_id, stats_type = "dimensions")

Lastly, there is a third option for collecting link_clicks and the links that were clicked. Google Analytics also considers this a dimension; however, link click data often can't be downloaded alongside other dimension data, so metricminer collects it separately.

link_clicks <- get_ga_stats(property_id, stats_type = "link_clicks")

Google Forms

You can retrieve Google Forms information and responses like this:

authorize("google")
form_url <- "https://docs.google.com/forms/d/1Z-lMMdUyubUqIvaSXeDu1tlB7_QpNTzOk3kfzjP2Uuo/edit"
form_info <- get_google_form(form_url)

Slido

If you have used Slido for interactive slide sessions, collected that info, and exported it to your Google Drive, you can use metricminer to collect that data as well.

drive_id <- "https://drive.google.com/drive/folders/0AJb5Zemj0AAkUk9PVA"
slido_data <- get_slido_files(drive_id)

YouTube

If you have a YouTube channel and its URL is https://www.youtube.com/channel/a_bunch_of_letters_here, then you can extract stats for the videos on that channel using the channel ID from that URL (the a_bunch_of_letters_here part):

authorize("google")
youtube_stats <- get_youtube_stats("a_bunch_of_letters_here")

Bulk Retrievals

Maybe you just want to retrieve it ALL. We have some wrapper functions that will attempt to do this for you. These functions are a bit more precarious/risky in that there may be reasons certain websites/repos/events/data cannot be collected, so collecting items one by one gives you more insight into what is happening.

However, these bulk retrieval functions may help you if you want to grab ALL of your account's data in one swoop. Just make sure to carefully look over and curate that data after it has been collected. You may find some retrievals are empty, potentially for good reasons (for example, if a Google Form has no responses to collect, it will show up as "no responses" in the respective part of the list).
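As a minimal sketch of that curation step, you could flag the empty entries like this. Here multiple_forms stands for the list returned by get_multiple_forms() described below, and the check assumes empty retrievals come back as the literal string "no responses"; the exact structure of the returned list may differ.

# Flag which bulk-retrieved forms came back empty (assumes the "no responses" marker described above)
is_empty <- vapply(multiple_forms, function(x) identical(x, "no responses"), logical(1))

# Names of the forms that had nothing to collect
names(multiple_forms)[is_empty]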

GitHub bulk

From GitHub, you can attempt to collect repository metrics for all of the repositories belonging to an account.

authorize("github")

If you want to do this for a specific set of repositories, you can provide a vector of those repositories' names like this:

repo_names <- c("fhdsl/metricminer", "jhudsl/OTTR_Template")
some_repos_metrics <- get_multiple_repos_metrics(repo_names = repo_names)

If you want the time-course-related data, you can set time_course = TRUE.

some_repos_metrics <- get_multiple_repos_metrics(repo_names = repo_names, time_course = TRUE)

Google Analytics bulk

As with single-website retrieval, we first need to authorize the package.

authorize("google")
accounts <- get_ga_user()

Then we can provide the account ID to all_ga_metrics, and it will attempt to grab all stats for all website properties under the provided account.

stats_list <- all_ga_metrics(account_id = accounts$id[5])

Google Forms

As always, we need to authorize the app.

authorize("google")

We can retrieve a list of form IDs using the googledrive R package.

form_list <- googledrive::drive_find(
  shared_drive = googledrive::as_id("0AJb5Zemj0AAkUk9PVA"),
  type = "form")

Now we can provide this vector of form IDs to get_multiple_forms:

multiple_forms <- get_multiple_forms(form_ids = form_list$id)

Contributions

This is an ever-evolving package. Contact csavonen@fredhutch.org if you are interested in helping us develop metricminer, or just file a pull request or issue!