ProjectPythia / projectpythia.github.io

https://projectpythia.org
Apache License 2.0

Automate collection of various metrics #319

Closed: clyne closed this issue 5 months ago

clyne commented 1 year ago

The MetPy team has scripts to collect metrics on GitHub usage that we can adapt and deploy across our various repos.

There are also BinderHub metrics that can be harvested from Jetstream.

We should automate the collection of whatever "useful" metrics we can.

cc: @dopplershift @ktyle
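We haven't looked at exactly what MetPy's scripts query, so this is only one possible starting point for the GitHub side. GitHub's REST API exposes per-repository traffic at `GET /repos/{owner}/{repo}/traffic/views`. It needs a token with push access, and it only returns the most recent 14 days, so it has to be polled on a schedule to build up a history. The sketch below uses only the standard library. The function names are hypothetical.

```python
import json
import urllib.request

# GitHub's REST API root and the API version header value it documents.
API_ROOT = "https://api.github.com"
API_VERSION = "2022-11-28"


def traffic_views_request(owner: str, repo: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for a repo's daily page views.

    GitHub only returns the most recent 14 days, so this has to run on a schedule.
    """
    return urllib.request.Request(
        f"{API_ROOT}/repos/{owner}/{repo}/traffic/views",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": API_VERSION,
        },
    )


def fetch_traffic_views(owner: str, repo: str, token: str) -> dict:
    """Send the request; the JSON has "count", "uniques", and a per-day "views" list."""
    with urllib.request.urlopen(traffic_views_request(owner, repo, token)) as resp:
        return json.load(resp)
```

You could call `fetch_traffic_views("ProjectPythia", "projectpythia.github.io", token)` from a nightly job and append each result to a file. That would keep the history that GitHub itself discards after two weeks.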

brian-rose commented 1 year ago

related: #207

clyne commented 1 year ago

Suggested path forward:

  1. Identify what kind of metrics we would like to collect.
  2. Determine which of the above can be reasonably automated.
  3. Implement metrics collection.

N.B. I suppose the order of (1) and (2) could be swapped :-)

mgrover1 commented 1 year ago

At SciPy last week, the Scientific-Python community mentioned they have put together some helpful metrics tools, including

These tools may be helpful for generating automated statistics and visualizing them.

jukent commented 11 months ago

It looks like Google removed its easy website tracking. Every tutorial I follow says to click settings options that no longer exist ("tracking -> website tracking"), and the links go to "Service Unavailable."

It looks like they now want us to use another service called Google tag manager https://marketingplatform.google.com/about/tag-manager/

clyne commented 11 months ago

Here are some of the questions around metrics that I think would be useful to be able to answer:

  1. What content is accessed most (least)?
  2. How much binder activity do we have?
  3. How much more (or less) is binder used than downloading the content to run locally?
  4. How many users do we have?
  5. What part of the world uses our content most?

For all of these it would be great to be able to plot trends over time.

Obviously, not all of these may be practical to capture. But it is a starting point :-)
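Once raw daily rows are available, question 1 (most/least accessed content) and the request for trends over time come down to simple aggregation. Here is a minimal sketch. It assumes each row is a dict holding GA4's `date` dimension (formatted `YYYYMMDD`), a `pagePath` dimension, and the `screenPageViews` metric as a string. These field choices are assumptions; the thread never settled on them.

```python
from collections import Counter, defaultdict


def monthly_totals(rows, metric="screenPageViews"):
    """Sum a daily metric into calendar months keyed by 'YYYY-MM', oldest first."""
    totals = defaultdict(int)
    for row in rows:
        day = row["date"]  # e.g. "20230922"
        totals[f"{day[:4]}-{day[4:6]}"] += int(row[metric])
    return dict(sorted(totals.items()))


def top_pages(rows, n=5, metric="screenPageViews"):
    """Return the n pages with the largest summed metric, as (path, total) pairs."""
    counts = Counter()
    for row in rows:
        counts[row["pagePath"]] += int(row[metric])
    return counts.most_common(n)
```

Page views can safely be summed this way, but daily user counts cannot. Someone who visits on several days would be counted once per day. For question 4, it is better to ask GA4 for `totalUsers` over the whole date range instead.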

jukent commented 11 months ago

Using the embedded API, it looks like people would have to log in to view the metrics, so this might not be the path we want to pursue.

I'm looking into making a GH action that downloads our Google Analytics data nightly. I'll have to pass in my login information as secrets. Once I have that figured out, could we have the action create a new plot of the data nightly?

https://www.searchenginejournal.com/how-to-access-google-analytics-api-via-python/474458/

E: This blog post looks promising: https://janakiev.com/blog/python-google-analytics/ (Learning API requests is new to me, so it's been hard to know what path to choose)
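Here is a minimal sketch of the secrets step. It assumes a service-account key (JSON) is stored as a repository secret and exposed to the job as an environment variable named `GA_SERVICE_ACCOUNT_JSON`, which is a hypothetical name. Google's client libraries find Application Default Credentials through the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, which should point at a key file. So the script writes the secret to a private temporary file and points that variable at it.

```python
import json
import os
import tempfile


def install_credentials(env_var="GA_SERVICE_ACCOUNT_JSON", environ=os.environ):
    """Write a service-account key held in an env var to a private temp file.

    Sets GOOGLE_APPLICATION_CREDENTIALS to that file and returns its path.
    The default env var name is hypothetical.
    """
    raw = environ.get(env_var)
    if not raw:
        raise RuntimeError(f"{env_var} is not set; was the secret passed to this job?")
    key = json.loads(raw)  # fail early on malformed JSON
    if key.get("type") != "service_account":
        raise ValueError("expected a service-account key")
    fd, path = tempfile.mkstemp(suffix=".json")  # mkstemp creates it owner-only
    with os.fdopen(fd, "w") as fh:
        json.dump(key, fh)
    environ["GOOGLE_APPLICATION_CREDENTIALS"] = path
    return path
```

A service account, rather than personal login details, keeps anyone's own Google password out of the repository's secrets.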

clyne commented 11 months ago

Getting the raw data should be step one. If there is a way to embed a plot in our website, that might be really cool, but it certainly isn't a must-do. Simply displaying as text, for example, the total number of users would satisfy our NSF obligations. Having access to the raw data would allow us to analyze it any way we want.
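To keep the raw data for later analysis and still produce a plain-text total, a nightly job could write each night's rows to a dated CSV file and render a one-line summary. This is a sketch using only the standard library. The `ga_YYYY-MM-DD.csv` naming scheme and the wording of the summary are hypothetical.

```python
import csv
import datetime as dt
import io
from pathlib import Path


def rows_to_csv(rows):
    """Serialize a list of flat dicts to CSV text, columns sorted by name."""
    fieldnames = sorted({key for row in rows for key in row})
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def snapshot_path(out_dir, day):
    """Hypothetical naming scheme: one file per night, e.g. metrics/ga_2024-03-11.csv."""
    return Path(out_dir) / f"ga_{day.isoformat()}.csv"


def save_snapshot(rows, out_dir, day=None):
    """Write tonight's rows to disk and return the path written."""
    path = snapshot_path(out_dir, day or dt.date.today())
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as fh:  # csv already emits its own line endings
        fh.write(rows_to_csv(rows))
    return path


def summary_text(total_users, start, end):
    """One line of text suitable for a metrics page."""
    return f"Project Pythia had {total_users:,} users between {start} and {end}."
```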

jukent commented 11 months ago

I'm trying to follow the quickstart guide: https://developers.google.com/analytics/devguides/reporting/core/v4/quickstart/service-py but keep encountering permissions errors

[Screenshots of the permissions errors, 2023-09-22]

Do you know who could grant me more permissions? Would it be @kmpaul or someone in VAST billing?

clyne commented 11 months ago

That may be a question for @dopplershift

dopplershift commented 11 months ago

Ooof, not sure there. When I did this for MetPy I had sufficient permissions I guess.

clyne commented 11 months ago

You may have to turn to the interweb for this one, @jukent . That's my favorite useless message: contact your administrator :-)

clyne commented 11 months ago

Maybe this resource would be helpful. Apparently it is being used for Intake.

jukent commented 11 months ago

Thanks @clyne! Looking into that now.

Update on analytics permissions: it turns out I only had view permissions. I just requested administrator permissions, and it said it would email the existing administrators to grant them, but nowhere does it list who they are. If you get an email, please grant it :)

E: According to this old issue https://github.com/ProjectPythia/projectpythia.github.io/issues/114 @kmpaul wrote:

@clyne @dopplershift @brian-rose @ktyle: Ok. You all have administrative privileges on the Project Pythia Google Analytics account. So, you should be able to see all of the analytics data.

So I need one of you to grant me permission or take over this issue.

brian-rose commented 11 months ago

@jukent I got that email and I think I clicked the right buttons to grant you admin privileges.

ktyle commented 11 months ago

Hmmm, when I go to the relevant page I still see Julia as having just Viewer privs?


brian-rose commented 11 months ago

Try now! I made one more change.

jukent commented 11 months ago

The page looks different now! Yay!

jukent commented 10 months ago

[Screenshots of the permissions errors, 2023-11-06]

Every avenue I try, I still encounter permissions errors. I'm not sure what I'm lacking, given that I have admin privileges on the analytics page.

I need to pass the baton on this.

jukent commented 8 months ago

An update on this:

Google permissions are a maze: while I had admin permissions for the domain, I did not have them for the metrics. Joel Daves (the UCAR Cloud admin) helped me by fixing the permissions on UCAR's end, and the steps forward now look clear (if still sometimes confusing).

I made some progress enabling the API and creating the service account that interacts with the metrics, and then had to table this for AGU prep. I was hoping to focus on it this week, but our Google domain expired and we had to switch to AWS. Since I was not 100% confident that our metrics would be unaffected, I held off on this for the last two days. The domain transfer will be complete within the next 6 hours, but it looks safe to continue work.

Remaining steps according to this guide are to:

  1. Install the client library. (This will eventually need to be done on GitHub somewhere, but I'm not 100% sure where, which gave me pause. For starters I will work on this locally to get a feel for it.)
  2. Set up and test a sample
  3. Troubleshoot

So it looks like we're close to making progress and then having a more serious conversation about what code we want to write to grab what metrics.

Currently wrestling with an apt-get command not found issue that is hopefully pretty standard. I will try to resolve after lunch.

jukent commented 6 months ago

After getting brew and the API installed, I set up the sample and edited the required fields except "VIEW_ID", which I should be able to find in the Account Explorer. But for ProjectPythia, it says there are no UA views for this account. So my next step is to learn more about what UA views are before setting one up.

Notes on how to make a new UA view

erogluorhan commented 6 months ago


UA (Universal Analytics) is Google's older-generation measurement solution, which was replaced by Google Analytics 4 (GA4). That said, I am not sure the Account Explorer is the best way to do this, because Pythia's account already uses the newer-generation GA4 properties.

I believe this API is the one that is actually compatible with GA4, as the docs suggest. It seems like the API you linked before may be the deprecated one (i.e., not compatible with Pythia's GA4 property).
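For reference, the GA4-compatible Google Analytics Data API identifies the data source by a numeric GA4 property ID rather than a UA view ID. Its `runReport` method takes date ranges, dimensions, and metrics. The sketch below only builds the REST request with the standard library. It assumes an OAuth access token has already been obtained. In practice, the `google-analytics-data` client library used in the quickstart handles authentication and wraps this same call. The function name and its defaults are hypothetical.

```python
import json
import urllib.request

DATA_API = "https://analyticsdata.googleapis.com/v1beta"


def run_report_request(property_id, access_token, dimensions, metrics,
                       start_date="30daysAgo", end_date="today"):
    """Build (but do not send) a Data API runReport request for one GA4 property."""
    if not str(property_id).isdigit():
        # GA4 properties use a plain number, not a UA-style "UA-XXXX-Y" ID.
        raise ValueError("GA4 property IDs are numeric, e.g. '123456789'")
    body = {
        "dateRanges": [{"startDate": start_date, "endDate": end_date}],
        "dimensions": [{"name": d} for d in dimensions],
        "metrics": [{"name": m} for m in metrics],
    }
    return urllib.request.Request(
        f"{DATA_API}/properties/{property_id}:runReport",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```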

erogluorhan commented 6 months ago

And it seems to be straightforward with this quickstart using the client libraries, as long as you have permission to enable an API with your Google account (which I don't seem to have).

jukent commented 6 months ago

Interesting, thanks for sharing that info before I went too far down this path! The APIs you linked don't seem to work for me; they say "Service Unavailable." Same with the quickstart link.

erogluorhan commented 6 months ago

I got the same errors when I clicked on them (and even on your API links from a while ago), but they should work if you copy the link, paste it into a new tab, and go.

Right-clicking and opening them in an incognito window also seems to work, but that wouldn't be practical, since you will need to sign in to your UCAR Google account.

jukent commented 6 months ago

Just an update here: the Analytics Reporting API v4 quickstart guide I was following was deprecated at the end of January in favor of the Analytics Data API v1. I thought API v4 would match up with our GA4 product, but that is just a naming coincidence, and the two no longer work together.

On the new service, I do not have permissions to enable it -- so I'll have to email some people again.
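Whichever client ends up making the call, the `runReport` response lists the header names once (`dimensionHeaders`, `metricHeaders`). It then returns each row's values positionally, with every value as a string. A small helper (the name is hypothetical) can flatten that into one dict per row, which is easier to save or aggregate.

```python
def flatten_report(response):
    """Turn a runReport JSON response into a list of {header_name: value} dicts."""
    dim_names = [h["name"] for h in response.get("dimensionHeaders", [])]
    met_names = [h["name"] for h in response.get("metricHeaders", [])]
    records = []
    # "rows" may be absent entirely when the report has no data.
    for row in response.get("rows", []):
        record = {name: cell["value"]
                  for name, cell in zip(dim_names, row.get("dimensionValues", []))}
        record.update({name: cell["value"]
                       for name, cell in zip(met_names, row.get("metricValues", []))})
        records.append(record)
    return records
```

Values stay as strings here. Converting them (for example with `int(...)`) is left to whichever step aggregates them.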

jukent commented 5 months ago

Update as of 3/11: I opened draft PR #407, which explains the steps I took. The action is failing on my fork, though, because it says it can't find a file. I'm pretty sure this is related to fork vs. upstream issues (the branch doesn't yet exist upstream). I can't look at this again until Friday because of a hackathon all week. I'll try merging this into a temporary upstream branch so that we can run the PR there.

Essentially,

jukent commented 5 months ago

I got the automate-metrics action to pass when triggered with workflow_dispatch (you can see that here). But then I had to change it to workflow_call so the nightly_build action could call it. Now the secrets don't seem to be passed in as expected, and it is failing again. I think this is a lot of progress, though.
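When a workflow is switched from `workflow_dispatch` to `workflow_call`, the called workflow no longer sees the repository's secrets automatically. The calling workflow has to pass them, either individually under `secrets:` or all at once with `secrets: inherit`. References to secrets that weren't passed expand to empty strings. That means the script can check for them up front and fail with a clearer message than a later authentication error. This is a sketch; the function names and the example variable name are hypothetical.

```python
import os


def missing_secrets(names, environ=os.environ):
    """Return the names that are unset or empty (unpassed secrets expand to "")."""
    return [name for name in names if not environ.get(name)]


def require_secrets(names, environ=os.environ):
    """Exit with an explanatory message if any required secret is missing."""
    missing = missing_secrets(names, environ)
    if missing:
        raise SystemExit(
            "Missing secrets: " + ", ".join(missing) + ". If this script runs in a "
            "reusable workflow (workflow_call), check that the caller passes them "
            "under 'secrets:' or with 'secrets: inherit'."
        )
```

Calling `require_secrets(["GA_SERVICE_ACCOUNT_JSON"])` as the first step of the metrics script would turn a confusing downstream failure into a message naming the missing secret.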

jukent commented 5 months ago

Check it out!

https://projectpythia.org/metrics.html