Rothamsted / knetminer

KnetMiner - webapp to search and visualize genome-scale knowledge graphs
https://knetminer.com
MIT License

Google Analytics requires upgrades to GA4 #750

Closed marco-brandizi closed 1 year ago

marco-brandizi commented 1 year ago

Google announced long ago that its current service for tracking web visits, Universal Analytics (UA), will shut down in July 2023, and that everything should be migrated to the new service, Google Analytics 4 (GA4).

After some investigation, I've found the following.

Migrating the UI calls is effortless: we are already using the new code, which is backward compatible, so we just need to set a new ID to switch to GA4.

Migrating the API calls is a different beast, since this has to happen on the server, where there is no browser to run the complex Google-provided JavaScript that sets up calls to their API correctly. For this case, Google expects us to use what they call the Measurement Protocol (MP).

I've managed to make this work (not yet in code, just by playing with HTTP calls, but doing the same in Java is trivial). However, at the moment, the MP does not do any geographical tracking: it doesn't consider the client IP, it doesn't resolve it to a geographical location, and it doesn't show anything about user provenance for MP records. From what I've seen, I doubt Google intends to support this in the near future.

This doesn't prevent us from sending the client's IP via the MP, attaching it as a web call parameter (namely, as a parameter of their event object). Then, one can see a summary of the most frequent IPs that called our API (i.e., a table of IP/occurrences), even split per instance or per dataset. But with the MP alone, that is all we would see.
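Since the MP is plain HTTP, the server-side call described above can be sketched in Java roughly as follows. This is a minimal sketch, assuming Java 11+ and GA4's `mp/collect` endpoint, which takes `measurement_id` and `api_secret` as query parameters and a JSON body with a `client_id` and a list of `events`, each with a `name` and `params`. The `client_ip` parameter name, the event name, the placeholder IDs and the class/method names are hypothetical; the JSON is built by hand only to keep the example dependency-free.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper, not KnetMiner's actual code. */
public class MpEventSketch {

  static final String MP_ENDPOINT = "https://www.google-analytics.com/mp/collect";

  /** Minimal JSON string quoting: escapes quotes, backslashes and control chars. */
  static String jsonString(String s) {
    StringBuilder sb = new StringBuilder("\"");
    for (char c : s.toCharArray()) {
      if (c == '"' || c == '\\') sb.append('\\').append(c);
      else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
      else sb.append(c);
    }
    return sb.append('"').toString();
  }

  /** Builds the MP body: a single event whose params carry, eg, the caller's IP. */
  static String buildPayload(String clientId, String eventName, Map<String, String> params) {
    String paramsJson = params.entrySet().stream()
        .map(e -> jsonString(e.getKey()) + ":" + jsonString(e.getValue()))
        .collect(Collectors.joining(","));
    return "{\"client_id\":" + jsonString(clientId)
        + ",\"events\":[{\"name\":" + jsonString(eventName)
        + ",\"params\":{" + paramsJson + "}}]}";
  }

  /** Builds (but does not send) the POST request to the MP endpoint. */
  static HttpRequest buildRequest(String measurementId, String apiSecret, String payload) {
    String url = MP_ENDPOINT
        + "?measurement_id=" + URLEncoder.encode(measurementId, StandardCharsets.UTF_8)
        + "&api_secret=" + URLEncoder.encode(apiSecret, StandardCharsets.UTF_8);
    return HttpRequest.newBuilder(URI.create(url))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(payload))
        .build();
  }

  /** Sends the request and returns the HTTP status code. */
  static int send(HttpRequest request) throws IOException, InterruptedException {
    return HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.discarding())
        .statusCode();
  }

  public static void main(String[] args) {
    // 192.0.2.1 is a documentation-only address; the IDs below are placeholders
    String payload = buildPayload("555.0", "API_wheat_search", Map.of("client_ip", "192.0.2.1"));
    HttpRequest req = buildRequest("G-XXXXXXXXXX", "<api-secret>", payload);
    System.out.println(req.uri());
    System.out.println(payload);
  }
}
```

As far as we know, the MP answers with a 2xx status even when the event is malformed, so a status check in `send()` proves little; Google provides a separate validation endpoint (`/debug/mp/collect`) that is worth using while testing.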

Another kind of information that should be easy to track via the MP is the domain of those who call our API: e.g., ebi.ac.uk would be easy to identify (but /ensembl wouldn't be).
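One way to get at such a domain is sketched below, assuming it is derived from the caller's IP through a reverse DNS lookup (the thread doesn't say how it would be obtained). A host name carries no URL path, which is consistent with a service like /ensembl staying invisible. Class and method names are hypothetical, and keeping a fixed number of trailing labels is a naive heuristic: a Public Suffix List-aware library, such as Guava's `InternetDomainName.topPrivateDomain()`, would be more robust.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical helper, not KnetMiner's actual code. */
public class CallerDomainSketch {

  /** Reverse DNS; getCanonicalHostName() falls back to the IP text when no name resolves. */
  static String hostOf(String clientIp) throws UnknownHostException {
    return InetAddress.getByName(clientIp).getCanonicalHostName();
  }

  /**
   * Naive heuristic: keeps the last {@code labels} labels of a host name, so that
   * ("wwwdev.ebi.ac.uk", 3) gives "ebi.ac.uk". IP literals are returned unchanged.
   */
  static String domainOf(String hostName, int labels) {
    String host = hostName.toLowerCase(Locale.ROOT);
    // Unresolved IPv4 (digits and dots) or IPv6 (contains ':') literal: nothing to shorten
    if (host.matches("[0-9.]+") || host.contains(":")) return host;
    String[] parts = host.split("\\.");
    if (parts.length <= labels) return host;
    return String.join(".", Arrays.copyOfRange(parts, parts.length - labels, parts.length));
  }

  public static void main(String[] args) {
    System.out.println(domainOf("wwwdev.ebi.ac.uk", 3));
  }
}
```

The resulting domain could then be attached as one more event parameter in the MP call, next to the IP.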

Alternatives:

The first option is fairly quick to implement; the others require considerably more time. To be decided and sorted out, I'd say, by the end of May 2023.

KeywanHP commented 1 year ago

We can go with option 1. It has all the benefits we need and is easy, especially now that our genepage is client-side: I assume GA4 will capture its calls and geolocation too. This is the page linked from Ensembl, wheat-expression, wheatIS, GrainGenes, T3, etc.

Authentication for our APIs is something we should consider for the new KnetMiner architecture.

marco-brandizi commented 1 year ago

Thanks, @KeywanHP. Actually, at the moment the client-side tracking is rather poor: there is only one tracking call, when the UI opens. But we can expand it, adding more fine-grained tracking and the like.

marco-brandizi commented 1 year ago

This should be complete now. @KeywanHP, check the [GA dashboard](): ci-test is sent to the site/property "Knetminer Test Site - GA4", and you can see live hits under Reports/Real Time.

Both the API and UI tracking go to the Events section of this view. The main dashboard is updated too, every 24h, and I can see it has a richer set of reports.

Events are always prefixed with the type (UI/API), the data source (wheat, aratiny, etc.) and the type of event. Each event has parameters (e.g., keywords, gene list size).
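The naming convention above can be sketched as a small helper. This assumes `_` as the separator and the GA4 rules for event names as we understand them (letters, digits and underscores only, starting with a letter, at most 40 characters); the exact format KnetMiner uses is not given here, so the helper and its output format are hypothetical.

```java
/** Hypothetical helper, not KnetMiner's actual naming code. */
public class EventNameSketch {

  /** GA4 limit on event name length, as we understand it. */
  static final int MAX_NAME_LENGTH = 40;

  /** Replaces anything but ASCII letters, digits and '_' with '_'. */
  static String sanitize(String s) {
    return s.replaceAll("[^A-Za-z0-9_]", "_");
  }

  /** Builds "<type>_<dataSource>_<eventType>", eg, "API_wheat_search". */
  static String eventName(String type, String dataSource, String eventType) {
    String name = sanitize(type) + "_" + sanitize(dataSource) + "_" + sanitize(eventType);
    // Event names must start with a letter
    if (!Character.isLetter(name.charAt(0))) name = "e_" + name;
    return name.length() > MAX_NAME_LENGTH ? name.substring(0, MAX_NAME_LENGTH) : name;
  }

  public static void main(String[] args) {
    System.out.println(eventName("UI", "poaceae-test", "gene view"));
  }
}
```

Truncation can make long names collide, so long values (e.g., free-text keywords) are better kept in the event parameters than in the name.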

datasets/poaceae-test and datasets/poaceae are going to the same property, as was the case with the old UA. Maybe you want to have per-instance properties.

TODO: I need to move the above notes to the wiki/documentation.