IQSS / dataverse-metrics

Aggregate and visualize metrics for installations of Dataverse around the world
https://dataverse.org/metrics
Apache License 2.0

crazy idea: dataverse-metrics should pull from authoritative installation list #76

Closed donsizemore closed 1 year ago

donsizemore commented 2 years ago

https://github.com/IQSS/dataverse-installations/blob/master/data/data.json is more complete and current.

Note that dataverse-installations' map checks for an installation in the dataverse-metrics JSON, and if found, includes a link to dataverse.org/metrics in that installation's display.

PaulBoon commented 1 year ago

There is a script that updates the list: https://github.com/IQSS/dataverse-metrics/blob/master/global/update-all-installations-list.sh. It has to be run from time to time, though, and it seems that https://metrics.dataverse.org has not done that. We should also update the list with every release, maybe soon. The other approach is to always ship the list empty, forcing you to run that script at least once when installing.

@donsizemore The config.json.sample also needs updating, or am I somehow confused?

donsizemore commented 1 year ago

@PaulBoon not crazy at all. IIRC things weren't set up this way because at the time not all installations were new enough (4.11+?) to support metrics; some in the authoritative list have gone away, and some refuse connections. In general, though, I support this change.
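A minimal sketch of the version gate mentioned above, assuming the version string has already been obtained from the installation (for example from its `/api/info/version` endpoint). The function name `supports_metrics` is hypothetical, and the `(4, 11)` threshold is taken from the tentative "4.11+?" in this thread rather than from any confirmed requirement.

```python
import re


def supports_metrics(version, minimum=(4, 11)):
    """Return True if a Dataverse version string is at least `minimum`.

    Only the first "major.minor" pair in the string is compared, so
    "4.9.4", "5.13" and "v. 4.20" are all accepted as input.
    The default threshold is an assumption based on this thread.
    """
    match = re.search(r"(\d+)\.(\d+)", version or "")
    if not match:
        # Unparseable or missing version: treat as unsupported.
        return False
    return (int(match.group(1)), int(match.group(2))) >= minimum


for v in ["4.9.4", "4.11", "5.13", ""]:
    print(repr(v), supports_metrics(v))
```

Comparing integer tuples rather than strings matters here: as strings, "4.9" would sort after "4.11".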

PaulBoon commented 1 year ago

After some fiddling with jq I can produce a JSON array of the URLs like this:

curl -s -S https://iqss.github.io/dataverse-installations/data/data.json | jq '[.installations[].hostname | "https://" + .]'

We could also merge this with that config, but the problem is that not all sites have HTTPS, so I'm not sure this is the way to go.

Maybe we could add extra fields indicating whether an installation is HTTP-only (an exception, I hope) and possibly whether it has to be ignored in a metrics overview.
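As a rough illustration of the extra-fields idea, here is a sketch that builds the URL list from a document shaped like data.json. Only `installations` and `hostname` come from the jq command above; the field names `http_only` and `skip_metrics` and the function name `installation_urls` are hypothetical and do not exist in data.json today.

```python
def installation_urls(data):
    """Build base URLs from an installations document shaped like data.json.

    `http_only` and `skip_metrics` are hypothetical per-installation
    fields; entries without them default to HTTPS and are included.
    """
    urls = []
    for inst in data.get("installations", []):
        hostname = inst.get("hostname")
        if not hostname:
            continue  # nothing to build a URL from
        if inst.get("skip_metrics", False):
            continue  # flagged to be left out of the metrics overview
        scheme = "http" if inst.get("http_only", False) else "https"
        urls.append(f"{scheme}://{hostname}")
    return urls


# Made-up sample entries, for illustration only.
sample = {
    "installations": [
        {"hostname": "a.example.org"},
        {"hostname": "b.example.org", "http_only": True},
        {"hostname": "c.example.org", "skip_metrics": True},
    ]
}
print(installation_urls(sample))
```

Defaulting both fields to false would keep the current data.json valid as-is, so only the exceptions would need annotating.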

donsizemore commented 1 year ago

@PaulBoon there was a host of operational issues. I do like the idea of an option to tell dataverse-metrics to get its list from the canonical source, though.

PaulBoon commented 1 year ago

@donsizemore Ahh, you mean that the Python code could do the processing every time it's run (from crontab, probably). As for the extra fields, I meant that they would be added to that data.json file so we can automate more precisely. I'm not sure how that file is produced, though (a GitHub Action, maybe?).
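A sketch of what such an option might look like if the list were resolved on every run. The config key `installations_source_url` and the function names are hypothetical, and it is an assumption that the existing config keeps its URL list under an `installations` key; this is not how dataverse-metrics currently works.

```python
import json
import sys
import urllib.request


def fetch_json(url, timeout=30):
    """Download and parse a JSON document."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def resolve_installations(config, fetch=fetch_json):
    """Return the installation URLs to query on this run.

    If the (hypothetical) `installations_source_url` key is set, the list
    is built from that document; otherwise, or if fetching fails, the
    static `installations` list from the config is used.
    """
    fallback = list(config.get("installations", []))
    source = config.get("installations_source_url")
    if not source:
        return fallback
    try:
        data = fetch(source)
        return [
            "https://" + inst["hostname"]
            for inst in data["installations"]
            if inst.get("hostname")
        ]
    except (OSError, ValueError, KeyError, TypeError) as exc:
        # Network errors, bad JSON, or an unexpected document shape.
        print(f"Falling back to static list: {exc}", file=sys.stderr)
        return fallback
```

Falling back to the static list means a cron run still produces output when the canonical source is unreachable. A config using the option might contain `"installations_source_url": "https://iqss.github.io/dataverse-installations/data/data.json"`.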

donsizemore commented 1 year ago

@PaulBoon it was a crazy idea. On Friday I diffed the hostnames in metrics' current config against the all-installations JSON, and wound up removing about half of what I added. Some installations aren't running 4.11+ and so don't offer the metrics URL, some block HTTP calls to the metrics API endpoints, and a couple appear to be running a fork and/or returning non-standard data. So... the list is more current than it was. Once the scripts run cleanly I'll copy that list over into the sample config.
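For reference, the hostname comparison described here could be sketched roughly as follows. This is not the procedure actually used; the function name `diff_hostnames` is hypothetical, and the inputs are assumed to be a list of URLs from the metrics config and a parsed document shaped like data.json.

```python
from urllib.parse import urlparse


def diff_hostnames(config_urls, data):
    """Compare configured URLs against an installations document.

    Returns a pair of sorted lists:
    (hostnames only in the installations document,
     hostnames only in the config).
    """
    # urlparse lowercases hostnames; entries without a scheme yield None.
    configured = {urlparse(url).hostname for url in config_urls}
    configured.discard(None)
    canonical = {
        inst["hostname"].lower()
        for inst in data.get("installations", [])
        if inst.get("hostname")
    }
    return sorted(canonical - configured), sorted(configured - canonical)


# Made-up sample values, for illustration only.
missing, stale = diff_hostnames(
    ["https://a.example.org", "http://old.example.org"],
    {"installations": [{"hostname": "a.example.org"}, {"hostname": "new.example.org"}]},
)
print("not in config:", missing)
print("not in installations list:", stale)
```

The first list contains candidates to add, each of which would still need a check that it actually answers on the metrics API; the second contains entries that may have gone away.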