NatLibFi / Skosmos

Thesaurus and controlled vocabulary browser using SKOS and SPARQL

Create, collect and publish statistics #1410

Status: Open. nichtich opened this issue 1 year ago

nichtich commented 1 year ago

Description of the enhancement

There should be a general statistics page with information such as:

Some statistics are already shown as part of the vocabulary information (Resource counts by type and Term counts by language), but this information covers only one vocabulary at a time and is not aggregated.
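As a rough illustration of the aggregation step, the Python sketch below sums per-vocabulary tables into instance-wide totals. The input shape (one dict per vocabulary with the hypothetical keys `resource_counts` and `term_counts`) is an assumption made for this example, not an existing Skosmos data structure.

```python
from collections import Counter


def aggregate(stats_per_vocab: dict[str, dict[str, dict[str, int]]]) -> dict[str, Counter]:
    """Sum each statistics table over all vocabularies.

    stats_per_vocab maps a vocabulary id to its tables, e.g.
    {"resource_counts": {"skos:Concept": 100}, "term_counts": {"fi": 100}}.
    The table names are hypothetical and only mirror the two tables
    shown on a Skosmos vocabulary page.
    """
    totals: dict[str, Counter] = {}
    for vocab_stats in stats_per_vocab.values():
        for table, counts in vocab_stats.items():
            # Counter.update adds counts instead of replacing them.
            totals.setdefault(table, Counter()).update(counts)
    return totals
```

With two made-up vocabularies `vocab-a` and `vocab-b`, `aggregate(...)["term_counts"]` would hold the summed label counts per language across both.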

The statistics should also be available in machine-readable form so that they can be aggregated across Skosmos instances and logged over time (for example with the RDF Data Cube Vocabulary, https://www.w3.org/TR/vocab-data-cube/).
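A minimal sketch of what such an export could look like, assuming concept counts are published as RDF Data Cube observations serialized as Turtle. Only the `qb:` and `xsd:` namespaces are standard; the `ex:` data set and the properties `ex:vocabulary`, `ex:refDate` and `ex:conceptCount` are hypothetical placeholders, and a complete data cube would also declare a `qb:DataStructureDefinition`, omitted here for brevity.

```python
from datetime import date

# qb: and xsd: are the standard namespaces; ex: is a hypothetical namespace
# for the dimension and measure properties a real data set would define.
PREFIXES = """@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/skosmos-stats#> .
"""


def observation_ttl(obs_id: str, vocab_uri: str, day: date, concept_count: int) -> str:
    """Render one qb:Observation holding a concept count as Turtle."""
    return (
        f"ex:{obs_id} a qb:Observation ;\n"
        f"    qb:dataSet ex:conceptCounts ;\n"
        f"    ex:vocabulary <{vocab_uri}> ;\n"
        f'    ex:refDate "{day.isoformat()}"^^xsd:date ;\n'
        f"    ex:conceptCount {concept_count} .\n"  # bare integer = xsd:integer
    )


def dataset_ttl(rows: list[tuple[str, date, int]]) -> str:
    """Build a Turtle document with one observation per (vocabulary, date, count) row."""
    parts = [PREFIXES, "ex:conceptCounts a qb:DataSet .\n"]
    for i, (vocab_uri, day, count) in enumerate(rows, start=1):
        parts.append(observation_ttl(f"obs{i}", vocab_uri, day, count))
    return "\n".join(parts)
```

Because each snapshot is just a list of dated observations, appending rows from later runs is enough to build a time series, and documents from several Skosmos instances can be merged into one graph.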

The statistics should also be visualized where this makes sense (e.g. the number of concepts per hierarchy level).
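Counting concepts per hierarchy level amounts to a breadth-first traversal from the top concepts along `skos:narrower` links. The sketch below assumes the hierarchy is already available as (broader, narrower) pairs and that a polyhierarchical concept should be counted once, at its shallowest level; other conventions, such as counting a concept on every level where it appears, are equally possible.

```python
from collections import Counter, defaultdict, deque


def concepts_per_level(top_concepts: list[str], narrower_pairs: list[tuple[str, str]]) -> Counter:
    """Count concepts per hierarchy level, with top concepts at level 0.

    narrower_pairs holds (broader, narrower) concept URIs. Multi-source
    breadth-first search reaches every concept first via its shortest path,
    so a concept with several broader concepts is counted once, at its
    shallowest level.
    """
    children: dict[str, list[str]] = defaultdict(list)
    for broader, narrower in narrower_pairs:
        children[broader].append(narrower)

    level = {c: 0 for c in top_concepts}
    queue = deque(top_concepts)
    while queue:
        concept = queue.popleft()
        for child in children[concept]:
            if child not in level:  # skips already-seen concepts, so cycles terminate
                level[child] = level[concept] + 1
                queue.append(child)
    return Counter(level.values())
```

The resulting counter maps each level to a count and can be fed directly into a bar chart.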

Who are the users that would benefit from the enhancement and how?

People who manage and publish vocabularies would benefit from having more numbers at hand.

Why is the enhancement important?

Curiosity, publicity and management.

Implementation ideas

More detailed statistics may go beyond the scope of Skosmos, so it may make sense to decouple the creation of statistics from Skosmos and let Skosmos only display the results. See https://doi.org/10.1515/9783110308464-016 for some theoretical background and https://observablehq.com/@nichtich/jskos-metrics for a mockup with some vocabulary statistics. Possible sub-issues:

P.S.: We plan to work on this in 2024 as part of extending BARTOC.

osma commented 1 year ago

Thanks for the suggestion. Indeed, statistics are important to us (Finto) as well. Currently we collect similar statistics once a year, directly from the SPARQL endpoint and/or from the RDF files kept in Git version control, but implementing this kind of functionality in Skosmos itself could make that easier.

PRs welcome ;)
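For the endpoint-based collection mentioned above, a sketch along these lines could compute term counts by language with a single SPARQL 1.1 aggregate query, sent as a SPARQL Protocol GET request and parsed from the standard SPARQL JSON results format. This is not Finto's actual procedure, which the thread does not show; the endpoint URL is a placeholder, and an endpoint that keeps each vocabulary in its own named graph would need a `GRAPH` clause or a `default-graph-uri` parameter.

```python
import json
import urllib.parse
import urllib.request

# Term counts by language (SPARQL 1.1 aggregates). LANG() returns "" for
# labels without a language tag, so those are grouped under "".
QUERY = """PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?lang (COUNT(?label) AS ?count)
WHERE {
  ?concept a skos:Concept ;
           skos:prefLabel ?label .
  BIND(LANG(?label) AS ?lang)
}
GROUP BY ?lang
"""


def build_request(endpoint: str, query: str = QUERY) -> urllib.request.Request:
    """GET request per the SPARQL 1.1 Protocol, asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_counts(results_json: str) -> dict[str, int]:
    """Turn SPARQL 1.1 JSON results into a {language: count} dict."""
    bindings = json.loads(results_json)["results"]["bindings"]
    return {b["lang"]["value"]: int(b["count"]["value"]) for b in bindings}


def fetch_counts(endpoint: str) -> dict[str, int]:
    """Run the query against a live endpoint (network access required)."""
    with urllib.request.urlopen(build_request(endpoint)) as response:
        return parse_counts(response.read().decode("utf-8"))
```

Keeping request building and result parsing separate from the network call makes the parsing testable offline and lets the same code run from a cron job outside Skosmos.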