gbv / coli-ana

API to analyze DDC numbers
https://coli-conc.gbv.de/coli-ana/app/
MIT License

Provide statistics #29

Open nichtich opened 3 years ago

nichtich commented 3 years ago

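Can be calculated once and stored on disk (I'd use jq). Numbers such as:
  • Number of incompletely analyzed DDC numbers
  • Number of distinct incompletely analyzed DDC numbers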

stefandesu commented 3 years ago

  • Number of incompletely analyzed DDC numbers
  • Number of distinct incompletely analyzed DDC numbers

What's the difference between these two? The database uses the concept URI as the primary key, so there should be no duplicates.

I'd use jq

How would you use jq for this? The data is currently only in the PostgreSQL database.

nichtich commented 3 years ago

How would you use jq for this? The data is currently only in the PostgreSQL database.

The data is converted to JSON and then imported, so we could also create a JSON dump and work on that. It's just a matter of convenience.

stefandesu commented 3 years ago

The data is converted to JSON and then imported, so we could also create a JSON dump and work on that. It's just a matter of convenience.

I'll check whether the convert script's conversion to JSON works properly at the moment. I haven't tested it, since we haven't used it at all.

If it's easier for you to write a jq call that calculates the data than writing a small helper script in JavaScript, then go ahead. I wouldn't even know where to start. 😅

stefandesu commented 3 years ago

Added a small fix in 6f633683a9d11bd439e9c16587f17073e937a377. Now the convert script correctly outputs ndjson when using it without the --import flag.
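
For illustration, a minimal sketch of how a small JavaScript helper could compute such numbers from the ndjson output. It is not the script from this repository. It assumes that each record has a `notation` array and a `memberList` array in which a `null` entry marks a part of the number that could not be analyzed. The function names and the sample records are made up for this example.

```javascript
// Sketch: compute summary statistics from ndjson (one JSON record per line).
// Assumption: a record counts as incompletely analyzed if its `memberList`
// contains a `null` entry.
function isIncomplete(record) {
  const members = record.memberList || []
  return members.includes(null)
}

function computeStats(ndjson) {
  const records = ndjson
    .split("\n")
    .filter(line => line.trim() !== "")
    .map(line => JSON.parse(line))
  const incomplete = records.filter(isIncomplete)
  // Distinct by notation; several records may share the same notation
  const distinct = new Set(incomplete.map(r => r.notation && r.notation[0]))
  return {
    total: records.length,
    incomplete: incomplete.length,
    distinctIncomplete: distinct.size,
  }
}

// Made-up sample data, not taken from the real database
const sample = [
  { uri: "http://example.org/a", notation: ["123.4"], memberList: [{ notation: ["123"] }, { notation: ["4"] }] },
  { uri: "http://example.org/b", notation: ["567.8"], memberList: [{ notation: ["567"] }, null] },
  { uri: "http://example.org/c", notation: ["567.8"], memberList: [null] },
].map(record => JSON.stringify(record)).join("\n")

const stats = computeStats(sample)
console.log(stats)
// { total: 3, incomplete: 2, distinctIncomplete: 1 }
```

If the concept URI is the primary key and every URI maps to exactly one notation, `incomplete` and `distinctIncomplete` should come out equal. The sample only differs because two made-up records share a notation.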

nichtich commented 3 years ago

There is a statistics script on the dev branch. The server needs to be adjusted to serve stats.json (if available) and/or show a summary in the interface. The date of the last update is also not included yet.
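
A rough sketch of how serving the file could look, not the actual server code. `loadStats` and `statsHandler` are hypothetical names, and using the file's modification time as the date of last update is only one possible choice. The handler relies only on `res.statusCode`, `res.setHeader` and `res.end`, so it should fit Node's built-in `http` server.

```javascript
const fs = require("fs")

// Load a precomputed statistics file, if available.
// Returns null when the file is missing or does not contain valid JSON.
function loadStats(file) {
  try {
    const stats = JSON.parse(fs.readFileSync(file, "utf8"))
    // Assumption: the file's modification time stands in for the last update
    const updated = fs.statSync(file).mtime.toISOString()
    return { ...stats, updated }
  } catch (error) {
    return null
  }
}

// Build a request handler that responds with the statistics or a 404
function statsHandler(file) {
  return (req, res) => {
    const stats = loadStats(file)
    res.statusCode = stats ? 200 : 404
    res.setHeader("Content-Type", "application/json")
    res.end(JSON.stringify(stats || { message: "No statistics available" }))
  }
}
```

It could be tried out with `require("http").createServer(statsHandler("stats.json")).listen(3000)`, where the port is arbitrary.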