Lund-University-Biodiversity-data / datahost-api-client


Statistics? #15

Open mathieuLU opened 1 year ago

mathieuLU commented 1 year ago

Check whether any statistics are required. Client side? Server side? Home-made storage? Something like Google Analytics?

I guess it would be good to have, say, the monthly totals stored somewhere.
I think the number of downloads is the critical figure. I'm not sure they thought things through properly when they wrote that we should report the number of searches; I'm not sure how useful that information is to them. But we should post the question in the API group and let Tommy/Hanna take it further up in NV, to make sure we don't add a feature that is worthless.
anneliejonsson commented 1 year ago

Statistics for searches: All I remember is that NV want this statistic, i.e. the total number of searches made via the client. If possible, broken down into the three types of information: Artobservationer (species observations), Inventeringstillfällen (survey events) and Metadata. That might be overkill at the moment, though, especially given the constraints on your time :-/

Statistics for downloads: The minimum required would be the total number of downloads per month, with one record for each of the full archives (csv and xlsx), one record for xlsx downloads after filtering and one record for csv downloads after filtering. How does that sound?

Even better would be separate statistics for the three types of information ("Typ av information" => Artobservationer, Inventeringstillfällen, Metadata), each downloaded as either xlsx or csv. But, as above, that might be overkill for now.

mathieuLU commented 1 year ago

The classic statistics tools deal with URLs, so they will be able to count how many times a given URL is called. But they won't be able to deal with a form (e.g. how many times a given element was selected).

But there is actually something I can easily do: right now, a file is generated every time a csv or xlsx export is created with the client, which means that I can easily count how many files were created.

But it won't tell how many times the public archives are downloaded.
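Counting the generated files could be as simple as tallying the export folder by file extension and month. A minimal sketch, assuming (hypothetically) that the client writes every csv/xlsx export into a single directory and keeps it there; the directory, the `count_exports` helper and the use of the modification time as the creation date are all assumptions, not how the client is known to work.

```python
import tempfile
from collections import Counter
from datetime import datetime
from pathlib import Path

EXPORT_EXTENSIONS = {"csv", "xlsx"}


def count_exports(export_dir: Path) -> Counter:
    """Count generated export files per (year-month, extension).

    Assumption: every csv/xlsx produced by the client is written to
    export_dir and kept there. The file's modification time stands in
    for its creation date.
    """
    counts: Counter = Counter()
    for path in export_dir.iterdir():
        ext = path.suffix.lower().lstrip(".")
        if path.is_file() and ext in EXPORT_EXTENSIONS:
            month = datetime.fromtimestamp(path.stat().st_mtime).strftime("%Y-%m")
            counts[(month, ext)] += 1
    return counts


if __name__ == "__main__":
    # Demo on a throwaway directory standing in for the real export folder.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ["a.csv", "b.xlsx", "c.xlsx", "notes.txt"]:
            (root / name).write_text("x")
        print(count_exports(root))
```

This only sees files that still exist, so if exports are cleaned up periodically the counts would have to be taken (or stored elsewhere) before the cleanup.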

Maybe we could target a combo of

But it won't count, for instance:

mathieuLU commented 1 year ago

The best alternative to GA (Google Analytics) would be Matomo (I used Piwik, its old name, in the past): https://matomo.org/faq/on-premise/installing-matomo/

But it would take a bit of time to set it up on canmove-app.

It's free as well, we own the data and nothing is sent to Google. I have also asked Manash to check what they are using.

mathieuLU commented 1 year ago

Yes, SBDI is using Matomo. It could be worth running an instance on canmove-app.

All the websites running on it could use it as well. @blacksparrowhawk, any opinion?

anneliejonsson commented 1 year ago

Do you have an estimate of the time needed to set up one of the mentioned tools to count downloads via the URLs (i.e. the full public archives at the top of the client)? In any case, using one of those tools for that part sounds fine. I trust you to decide which one to use.

About statistics for searches and downloads made with the filters in the client, i.e. two separate statistics: please first confirm whether I remember correctly. Are the csv and xlsx files already created when the user clicks "Sök och visa resultat" ("Search and show results")? And if the user then wants to download the xlsx file, is it created again when the user clicks "Ladda ner xlsx" ("Download xlsx")? If so, three files are created for every file that is downloaded. And if the user changes the search twice (i.e. does three searches) before downloading, seven files would have been created but only one downloaded. If this is correct, is there a way to distinguish the files created by "Sök och visa resultat" from those created by "Ladda ner ..."?

mathieuLU commented 1 year ago

The csv/xlsx files are created only once, when the user clicks "Ladda ner xlsx". I think you are mixing this up with the requests to the API server (one request every time you click "Sök och visa resultat" or "Ladda ner xlsx").

anneliejonsson commented 1 year ago

Ah! So I did.

Ok, so we have ways of counting all downloads - e.g. Matomo for counting the full archive downloads at the top of the client, and your "internal counter" for files downloaded after the user filters out some data. Very good!

All that's left now is the number of searches. This is what I asked about at the API meeting: whether it was necessary. Finn checked with the rest of NV, and it was decided that they do want that statistic as well. Can you see a(n easy) way of counting searches, i.e. the number of clicks on "Sök och visa resultat"?

mathieuLU commented 1 year ago

Manash allowed me to create an account on their Matomo instance, so I could already set up a tracker for our PROD API client.

This tracker will only give us statistics about the client. Please visit the website (canmove-app.ekol.lu.se:8089/) to generate some statistics and see what it is able to count. I'm not sure yet whether it will be able to count the downloads of the public archives correctly. We'll see. Please try ;-)
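If the JavaScript tracker turns out not to catch the archive downloads, Matomo also has an HTTP Tracking API: a request to its `matomo.php` endpoint with the `download` parameter records a file download explicitly. A minimal sketch that only builds such a request URL; the Matomo base URL, the site id and the archive URL are placeholders, and the helper name `matomo_download_url` is hypothetical.

```python
from urllib.parse import urlencode


def matomo_download_url(matomo_base: str, site_id: int, file_url: str) -> str:
    """Build a Matomo HTTP Tracking API request that records one download.

    idsite, rec, url and download are parameters of Matomo's tracking
    endpoint (matomo.php). The base URL and site id are placeholders.
    """
    params = {
        "idsite": site_id,     # the Matomo "website" the hit belongs to
        "rec": 1,              # required: actually record the request
        "url": file_url,       # set to the same value as download
        "download": file_url,  # marks the hit as a file download
    }
    return f"{matomo_base.rstrip('/')}/matomo.php?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical values; sending could be done with urllib.request.urlopen(url).
    url = matomo_download_url("https://stats.example.org/", 1,
                              "https://example.org/archives/occurrences.csv")
    print(url)
```

Sending this from the server when an archive is served would count downloads even for users who block JavaScript, at the cost of an extra request per download.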

On the server side, however, there is no counting tool, so we won't know, for instance, how many times the API is used.

As said, the only thing I can do is to count (from the client side) how many xlsx files were generated.

mathieuLU commented 1 year ago

The counting on the server side is ready

Every time an endpoint is hit, a row is created in the database. In the database, it looks like this: [Screenshot from 2023-02-09 16-55-35]

I store the date and the endpoint name (so we know whether it's a GET or a POST, and whether it's records, events or occurrences). It will be easy to get any figure from it: how many calls to each endpoint, filtered by date, etc.
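The server-side counter described above (one row per endpoint hit, storing the date and the endpoint name) and the kind of monthly figure it makes easy can be sketched with SQLite. The table name `api_hits`, its columns and the endpoint strings are hypothetical; the real schema is only shown in the screenshot.

```python
import sqlite3
from datetime import datetime
from typing import Optional


def create_table(conn: sqlite3.Connection) -> None:
    """Hypothetical schema: one row per API call."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS api_hits ("
        " id INTEGER PRIMARY KEY,"
        " hit_at TEXT NOT NULL,"    # ISO 8601 timestamp, sorts chronologically
        " endpoint TEXT NOT NULL)"  # method + path, e.g. 'POST /records'
    )


def log_hit(conn: sqlite3.Connection, endpoint: str,
            when: Optional[datetime] = None) -> None:
    """Insert one row for one request to `endpoint`."""
    when = when or datetime.now()
    conn.execute(
        "INSERT INTO api_hits (hit_at, endpoint) VALUES (?, ?)",
        (when.isoformat(), endpoint),
    )
    conn.commit()


def monthly_counts(conn: sqlite3.Connection, start: Optional[str] = None,
                   end: Optional[str] = None) -> list:
    """Hits per (YYYY-MM, endpoint), optionally limited to start <= hit_at < end."""
    return conn.execute(
        "SELECT substr(hit_at, 1, 7) AS month, endpoint, COUNT(*) "
        "FROM api_hits "
        "WHERE (? IS NULL OR hit_at >= ?) AND (? IS NULL OR hit_at < ?) "
        "GROUP BY month, endpoint ORDER BY month, endpoint",
        (start, start, end, end),
    ).fetchall()
```

For example, `monthly_counts(conn, "2023-02-01", "2023-03-01")` would return the February 2023 figures per endpoint. Storing ISO 8601 strings keeps date filtering a plain string comparison.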

mathieuLU commented 1 year ago

Exactly what we wanted! [Screenshot from 2023-02-10 10-10-10]

mathieuLU commented 1 year ago

And now we also have some statistics on the form: we store whether the output is html, csv or xlsx, and which object was requested (Occurrence/Event/Dataset). [Screenshot from 2023-02-10 10-22-24]
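The form statistics can be pictured as one event per form submission, tagged with the output format and the object type; a search shown as a table is then an `html` event and a filtered download a `csv` or `xlsx` event. A minimal in-memory sketch; the `FormEvent` type, the validation and the grouping by month are illustrative assumptions, not the client's actual code.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Iterable

FORMATS = {"html", "csv", "xlsx"}
OBJECTS = {"Occurrence", "Event", "Dataset"}


@dataclass(frozen=True)
class FormEvent:
    """One submission of the client form (hypothetical record type)."""
    day: date
    output_format: str  # html = results table shown; csv/xlsx = file download
    object_type: str    # Occurrence, Event or Dataset

    def __post_init__(self) -> None:
        if self.output_format not in FORMATS:
            raise ValueError(f"unknown format: {self.output_format}")
        if self.object_type not in OBJECTS:
            raise ValueError(f"unknown object type: {self.object_type}")


def monthly_summary(events: Iterable[FormEvent]) -> Counter:
    """Count events per (YYYY-MM, format, object type)."""
    return Counter(
        (e.day.strftime("%Y-%m"), e.output_format, e.object_type) for e in events
    )
```

Summing the `html` entries gives the number of searches, and the `csv`/`xlsx` entries give the filtered downloads, each split by type of information.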

From my point of view, we're good regarding the statistics!

anneliejonsson commented 1 year ago

Ok. So your reply that starts with the sentence "The counting on the server side is ready", with the first screenshot (of code), covers the statistics for when people use the API directly, without the client. Correct?

And the following reply, "exactly what we wanted!", with a screenshot of "Downloads", covers the statistics for downloads of the full archives from the client.

And the third reply, "and now we have...", gives us statistics for three things:
- html => the number of times an html table is created (i.e. the number of times "Sök och visa resultat" has been clicked)
- csv => the number of csv files downloaded after a search
- xlsx => the number of xlsx files downloaded after a search

And all three can also be split by the three types of information.

Excellent work!

anneliejonsson commented 1 year ago

And what do you need from me to make sure my visits to the client and my searches and downloads aren't added to the counts?

mathieuLU commented 1 year ago

We have to remember to reset the statistics counters and to add our IP addresses to the exclusion filter of the statistics module.
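On the Matomo side, the website settings have their own list of excluded IPs. For the home-made counters, the same exclusion could be applied before a hit is stored or counted, by checking the client address against a list of internal networks. A minimal sketch using Python's `ipaddress` module; the networks below are documentation placeholders, not our real addresses, and the helper names are hypothetical.

```python
import ipaddress
from typing import Iterable, Sequence, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Placeholder networks (RFC 5737 / RFC 3849 documentation ranges),
# not the team's real addresses.
EXCLUDED_NETWORKS: Sequence[Network] = (
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
    ipaddress.ip_network("2001:db8::/32"),
)


def is_excluded(client_ip: str,
                excluded: Sequence[Network] = EXCLUDED_NETWORKS) -> bool:
    """True if client_ip belongs to one of the excluded networks.

    An IPv4 address tested against an IPv6 network (or vice versa) is
    simply not contained. Raises ValueError for a malformed address.
    """
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in excluded)


def count_external(client_ips: Iterable[str]) -> int:
    """Count only the hits that do not come from an excluded address."""
    return sum(1 for ip in client_ips if not is_excluded(ip))
```

Filtering at insert time keeps the table clean, while filtering at query time (with the IP stored per row) would let earlier internal test traffic be removed retroactively.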

mathieuLU commented 1 year ago

And I changed the URL in the Matomo tool. Since the tracking is based on the URL, I had to update it with the new one: http://biodiv-app.biol.lu.se/naturdatavardskap-sok-data/

anneliejonsson commented 1 year ago

My IP address here in Lund: [image]