elan-ev / tobira

Video portal for Opencast
https://elan-ev.github.io/tobira/
Apache License 2.0

Statistics in Tobira #1155

Open LukasKalbertodt opened 8 months ago

LukasKalbertodt commented 8 months ago

Users of Tobira should be able to see some statistics about their events, pages, ... in Tobira. The most basic one would be a video "click" counter. This is a very general issue describing all the ways one could approach that.

With #1099, it is possible to let the Paella player send usage statistics to a Matomo server. As described in #1038, this is not the full solution, as that data is only stored in Matomo, not displayed in Tobira itself.

Where to store the statistical data?

Matomo

The collection would work as it does with the Paella plugin now: matomo.js would be loaded from the Matomo server, and events would be emitted, which are sent to the same server.

Advantages:

Disadvantages:

Tobira

Tobira would just use its own API to let the frontend send certain events and statistics.

Advantages:

Disadvantages:

Opencast

Opencast could get APIs to send events to (similar to Matomo). And/or Opencast could just check incoming HTTP requests for certain files, and trace them back to an event.
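The "trace them back to an event" idea could look roughly like the following sketch. The URL layout (`/static/<org>/<channel>/<event-id>/<element-id>/<file>`), the file extensions, and all function names are assumptions made for illustration, not something confirmed in this thread.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# ASSUMPTION: hypothetical layout /static/<org>/<channel>/<event-id>/<element-id>/<file>,
# with the event ID being a UUID. Adjust to the real distribution URLs.
STATIC_PATH = re.compile(
    r"^/static/[^/]+/[^/]+/"
    r"(?P<event>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})/"
    r"[^/]+/[^/]+\.(?:mp4|m3u8|webm)$"
)


def event_id_for_request(path: str) -> Optional[str]:
    """Return the event ID encoded in a media request path, or None."""
    # Drop the query string before matching.
    match = STATIC_PATH.match(path.split("?", 1)[0])
    return match.group("event") if match else None


def count_requests_per_event(paths: Iterable[str]) -> Counter:
    """Count media requests per event, ignoring paths that are not media files."""
    counts: Counter = Counter()
    for path in paths:
        event_id = event_id_for_request(path)
        if event_id is not None:
            counts[event_id] += 1
    return counts
```

Note that this counts requests, not views: a single playback typically causes many range or segment requests, so some grouping per visitor and time window would still be needed.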

Advantages:

Disadvantages:


As you can see, this is not a simple choice.

I personally lean towards not relying on Matomo, so that the architecture does not become more involved. But I can totally understand the requirement to also show video views that happened in an LMS. And I'm not entirely sure how difficult implementing all of this would be, especially the protection against bad actors.

A mix of these options is certainly possible. For example, we could task Opencast with only gathering basic statistics about videos, while Tobira collects most other data itself.

oas777 commented 8 months ago

In light of resources being scarce this year, I suggest keeping it simple:

Questions to change my mind:

If anyone wants better statistics from Opencast and/or Tobira in Opencast and/or Tobira, they can revisit this as an Opencast/Tobira feature next year.

LukasKalbertodt commented 8 months ago

How big is the difference between Tobira and Matomo in gathering data?

It's really quite difficult to get good numbers on that, for a somewhat obvious reason: browsers that block Matomo tracking are hard to track. I have the following two plugins installed in all desktop browsers I regularly use:

Both of these seem to block Matomo in their default configuration. In uBlock Origin's case, the "Easy Privacy" list seems to be responsible for the blocking, and I believe that list is also used by other ad blockers. Of course, the absolute numbers about "users" above are not very helpful. It's also important to understand that the typical users of a video portal are not the "average population" and might be more inclined to install such a privacy or ad blocking plugin. There are also some Chromium-based browsers like "Brave" getting fairly popular. Many of these also promise enhanced privacy, and I saw reports of at least "Brave" blocking Matomo in some cases.

But even with this research, it's hard to put numbers on it. If I were asked to take a guess, I would say that between 5% and 50% of your ETH video portal users would block Matomo. That's quite the range, I know :P If I were pressed to guess one number... maybe 15%?

And again, some users (e.g. computer science students) are much more likely to have such a blocker installed. So all your computer science lectures might have significantly fewer views if using Matomo for that :grimacing:


Regarding your suggestion: I don't mind if Tobira starts collecting very basic statistics. Then we will probably start doing that some time soon.

oas777 commented 8 months ago

Thanks, Lukas. So if my video was clicked 100 times, Matomo reports 5-50 clicks, right? What would Tobira report?

LukasKalbertodt commented 8 months ago

No, Matomo would report 50-95 clicks. My percentages talk about the probability of Matomo being blocked, i.e. unable to report anything. Tobira would report all 100.
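The arithmetic behind this correction can be spelled out in a minimal sketch. The function name is illustrative only, and the model assumes that a viewer who blocks Matomo is completely invisible to it while every other viewer is counted.

```python
def matomo_reported_clicks(actual_clicks: int, blocked_fraction: float) -> int:
    """Estimate the clicks Matomo reports if `blocked_fraction` of viewers block it."""
    if not 0.0 <= blocked_fraction <= 1.0:
        raise ValueError("blocked_fraction must be between 0 and 1")
    # Only the non-blocking share of viewers reaches Matomo.
    return round(actual_clicks * (1.0 - blocked_fraction))


# The guessed range from this thread (5% to 50%, single guess 15%).
for blocked in (0.05, 0.15, 0.50):
    reported = matomo_reported_clicks(100, blocked)
    print(f"{blocked:.0%} blocked -> {reported} of 100 clicks reported")
```

So a blocking rate of 5% to 50% corresponds to 95 down to 50 reported clicks, not 5 to 50.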

oas777 commented 7 months ago

Tobira would report all 100.

That's tempting. Anyway, let's wait for David and others to comment on my suggestions.

dagraf commented 7 months ago

Here are my comments:

Questions about our solutions:

[Two screenshots attached: "Screenshot 2024-04-12 at 08:40:47" and "Screenshot 2024-04-12 at 09:20:06"]

oas777 commented 7 months ago

Pending Sascha's comments and in order to limit this discussion to "Statistics in Tobira" I would suggest

snoesberger commented 7 months ago

Finally, here you have my comments about different topics in this conversation:

  • How big is the difference between Tobira and Matomo in gathering data?

As Lukas already mentioned, ad blockers are a big problem for Matomo. But there are ways to avoid being blocked by them; see, for example, https://github.com/0x11DFE/Matomo-Anti-Adblock. With these settings I was able to bypass ad blockers like uBlock or Ad Blocker Ultimate. At the moment, however, this only works for Paella 6. In Paella 7 there is no way to change the name of the Matomo JavaScript file that the player has to load (Paella GitHub issue).

It is difficult to get real numbers on how many views or clicks you are missing with Matomo. Most of the blocking happens on the client side, so from your (server) perspective you will never know when statistics were blocked. One way to get an idea is to compare the accesses to the video files in the server logs with the data in Matomo. Using our real live data, I compared our Matomo unique visitors with our nginx access logs. The user IP and the user agent string are used to recognise a unique visitor in the nginx access log file. These are the results:

The AdBlocker bypass as described above was in place for this analysis.
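The log side of such a comparison could be sketched as follows. This is not the script used for the analysis above; it assumes nginx's default "combined" log format, and the file suffixes and function names are illustrative assumptions.

```python
import re
from typing import Iterable, Tuple

# nginx "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def unique_visitors(
    lines: Iterable[str],
    media_suffixes: Tuple[str, ...] = (".mp4", ".m3u8"),  # assumption
) -> int:
    """Count distinct (IP, user agent) pairs that successfully requested media files."""
    visitors = set()
    for line in lines:
        match = COMBINED_LINE.match(line)
        if match is None:
            continue  # not in combined format
        parts = match.group("request").split()
        if len(parts) < 2:
            continue  # malformed request line
        path = parts[1].split("?", 1)[0]
        if not path.endswith(media_suffixes):
            continue  # not a video file
        if not match.group("status").startswith("2"):
            continue  # keep only 2xx, including 206 range responses
        visitors.add((match.group("ip"), match.group("agent")))
    return len(visitors)


def missed_fraction(log_visitors: int, matomo_visitors: int) -> float:
    """Share of log-derived visitors that do not show up in Matomo."""
    if log_visitors == 0:
        return 0.0
    return 1.0 - matomo_visitors / log_visitors
```

With the visitor count from the logs and the unique visitor count reported by Matomo for the same period, `missed_fraction` gives a rough estimate of how much Matomo is missing. Users behind a shared IP with identical browsers collapse into one visitor, so the log count is itself only an approximation.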

  • How reliable is the Bern solution to feed Matomo data to Opencast? Is there any data beyond "clicks" being used?

At the moment we only provide "clicks" in Opencast. To do this, we copy data from Matomo to the InfluxDB that the Opencast statistics feature requires. The copy script uses the "segment" parameter of the Matomo API to get the hourly data. This can lead to problems if the API is called with the "segment" parameter several times in a row. To avoid these problems, we copy the data only once per hour.

  • Can you share statistics from Matomo with owners of a video / a series?

In the Matomo UI you can't restrict access to the statistics data to just the videos or series a user owns. A logged-in Matomo user always has access to the statistics of all videos and series.

Conclusion

oas777 commented 7 months ago

@oliverkarlETH

oas777 commented 7 months ago

Thanks Sascha for your explanations. I think we have agreement on what we want in Tobira for the time being. Let's discuss everything else somewhere else.

LukasKalbertodt commented 7 months ago

Thanks @snoesberger for your data. I just saw your talk as well. Very, very useful information! I am also happy to see that my estimates weren't that far off. What surprised me is that "The AdBlocker bypass as described above was in place for this analysis." So even with this workaround, Matomo missed around 25% of users. That's very interesting. So overall I guess the number of users blocking Matomo is not insignificant and certainly not something one can easily ignore.

While I do think the "Tobira only counts views in Tobira itself" limitation is quite a major one, I agree with both of you that we should start with that. Even if we eventually move this into Opencast (to count all views), development and experimentation with this in Tobira is likely faster. Once we have something in Tobira that we are decently happy with, one can still try to move it into Opencast.

I'm very interested in tackling this, but as you know, some other things have a higher priority right now. We will see when I'll get to it.