Islandora / documentation

Contains islandora's documentation and main issue queue.
MIT License

Use case: Counting and exposing usage stats for multisites #1073

Open bondjimbond opened 5 years ago

bondjimbond commented 5 years ago

Related: #792

| Title (Goal) | Aggregate stats for all uses of a given object regardless of which site they came from |
|---|---|
| Primary Actor | Repository admin |
| Scope | Architecture |
| Level | High |
| Story | As a repository admin, I want my objects' usage stats to be counted whether they were recorded via the parent site or a child site. I want to be able to expose this data via modules like islandora_usage_stats_charts and Views, counting aggregated views and downloads rather than site-specific ones. |

bondjimbond commented 5 years ago

(OK, I don't know how to make tables in markdown.)

dannylamb commented 5 years ago

MD table woes aside, thx for making this use case @bondjimbond

bryjbrown commented 5 years ago

Cross-referencing here to point to a post I made earlier today in https://github.com/Islandora-CLAW/CLAW/issues/792: if we can send a truly unique ID (unique at least across all the sites you wish to capture metrics from) to Matomo as a custom variable whenever it phones home, then we can target that unique ID when creating reports that aggregate statistics across multiple sites.

Conversely, if you have the "same" object on two or more sites, you could reuse that ID, or use some other ID that represents the concept of the object and not a specific instance of it. Then when you request the data for that ID, Matomo would aggregate the usage of it coming from any site reporting that ID.
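As a rough illustration of the idea above, here is a minimal Python sketch of a single pageview sent to the Matomo Tracking HTTP API with a page-scope custom variable (the `cvar` parameter, a JSON object keyed by slot number holding `[name, value]` pairs). The Matomo host, the site IDs, the variable name `object_id`, and the ID `arca:123` are all hypothetical; in a real deployment the Drupal Matomo module would normally emit the tracking call through its JavaScript tracker rather than hand-built URLs.

```python
import json
from urllib.parse import urlencode


def build_tracking_url(matomo_base, site_id, page_url, shared_id, slot=1):
    """Build a Matomo Tracking HTTP API URL for one pageview that carries a
    shared object ID in a page-scope custom variable."""
    # Page-scope custom variables travel in `cvar` as JSON:
    # {"<slot>": ["<name>", "<value>"]}
    cvar = json.dumps({str(slot): ["object_id", shared_id]})
    params = {
        "idsite": site_id,  # Matomo site ID of the reporting Drupal site
        "rec": 1,           # ask Matomo to actually record the request
        "url": page_url,
        "cvar": cvar,
    }
    return f"{matomo_base}/matomo.php?{urlencode(params)}"


# Two different sites report the same hypothetical shared ID.
for site_id, page in [(1, "https://site1.example.org/node/6"),
                      (2, "https://site2.example.org/node/11")]:
    print(build_tracking_url("https://matomo.example.org", site_id, page, "arca:123"))
```

Because both sites send the same `object_id` value, a custom variables report filtered on that value could count views from either site together, even though the URLs differ.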

bondjimbond commented 5 years ago

> if we can send a truly unique ID (unique across all the sites you wish to capture metrics from at least) to Matomo as a custom variable whenever it phones home, then we can target that unique ID when creating reports that aggregate statistics across multiple sites.

@bryjbrown Does Islandora 8 still use PIDs (in the same namespace:identifier format we use in Islandora 7)? If so, then we've got that built in.

> Conversely, if you have the "same" object on two or more sites, you could reuse that ID, or use some other ID that represents the concept of the object and not a specific instance of it. Then when you request the data for that ID, Matomo would aggregate the usage of it coming from any site reporting that ID.

Nice.

How much of this custom Matomo configuration happens in Matomo, vs in Islandora module-building? It's really nice if the whole thing can be plug-and-play.

DiegoPino commented 5 years ago

@bryjbrown I'm a bit confused by this. On one side I hear the need to integrate as much as possible into the D8 ecosystem; on the other side I see digital objects (basically nodes and media) generated by I8 being handled as different things? So which one rules?

Seems to me that it would just add overhead. Custom variables are going to be deprecated in the next Matomo release (they should stick around as a Marketplace plugin), and custom dimensions are what we have now. They both have issues: variables are complicated to segment and widgetize (that's not even possible for individual variables), and dimensions accept a single value per page view per slot. PIDs are no longer a thing here. They will get moved over (I suppose) if you migrate from 7.x, but they are not a primary source of anything if you start from scratch or add objects (nodes) after the migration. The natural way would be to use the node UUID as the "PID" (@rosiel knows I would recommend that), since it's the only thing that is unique in a D8 instance, and in that case, with the contributed Matomo module, you could use the node token for that.
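To make the custom dimension alternative concrete, here is a minimal sketch, assuming a Matomo instance with a custom dimension already configured under ID 1 (a hypothetical setup). The Tracking HTTP API takes custom dimensions as `dimension<ID>=<value>` parameters, and each dimension holds a single value per pageview, which is the one-value-per-slot limitation mentioned above. The host, function name, and UUID are placeholders.

```python
from urllib.parse import urlencode


def build_dimension_url(matomo_base, site_id, page_url, node_uuid, dimension_id=1):
    """Build a Matomo Tracking HTTP API URL that records one pageview and
    stores the node UUID in a single custom dimension slot."""
    params = {
        "idsite": site_id,  # Matomo site ID of the reporting Drupal site
        "rec": 1,           # ask Matomo to actually record the request
        "url": page_url,
        # One dimension ID can carry only one value for this pageview.
        f"dimension{dimension_id}": node_uuid,
    }
    return f"{matomo_base}/matomo.php?{urlencode(params)}"


print(build_dimension_url(
    "https://matomo.example.org",
    1,
    "https://site1.example.org/node/6",
    "3f2504e0-4f89-11d3-9a0c-0305e82c3301",
))
```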

It seems to me it's easier to use the Drupal Matomo module if variables are the chosen approach. In that case, if you use what we have here (not Islandora), a canonical /node:uuid URL becomes a candidate for automatic extraction without even having to modify code in D8.
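A minimal sketch of that automatic extraction, assuming canonical URLs shaped like `/node/<uuid>` (the exact path pattern is an assumption; Drupal core's default canonical node route uses the numeric node ID, so a UUID-based path would have to be configured). The pattern and function names are hypothetical.

```python
import re
from urllib.parse import urlparse

# Hypothetical canonical path shape: /node/<uuid>, optional trailing slash.
UUID_PATH = re.compile(
    r"^/node/([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})/?$",
    re.IGNORECASE,
)


def uuid_from_canonical(url):
    """Return the lowercased node UUID from a canonical URL, or None."""
    match = UUID_PATH.match(urlparse(url).path)
    return match.group(1).lower() if match else None


print(uuid_from_canonical(
    "https://site1.example.org/node/3F2504E0-4F89-11D3-9A0C-0305E82C3301"))
print(uuid_from_canonical("https://site1.example.org/node/6"))
```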

> Conversely, if you have the "same" object on two or more sites

No such thing exists, or can exist, out of the box in either D8 or Matomo. Matomo tracks web URLs (yes, you can fake one page to report its URL as another's, but then you have to do that for a whole repo). And you can't have the same node in two sites because of that. That's not to say you couldn't hack your way around it: you could, by making both sites report to the same Matomo site ID and transforming one site's URLs into the other's, but then you lose the ability to track single sites (and then again, you can hack your way around that).
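The URL-transformation hack described above could look roughly like this minimal sketch, which rewrites one site's host to the other's before the URL is reported. The host mapping and function name are hypothetical, and the sketch assumes both sites use the same paths for the same object.

```python
from urllib.parse import urlsplit

# Hypothetical mapping: pageviews on site2 are reported as if they were site1's.
HOST_ALIASES = {"site2.example.org": "site1.example.org"}


def reported_url(url):
    """Rewrite an aliased host to its target before sending the URL to Matomo.

    Unmapped URLs pass through unchanged. For mapped hosts the whole netloc
    is replaced, so any port or credentials in the original URL are dropped.
    """
    parts = urlsplit(url)
    target = HOST_ALIASES.get(parts.hostname)
    if target is None:
        return url
    return parts._replace(netloc=target).geturl()


print(reported_url("https://site2.example.org/node/6"))
print(reported_url("https://site1.example.org/node/6"))
```

Once both sites emit identical URLs, Matomo has nothing left to tell them apart, which is exactly the loss of per-site tracking described above.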

rosiel commented 5 years ago

Related issue #396, especially @DiegoPino's comment on it: https://github.com/Islandora-CLAW/CLAW/issues/396#issuecomment-416260098

This issue assumes a structure where

  1. a Drupal multisite allows multiple Drupals to talk to the same Fedora
  2. objects can in some cases appear in more than one Drupal (therefore usage may be split across sites)

but 2 isn't happening, and might be its own issue ... (#245 is related, but only mentions in passing whether a single object could be present in multiple Drupals.)

bondjimbond commented 5 years ago
> 2. objects can in some cases appear in more than one Drupal (therefore usage may be split across sites)
>
> ... but 2 isn't happening, and might be its own issue ...

What do you mean by that, @rosiel? Do you mean that Islandora 8 is not being built with that capability, or do you mean people aren't doing it?

This is exactly the use case for Arca; all objects being present in at least two separate Drupal sites is key to how we work.

whikloj commented 5 years ago

@bondjimbond I think the issue is that in Islandora 7 Fedora was the source for all objects. So you could have multiple front-ends touching the same backend and just had to decide which ones you wanted to display to a specific site (in a multi-site context).

In Islandora 8, the Drupal database is the source (Fedora is your preservation store) so you will need to generate some external datastore to hold your objects and their field data and then some method of accessing or synchronizing this content from all multisites.

Also, as each site might have "some" different content, how does node/4 work on site1.arca.ca versus site2.arca.ca? What if one site already has that node?

Maybe you need to have a single data-entry interface, with items restricted to specific users and then multiple front-ends built off that single Drupal back-end. (just a thought)

I'm not saying that it can't be done, but it will require some real thinking and development.

rosiel commented 5 years ago

@whikloj @bondjimbond This blog post has some interesting perspectives, suggestions, and clues: https://dri.es/how-to-decouple-drupal-in-2019. Essentially, there's a possibility for Drupal to be the content backend for one or more static sites. That might be the most secure option, and "deployment" of new content could happen "all at once". But you'd need someone who knows a bit about Drupal and other front-end frameworks.

bondjimbond commented 5 years ago

> Also as each site might have "some" different content, then how does node/4 work on site1.arca.ca versus on site2.arca.ca, what if one site already has that node?

@whikloj Yeah, that's what works so nicely in Islandora 7... because every object's URL points to its Fedora PID rather than a Drupal node, there is no risk of duplication. Would it be plausible to do something similar in Islandora 8, where each object has a unique permanent URL? (I'd say UUID, but those are pretty unwieldy.)

whikloj commented 5 years ago

@bondjimbond Not really; you'd also have to deal with how each site stores its content. Currently you have one database (or table prefix) for each Drupal site. To have multiple Drupal sites share a database, you'd (IMHO) have to rewrite the core of Drupal.

Another possible option is some sort of object synchronization, where certain objects could be synced to your other sites. That would require some work calling out to get the content from the other site, and then there is the question of who gets to control how it looks and what the data is.

This is why I say if you want a single datastore with multiple front-ends your best bet is to build a single Drupal back-end and invest in multiple front-ends that use your Drupal as their datastore.

Perhaps, a single site that publishes static websites for all your partner institutions. Add a new object to site 1 and a new page is pushed out for that site.

bondjimbond commented 5 years ago

> Perhaps, a single site that publishes static websites for all your partner institutions. Add a new object to site 1 and a new page is pushed out for that site.

Hmm, that could work, if Solr could be usefully deployed on all of the static-ish sites. Then you get into the trickier area of user management and restricting users' management rights to only certain sets of objects, which would have to be automated somehow (e.g., can user management rights be assigned based on taxonomies or some other factors that help differentiate objects?).

bryjbrown commented 5 years ago

Sorry for the lag in reply, I've been on vacation and out of the office for the last 2 weeks. A lot of the discussion here is going into Drupal 8 multisite discussion territory, which is important, but this is an issue specifically about doing stats in a multisite environment which IMHO is a much simpler issue. I'll try to reiterate what I said earlier and phrase it in a different way that hopefully makes more sense.

I'm proposing that we use Matomo's custom variables feature to send an ID that represents the node accessed as a custom variable to Matomo. Matomo can have up to five custom variables, and this gives you a lot of power in slicing the data in Matomo in different ways. @DiegoPino has mentioned that Matomo is deprecating custom variables in favor of custom dimensions (which I am not familiar with), but at least for the time being I'm going to focus on custom variables, because that's what the Drupal 8 Matomo module allows you to configure; it doesn't currently support custom dimensions AFAICT. Since multiple URLs can be used to access the same Drupal 8 node, having some sort of ID to request stats by would be good, because then you could get stats by node rather than by URL.
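A minimal sketch of why a per-node ID helps, using hypothetical pageview rows (page URL plus the node ID sent as a custom variable) rather than real Matomo output: grouping by URL scatters one node across several rows, while grouping by the custom variable yields one total per node.

```python
from collections import Counter

# Hypothetical pageview rows: (page URL, node ID sent as a custom variable).
pageviews = [
    ("https://base.example.org/node/6", "6"),
    ("https://base.example.org/digital_library/thing1", "6"),
    ("https://base.example.org/institutional_repository/thing1", "6"),
    ("https://base.example.org/node/7", "7"),
]

# Grouping by URL splits node 6 across three rows...
views_by_url = Counter(url for url, _ in pageviews)
# ...while grouping by the custom variable gives node 6 a single total.
views_by_node = Counter(node_id for _, node_id in pageviews)

print(views_by_url)
print(views_by_node)
```

Keeping both the URL and the ID in each record means either view of the data stays available.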

@DiegoPino has also mentioned that we can set Drupal 8 to only send the canonical URL (e.g., base.url/node/<NID>) when phoning home, and then you could use a regex to lop off the final integer, get the NID from the page URL, and pivot on that instead of needing to send a custom variable. I disagree with this approach for two reasons. First, NID isn't a good ID to use if you are running multiple Islandoras, for reasons that I'll explain in the next paragraph. Second, it makes it so you can ONLY look up stats by NID, and takes away the option of seeing the stats on the different URLs people used to access the same node. I could easily see someone setting up base.url/digital_library/thing1 and base.url/institutional_repository/thing1 as aliases that point to the same object, perhaps with a context set up that uses a different template based on the URL used to access the node. I would like to be able to see the breakdown of requests to one URL vs. the other, as opposed to all requests to thing1 being treated the same.
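A minimal sketch of the regex approach under discussion, with hypothetical hostnames, illustrates the second objection: alias paths carry no NID at all, so pivoting on the NID only works if Drupal reports the canonical URL, and at that point the alias the visitor actually used is no longer what Matomo sees.

```python
import re
from urllib.parse import urlparse

# Matches the canonical Drupal path /node/<NID>, optionally with a trailing slash.
NID_PATH = re.compile(r"^/node/(\d+)/?$")


def nid_from_url(url):
    """Return the NID from a canonical node URL, or None for any other path."""
    match = NID_PATH.match(urlparse(url).path)
    return int(match.group(1)) if match else None


for url in ["https://base.example.org/node/6",
            "https://base.example.org/digital_library/thing1",
            "https://base.example.org/institutional_repository/thing1"]:
    print(url, "->", nid_from_url(url))
```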

So, assuming we agree that we need to pass some sort of ID to Matomo to represent the content itself as opposed to its URL: NID works just fine if you are running a single Islandora instance tied to a single Matomo, but loses utility when you are trying to run a multisite Islandora tied to the same Matomo, because site.1/node/6 and site.2/node/6 will almost certainly represent different things. UUID is another option, but it has the opposite problem. Suppose you want to recreate site.1/node/6 on site.2: the automatically generated UUID is going to be different each time (unless you feed it site.1's UUID as part of ingesting it into site.2, which may or may not be a best practice; I honestly don't know enough about UUIDs to say).

I think the better option would be to use some sort of ID that represents the content and not the node, something like a DOI. When you see the same DOI on two different URLs, you know that they represent the same piece of content in the abstract. People who are running multiple Islandoras and intend to represent the same content across multiple sites will need to decide on some sort of ID that is unique to the content, but not necessarily unique across sites. This ID could then be sent to Matomo, and you could request stats via this ID and have Matomo return the aggregate across ALL sites it appears on, solving the problem this issue is set up to explore. I originally mentioned PID as an option because PID is one of the fields on Islandora 8.x objects out of the box using https://github.com/Islandora-CLAW/islandora_defaults, and PID doesn't necessarily have to follow the namespace:id format that Fedora 3 used (even though you could do that too, since it's technically an ID representing the same content living at different URLs in Islandora 7.x and Islandora 8.x if you have both up simultaneously during a migration). A PID is any persistent ID; technically, a DOI is a type of PID.
Administrators for multi-site set ups can choose their own local PIDs that can tie together the same piece of content on different sites, send that PID to Matomo as a custom variable, and then request data from Matomo by PID to get an aggregated count.
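To make the NID/PID comparison concrete, here is a small Python sketch over hypothetical pageview rows from two sites reporting to one Matomo. The site names, NIDs, and `arca:` PIDs are invented for illustration; in practice the PID would be whatever shared identifier the administrators choose, sent as a custom variable.

```python
from collections import Counter

# Hypothetical pageviews from two sites sharing one Matomo, as
# (site, NID, PID) rows. The PID is a locally chosen persistent ID.
pageviews = [
    ("site.1", 6, "arca:pamphlet-42"),
    ("site.1", 6, "arca:pamphlet-42"),
    ("site.2", 6, "arca:map-7"),         # same NID, different content
    ("site.2", 11, "arca:pamphlet-42"),  # same content, different NID
]

# NID alone conflates site.1's pamphlet with site.2's map.
by_nid = Counter(nid for _, nid, _ in pageviews)
# (site, NID) is unique, but cannot join the pamphlet's views across sites.
by_site_nid = Counter((site, nid) for site, nid, _ in pageviews)
# A content-level PID aggregates the pamphlet across both sites.
by_pid = Counter(pid for _, _, pid in pageviews)

print(by_pid)
```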