The nested loop that calculates the views and items needs to be optimized so that it does not run so many queries against the database. It currently loops through $nids and, inside of that, loops through each nid's $res_type values, i.e. (# of nodes) x (# of resource types) iterations: https://github.com/asulibraries/islandora-repo/blob/develop/web/modules/custom/asu_collection_extras/src/Plugin/Block/AboutThisCollectionBlock.php#L156-L180
Inside this loop, it also calls the Matomo service to get each node's view count.
This loop also calls getOriginalFileCount, a function that itself loops twice and potentially calls itself recursively (for children of children of the collection). Unless we can guarantee that objects are never more than two or three layers deep, the recursion is required.
The three problematic calls, all inside the per-node loop in AboutThisCollectionBlock, are:
- $files += $this->getOriginalFileCount($child_nid, $original_file_tid);
- $this->entityTypeManager->getStorage('node')->load($child_nid);
- $node_views = $this->islandoraMatomo->getViewsForNode($child_nid);
The recursive part of getOriginalFileCount could be replaced with the Solr query that gets all children via the itm_field_ancestors field (as getCollectionNids does), so that it at least does not need to recurse -- the MySQL query could then use the returned $nid values in a WHERE clause like node_field_data.nid IN ( {{ the returned $nids imploded with a comma }} ); see the sketch after this exchange. The other option is to index a flag in Solr indicating that a node has an Original File, so this wouldn't have to be two steps here. Realizing that keeping a flag for whether or not any node has an Original File would have to be an enhancement tied into the media-related hook or something similar.
Eli Zoller 3:21 PM: we could certainly try a solr query and see if it's more performant. As far as the original file bit - yeah, we'd have to add an indexed value that aggregates up from the media. I don't think that's too much trouble.
Willow Gillingham 3:23 PM: it will be easy to get the recursion out for now - and very easy to use the solr value for it if/when the media flag can be read in that same query at a later time... Until then, I think it will be pretty easy to have one SQL statement to get the entire set of nodes' file counts at one time.
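For reference, a rough sketch of that single-pass approach, assuming a search_api index loaded by machine name ('default_solr_index' here is a placeholder) and the standard Islandora field_media_of / field_media_use tables:

use Drupal\search_api\Entity\Index;

// 1) One Solr query for every descendant of the collection, via the
//    indexed ancestors field, instead of recursing node by node.
$index = Index::load('default_solr_index');
$query = $index->query();
$query->addCondition('itm_field_ancestors', $collection_nid);
$query->range(0, 10000);
$nids = [];
foreach ($query->execute() as $item) {
  // search_api item ids look like "entity:node/123:en".
  if (preg_match('#entity:node/(\d+):#', $item->getId(), $matches)) {
    $nids[] = (int) $matches[1];
  }
}

// 2) One SQL query for the Original File count across all of those nids.
//    A placeholder array in condition() expands safely, so there is no
//    need to implode the nids into the IN clause by hand.
$query = \Drupal::database()->select('media__field_media_of', 'mo');
$query->join('media__field_media_use', 'mu', 'mu.entity_id = mo.entity_id');
$query->condition('mo.field_media_of_target_id', $nids, 'IN');
$query->condition('mu.field_media_use_target_id', $original_file_tid);
$files = $query->countQuery()->execute()->fetchField();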
The design team supported @wgilling's idea of using a nightly cache table for the "views" count. I found two things that could either make that easier or possibly make it so that we don't need it at all.
Some parameters can optionally accept arrays. For example, the urls parameter of SitesManager.addSite, SitesManager.addSiteAliasUrls, and SitesManager.updateSite allows for an array of urls to be passed. To pass an array, add the bracket operators and an index to the parameter name in the GET request. So, to call SitesManager.addSite with two urls, you would use the following array:
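(The docs' example request did not get copied over; a sketch with a placeholder site name, URLs, and auth token would look something like:)
https://demo.matomo.org/?module=API&method=SitesManager.addSite&siteName=ExampleSite&urls[0]=https%3A%2F%2Fexample.org&urls[1]=https%3A%2F%2Fexample.net&token_auth=YOUR_TOKEN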
Sometimes it is necessary to call the Matomo API a few times to get the data needed for a report or custom application. When you need to call many API functions simultaneously or if you just don't want to issue a lot of HTTP requests, you may want to consider using a Bulk API Request. This feature allows you to call several API methods with one HTTP request (either a GET or POST).
To issue a bulk request, call the API.getBulkRequest method and pass the API methods & parameters (each request must be URL Encoded) you wish to call in the 'urls' query parameter. For example, to call VisitsSummary.get & VisitorInterest.getNumberOfVisitsPerVisitDuration at the same time, you can use:
https://demo.matomo.org/?module=API&method=API.getBulkRequest&format=json&urls[0]=method%3dVisitsSummary.get%26idSite%3d3%26date%3d2012-03-06%26period%3dday&urls[1]=method%3dVisitorInterest.getNumberOfVisitsPerVisitDuration%26idSite%3d3%26date%3d2012-03-06%26period%3dday
Notice that urls[0] is the URL-encoded call to VisitsSummary.get by itself and that urls[1] is what you would use to call VisitorInterest.getNumberOfVisitsPerVisitDuration by itself. The &format is specified only once (format=xml and format=json are supported for bulk requests).
I got this from https://developer.matomo.org/api-reference/reporting-api. It would require additional methods in the MatomoService: https://github.com/asulibraries/islandora_matomo/blob/master/src/IslandoraMatomoService.php
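As a rough illustration, a bulk method on the service might look like the sketch below; the method name, the property names ($matomoUrl, $siteId, $authToken), and the inner Actions.getPageUrls query are all assumptions, not the service's actual API:

// Hypothetical: fetch stats for many nodes in one HTTP round trip
// via Matomo's API.getBulkRequest instead of one request per node.
public function getViewsForNodesBulk(array $nids) {
  $inner = [];
  foreach ($nids as $nid) {
    // One sub-request per node; http_build_query() URL-encodes each
    // inner query string, which is what getBulkRequest expects.
    $inner[] = "method=Actions.getPageUrls&idSite={$this->siteId}"
      . "&period=range&date=2015-01-01,today&flat=1"
      . "&filter_pattern=/items/{$nid}";
  }
  $query = http_build_query([
    'module' => 'API',
    'method' => 'API.getBulkRequest',
    'format' => 'json',
    'token_auth' => $this->authToken,
    'urls' => $inner,
  ]);
  $response = \Drupal::httpClient()->get($this->matomoUrl . '?' . $query);
  return json_decode((string) $response->getBody(), TRUE);
}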
I looked at the bulk Matomo method and the ability to send an array parameter, but I am not sure that these methods would do much to address our largest collections.
I would like to proceed with a summary table that is populated with the collection stats at night and stores all of the values that we need for this page: views, downloads, items, files, # of resource types, and even last-updated date. These records could also be stored with a date so that we could show "views/downloads/items over time" if we wanted to. However, if we want to add a graph like that to a metrics area (with the altmetrics work), we would need to store each item, and its collection would have to be stored in an indexed table.
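As a sketch of what that table might hold (the table name comes from the install notes below; the columns are guesses from the stats listed above, not the module's actual schema):

/**
 * Implements hook_schema() (in the .install file). Sketch only.
 */
function asu_collection_extras_schema() {
  $int = ['type' => 'int', 'unsigned' => TRUE, 'not null' => TRUE, 'default' => 0];
  return [
    'asu_collection_extras_collection_usage' => [
      'description' => 'Nightly usage snapshot per collection.',
      'fields' => [
        'collection_nid' => ['type' => 'int', 'unsigned' => TRUE, 'not null' => TRUE],
        'views' => $int,
        'downloads' => $int,
        'items' => $int,
        'files' => $int,
        'resource_types' => $int,
        'last_updated' => $int,
        // Snapshot timestamp, which keeps "views over time" graphs possible.
        'created' => $int,
      ],
      'primary key' => ['collection_nid', 'created'],
    ],
  ];
}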
The latest commit contains configuration and code for a new Solr processor, contained in asu_search/src/Plugin/search_api/processor/OriginalFileCount.php. Because of this, all of the content will need to be reindexed in Solr before it will work.
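For readers unfamiliar with search_api processors, the general shape of such a plugin is sketched below; this is a schematic of the pattern, not the contents of the actual OriginalFileCount.php linked above, and the media-counting query is elided:

namespace Drupal\asu_search\Plugin\search_api\processor;

use Drupal\search_api\Datasource\DatasourceInterface;
use Drupal\search_api\Item\ItemInterface;
use Drupal\search_api\Processor\ProcessorPluginBase;
use Drupal\search_api\Processor\ProcessorProperty;

/**
 * @SearchApiProcessor(
 *   id = "original_file_count",
 *   label = @Translation("Original file count"),
 *   description = @Translation("Indexes how many Original File media each node has."),
 *   stages = {
 *     "add_properties" = 0,
 *   },
 * )
 */
class OriginalFileCount extends ProcessorPluginBase {

  public function getPropertyDefinitions(DatasourceInterface $datasource = NULL) {
    $properties = [];
    if ($datasource) {
      // Expose a new indexable integer property on each item.
      $properties['original_file_count'] = new ProcessorProperty([
        'label' => $this->t('Original file count'),
        'type' => 'integer',
        'processor_id' => $this->getPluginId(),
      ]);
    }
    return $properties;
  }

  public function addFieldValues(ItemInterface $item) {
    $node = $item->getOriginalObject()->getValue();
    // Count media that reference this node with the Original File
    // media-use term (query elided in this sketch).
    $count = 0;
    $fields = $this->getFieldsHelper()->filterForPropertyPath(
      $item->getFields(), $item->getDatasourceId(), 'original_file_count');
    foreach ($fields as $field) {
      $field->addValue($count);
    }
  }

}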
The remaining optimization step would be to load the usage values from a summary table that is populated per collection by an off-hours cron process. The controller code for this block would then only need to load one record from that table for the usage.
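In other words, the block would reduce to one quick indexed read at page-view time, along the lines of the following (table and column names follow the sketch above):

// Load the most recent cached stats row for this collection.
$row = \Drupal::database()
  ->select('asu_collection_extras_collection_usage', 'u')
  ->fields('u')
  ->condition('u.collection_nid', $collection_nid)
  ->orderBy('u.created', 'DESC')
  ->range(0, 1)
  ->execute()
  ->fetchAssoc();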
Rather than making a pull request yet, the code should be reviewed first. Note that the non-dependency-injection code is just sitting in the .module file for now.
Installing the module will create the asu_collection_extras_collection_usage table. On an existing site, this will require uninstalling first and then installing:
drush pm-uninstall asu_collection_extras
drush pm-enable asu_collection_extras
To populate this table via drush, run:
drush asu_collection_extras:collection_summary
This command will ultimately be set up as a crontab line such as:
1 0 * * * /usr/local/bin/drush asu_collection_extras:collection_summary >/dev/null 2>&1
This appears to be caused by one of the statistics boxes in the AboutThisCollection.php block, for example on:
https://prism.lib.asu.edu/collections/177