VTUL / vtechworks

DSpace at Virginia Tech
http://vtechworks.lib.vt.edu

Legacy stats page fails to load completely #744

Closed alawvt closed 2 years ago

alawvt commented 3 years ago

The legacy stats page, https://vtechworks.lib.vt.edu/statistics, loads but not completely. That page covers 2019-09-11 to 2021-01-31. The same pages for dev and pprd do load but those pages only cover 2018-03-19 to 2018-06-30.

I logged in to the DSpace Demo site as an Administrator and saw that its legacy stats page, https://demo.dspace.org/xmlui/statistics, looks the same way, with a long list of all "Items Viewed" (more than 20 times), an extensive list of "All Actions Performed," and an exhaustive list of all "Words Searched." The demo page covers Jan 1, 2021 to Feb 12, 2021.

For VTechWorks, these are huge lists, so that probably explains why that page fails to load on prod. I will look into whether we should/can shorten the time period of this stats overview page.

alawvt commented 3 years ago

Each monthly stats report does seem to have finished on prod.

reporting period, log processing time, number of log lines
2018-03  141 seconds,  1572289 lines
2018-06  356 seconds,  3708363 lines
2018-09  560 seconds,  4583703 lines
2018-12 1072 seconds,  4945881 lines
2019-03 1509 seconds,  5452977 lines
2019-06  681 seconds,  4292123 lines
2019-09   71 seconds,  4811746 lines
2019-12 1247 seconds, 11256358 lines
2020-03 2793 seconds,  7264045 lines
2020-06 2874 seconds,  8335123 lines
2020-09 3123 seconds,  8099316 lines
2020-10 3457 seconds, 11108579 lines
2020-11 3545 seconds,  9813664 lines
2020-12 4565 seconds, 11829602 lines
2021-01 5503 seconds, 15665601 lines
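
To compare these runs on equal footing, one could convert each row to log lines processed per second. The following is only a rough sketch in Python, with the figures copied from the table above; the names `REPORTS` and `lines_per_second` are made up for illustration and are not part of DSpace.

```python
# (reporting period, processing time in seconds, number of log lines),
# copied from the table above.
REPORTS = [
    ("2018-03", 141, 1572289),
    ("2018-06", 356, 3708363),
    ("2018-09", 560, 4583703),
    ("2018-12", 1072, 4945881),
    ("2019-03", 1509, 5452977),
    ("2019-06", 681, 4292123),
    ("2019-09", 71, 4811746),
    ("2019-12", 1247, 11256358),
    ("2020-03", 2793, 7264045),
    ("2020-06", 2874, 8335123),
    ("2020-09", 3123, 8099316),
    ("2020-10", 3457, 11108579),
    ("2020-11", 3545, 9813664),
    ("2020-12", 4565, 11829602),
    ("2021-01", 5503, 15665601),
]


def lines_per_second(seconds, lines):
    """Return the average number of log lines processed per second."""
    return lines / seconds


# Print one throughput figure per reporting period.
for period, seconds, lines in REPORTS:
    rate = lines_per_second(seconds, lines)
    print(f"{period} {rate:>10.0f} lines/s")
```

A falling lines-per-second figure would suggest that processing time is growing faster than the log volume alone explains.
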
alawvt commented 3 years ago

We can also adjust the variables in https://github.com/VTUL/vtechworks/blob/vt_6_x_dev/dspace/config/dstat.cfg:

# floor values (don't display things that have been activated fewer times
# than this) for the reports
item.floor=20
search.floor=5
# limit the number of lookups of titles and authors to the first X.  Lookup
# invokes the java environment so has quite an impact on performance.
item.lookup=10
# do we want to show email addresses, and if not, how do we represent the user
# data.  We have 3 options: on, alias, off.  Alias distinguishes between
# individual users without disclosing email addresses.  Note: later we may
# support an "id" option, which replaces the address with the db id of the
# eperson account.
user.email=alias
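
As an illustration of what the floor settings above mean for the length of the page, here is a small Python sketch. It is not DSpace's code: the function `apply_floor` and the sample counts are hypothetical, and treating the floor as inclusive (a count equal to the floor is still shown) is an assumption based on the "fewer times than this" wording in the config comment.

```python
def apply_floor(counts, floor):
    """Keep only entries seen at least `floor` times.

    Assumption: entries with a count equal to the floor are kept,
    since the config comment hides things seen *fewer* times than it.
    """
    return {name: n for name, n in counts.items() if n >= floor}


# Hypothetical view counts, for illustration only.
item_views = {"item-a": 250, "item-b": 20, "item-c": 19, "item-d": 3}

# With item.floor=20 only the first two entries would be listed.
print(apply_floor(item_views, 20))

# Raising the floor shortens the list further.
print(apply_floor(item_views, 100))
```

Raising `item.floor` or `search.floor` in this way would shorten the "Items Viewed" and "Words Searched" lists, which may help the page load.
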
alawvt commented 3 years ago

I have reported this issue to DSpace as DS-4570: UUID fragments included in Legacy Usage Reports.

alawvt commented 2 years ago

2022-06-06: It seems the problem with UUID fragments appearing among the searched words remains. The legacy stats page now loads much faster, so that part seems resolved. Since DSpace no longer supports legacy stats, I will close this issue.