DSpace / dspace-angular

DSpace User Interface built on Angular.io
https://wiki.lyrasis.org/display/DSDOC8x/
BSD 3-Clause "New" or "Revised" License

Let the admin export the statistics using the angular frontend #1009

Closed kanasznagyzoltan closed 1 year ago

kanasznagyzoltan commented 3 years ago

At the moment you can only export the usage statistics using the DSpace command line tool.

This issue is purely a feature request: let the admin user export and download the usage statistics from the Angular frontend. The Angular frontend already has a submenu that can be extended for this purpose. I attached a screenshot showing how the feature could be added:

image

My colleague @dsipos-dev could do the implementation. We have tried to estimate the work required. As far as we know, it will mostly require backend implementation, because the frontend UI is rendered from the backend configuration.

tdonohue commented 3 years ago

@kanasznagyzoltan : I'm not sure I understand the use case for this.

The solr-export-statistics command is a complete dump of all statistical information in your Solr statistics index and, depending on the number of records in that index, it may create several dump files (looking at SolrImportExport, it appears to create a new file for every 10,000 rows).
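To make the file-count concern concrete, here is a minimal sketch of the arithmetic. It assumes the 10,000-rows-per-file figure mentioned above is accurate and is the only thing that splits the output. The real export may split on other boundaries too, so treat the result as a rough lower bound. The function name is hypothetical.

```python
import math

# Assumption: taken from the reading of SolrImportExport quoted above, not verified.
ROWS_PER_FILE = 10_000


def estimated_dump_files(record_count: int, rows_per_file: int = ROWS_PER_FILE) -> int:
    """Estimate how many dump files an export of record_count rows would produce."""
    if record_count < 0 or rows_per_file <= 0:
        raise ValueError("record_count must be >= 0 and rows_per_file must be > 0")
    # Every full or partial batch of rows_per_file rows needs its own file.
    return math.ceil(record_count / rows_per_file)
```

Under this assumption, an index with 3.2 million usage records would yield at least 320 files.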

So, I'm worried moving this to the UI would be harder than it seems. When the process starts, you don't know how many files will be included in the export, nor do you know the size of the export (it will be large, as you are basically creating a full copy of all the data in your statistics index). In my opinion, that makes this command risky to make available in the UI as it currently works. For instance, it'd be bad if the Admin triggers this command from the UI, only to find it exports so much content that the server runs out of storage space. Whereas, if you are on the command line, you can determine how much storage space you currently have available (and how much the statistics core seems to be using) before proceeding.

All in all, I'm worried this command won't be simple to move to the UI. It'd need reworking so that there were protections in place to ensure you aren't using all your disk space, and to make sure the resulting export is downloadable via the web (as the export may consist of a number of large files). Those are my initial thoughts here, but I'd welcome other developers to add their opinions (especially if I'm overlooking something).
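The manual check described here, comparing free space against the likely size of the export, could be sketched as a pre-flight guard like the one below. This is only an illustration, not existing DSpace behaviour. The function names, the export directory, and the safety factor of 2 are all assumptions, and the caller must supply the estimated export size (for example, the on-disk size of the statistics core).

```python
import shutil


def enough_headroom(free_bytes: int, estimated_bytes: int, safety_factor: float = 2.0) -> bool:
    """Return True if free_bytes covers the estimate with the given safety margin."""
    return free_bytes >= estimated_bytes * safety_factor


def has_room_for_export(export_dir: str, estimated_bytes: int, safety_factor: float = 2.0) -> bool:
    """Check the filesystem holding export_dir before starting an export."""
    # shutil.disk_usage reports total, used and free bytes for that filesystem.
    free_bytes = shutil.disk_usage(export_dir).free
    return enough_headroom(free_bytes, estimated_bytes, safety_factor)
```

A UI-triggered export could refuse to start when this returns False. It would still need a separate answer to the question of how the resulting files are downloaded.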

kanasznagyzoltan commented 3 years ago

@dsipos-dev Could you describe what we have done and how we did it? Thank you!

mwoodiupui commented 3 years ago

  1. We need to speak carefully: the statistics core contains cases, not statistics.
  2. I also am wary of running at arm's length a process that produces this much output.
  3. How does the Processes subsystem deal with a process instance that yields multiple outputs, perhaps hundreds?
  4. The reporting directory can be on a separate mounted filesystem, which would confine out-of-storage problems to the Processes reporting (for all processes!) and leave e.g. the Assetstore unaffected.
  5. I don't see why one would do this GUI-style, rather than just SSH to the server and run bin/dspace whatever there. Working locally to the server, one can do things like mount the ultimate destination of the data as a network share and send the report directly to where it is wanted, without consuming server storage. Hiding this behind the GUI is very limiting.

What is the use case here?

tdonohue commented 3 years ago

Flagging as needs discussion and moving into our unscheduled backlog, as the use cases are still unclear here.

tdonohue commented 1 year ago

Closing, outdated and no use case provided

mwoodiupui commented 1 year ago

Some more thoughts:

You can get the usage records directly from Solr. You can select just the records that you want using Solr's query language.
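As an illustration of selecting records with Solr's query language, the sketch below builds a request URL for Solr's standard `/select` handler. The host, port, core name and the field names in the usage example (`type`, `time`, `isBot`) are assumptions about a typical DSpace statistics core. Check them against your own schema before relying on them.

```python
import urllib.parse
from typing import Sequence


def build_select_url(core_url: str, query: str, filters: Sequence[str] = (), rows: int = 10) -> str:
    """Build a Solr /select URL with a main query (q) and optional filter queries (fq)."""
    params = [("q", query), ("rows", str(rows)), ("wt", "json")]
    # Each filter query narrows the result set independently of the main query.
    params.extend(("fq", f) for f in filters)
    return core_url.rstrip("/") + "/select?" + urllib.parse.urlencode(params)


# Hypothetical usage: item views from 2023, excluding records flagged as bots.
url = build_select_url(
    "http://localhost:8983/solr/statistics",
    "type:2",
    filters=["time:[2023-01-01T00:00:00Z TO 2024-01-01T00:00:00Z]", "-isBot:true"],
)
```

The resulting URL can be fetched with any HTTP client that is allowed to reach the Solr port.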

Not everyone should have access to these data. You can secure access to the Solr port independently of access to DSpace, and you can create accounts in Solr independently of DSpace accounts. You can (and arguably should) run Solr on a different host altogether, behind a corporate firewall that makes it invisible to The World.

Solr already provides a way (the "cursorMark" parameter) to page efficiently through huge quantities of records, which is exactly what usage recording produces. No temporary files are involved on the server, so there is no reason for concern about server storage.
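A minimal sketch of cursor-based paging follows. It relies on Solr's documented behaviour: the first request sends `cursorMark=*`, the sort must include the core's uniqueKey field, each response carries a `nextCursorMark`, and paging is finished when that value stops changing. The uniqueKey name `uid`, the page size and the URL in the usage note are assumptions. The `fetch` callable is injected so the loop can be tried without a live Solr.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator


def http_fetch(select_url: str) -> Callable[[Dict[str, str]], dict]:
    """Return a fetch function that sends params to a Solr /select URL and parses the JSON."""
    def fetch(params: Dict[str, str]) -> dict:
        with urllib.request.urlopen(select_url + "?" + urllib.parse.urlencode(params)) as resp:
            return json.load(resp)
    return fetch


def iter_solr_docs(fetch: Callable[[Dict[str, str]], dict], query: str = "*:*",
                   sort: str = "uid asc", rows: int = 1000) -> Iterator[dict]:
    """Yield every matching document, one page at a time, using cursorMark."""
    cursor = "*"  # Solr's documented starting value for a new cursor
    while True:
        page = fetch({"q": query, "sort": sort, "rows": str(rows),
                      "cursorMark": cursor, "wt": "json"})
        yield from page["response"]["docs"]
        next_cursor = page.get("nextCursorMark")
        # An unchanged cursor means the result set is exhausted.
        if next_cursor is None or next_cursor == cursor:
            break
        cursor = next_cursor
```

Hypothetical usage: `for doc in iter_solr_docs(http_fetch("http://localhost:8983/solr/statistics/select")): ...`, streaming each record to wherever it is wanted. Only one page is held in memory at a time.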