CCA-Public / scope

SCOPE: An access interface for DIPs from Archivematica
GNU Affero General Public License v3.0

Generate reports on what has been uploaded to the interface #107

Open jraddaoui opened 6 years ago

jraddaoui commented 6 years ago

As the Administrator, I want to generate reports on what has been uploaded to the interface (e.g., the most common file formats in the Collection or a visualization of the last modified dates by archive; this is where Kibana has potential)

jraddaoui commented 6 years ago

As mentioned, Kibana could be used. Depending on what should be reported, it may require adding some extra data to the current Elasticsearch indexes or modifying existing ones (see #54). Alternatively, a custom dashboard could be developed using D3 charts, like we have in Binder. The estimate may vary depending on which option is chosen.
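
To illustrate the kind of query a "most common file formats" report relies on, here is a minimal sketch of an Elasticsearch terms aggregation body built in Python. The field name `format.keyword` is an assumption made for illustration; the actual SCOPE index mapping may use a different field, which is part of why the indexes might need changes.

```python
import json


def format_counts_query(field="format.keyword", size=10):
    """Build a terms aggregation body counting documents per file format.

    `field` is a hypothetical field name, not taken from the SCOPE mapping.
    """
    return {
        "size": 0,  # return no hits, only the aggregation results
        "aggs": {
            "formats": {
                # terms aggregation: one bucket per distinct field value
                "terms": {"field": field, "size": size}
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(format_counts_query(), indent=2))
```

The resulting dictionary could be sent as the body of a search request against the relevant index; Kibana builds an equivalent request when a chart is configured with a terms bucket.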

jraddaoui commented 5 years ago

Hi @stefanabreitwieser, @bunekcca,

We'd need more information about this feature to be able to provide an estimate:

stefanabreitwieser commented 5 years ago

Hi @jraddaoui !

Kibana had originally been Tim's suggestion. I think it looks really interesting as a tool, but as a non-developer I don't necessarily understand how it would integrate with SCOPE. Would it be a separate stand-alone tool? Are there any pros/cons to not using Kibana? Do you have any strong preferences or opinions about either option?

We have two big categories of reporting that we're interested in: reference statistics and collection statistics. Reference statistics should show how the material is being used by researchers. Some sample questions:

Collection statistics should reflect what the entire collection as a whole actually is (i.e. a broad look at everything that's been uploaded). Some questions:

Charts and tables that can be exported as CSVs would be ideal.

Being able to change the parameters would be ideal and it looks possible using Kibana, but this is negotiable.

This is a long list of things! I have to say that for this feature, I'm less sure of what's possible and reasonable to do given our budget and timeline, meaning that if you'd like to set up a call with just the two of us to discuss the best way of moving forward with this, I'm happy to do so.

jraddaoui commented 5 years ago

Hi @stefanabreitwieser, that's great!

Thanks for the quick response. I'm out on vacation for a few days, but I'll follow up as soon as I get back on the 17th. Sorry for the inconvenience.

jraddaoui commented 5 years ago

Hi again @stefanabreitwieser,

As you mention, Kibana would be a different application, with its own authentication method, etc. I just sent an email with the credentials for a test instance and a Kibana server so you can check what this tool can provide. If you're planning to have these reports and visualizations only for admins, I'd definitely recommend using Kibana. However, if you intend to give all SCOPE users access to this section, it would be better to integrate it into the application.

On one hand, Kibana won't require major development, just improving the current indexes to allow the statistics you want, but it will require some knowledge of Elasticsearch to create the charts/reports and to format the data. It also has a lot of features that you probably won't use, and some of them (like a proper authentication system) require purchasing an Elastic license.

On the other hand, developing a reports section with all the requirements you mention will take quite some time, but it will give you more control over the content without having to know about Elasticsearch aggregations.


About the two categories of reports:

https://kibana.ccarch.artefactual.com/app/kibana#/dashboard/23b53430-1b12-11e9-9314-9f7362acbb75

By clicking on the three dots in the top-right corner of the graph, you will see the chart in a table format with the option to export it as a CSV file.
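
For reference, a table like this could also be produced outside Kibana from an aggregation response. The following is a rough sketch, assuming the usual bucket shape returned by Elasticsearch terms and date histogram aggregations (`key`, optional `key_as_string`, `doc_count`). The function name and the sample buckets are made up for illustration and are not real collection data.

```python
import csv
import io


def buckets_to_csv(buckets, header=("key", "count")):
    """Serialize aggregation buckets into CSV text, one row per bucket."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for bucket in buckets:
        # Date histogram buckets carry a readable label in `key_as_string`;
        # terms buckets only have `key`.
        label = bucket.get("key_as_string", bucket["key"])
        writer.writerow([label, bucket["doc_count"]])
    return out.getvalue()


# Illustrative buckets only; the values are invented for this example.
sample_buckets = [
    {"key": "PDF", "doc_count": 3},
    {"key": "JPEG", "doc_count": 2},
]

if __name__ == "__main__":
    print(buckets_to_csv(sample_buckets))
```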


With all that being said and considering the phase 2 budget and other issues, I personally think we should go with Kibana for now. If we have the time, we can try to find a way to proxy or iframe the Kibana reports in the SCOPE application.

Best regards.

stefanabreitwieser commented 5 years ago

Thanks so much Radda! Reports will be for admins only, so Kibana should be no problem in that respect. I'll take a look at the link once we fix the timeout issue. (I sent an email with more detail.)

Before we go any further with this ticket, would you mind giving us a time estimate? We did flag Kibana reporting for this phase, but it's a lower priority compared to other things. Let's make sure we have room enough in the budget before doing additional work here. Thank you!

jraddaoui commented 5 years ago

Hi @stefanabreitwieser,

The timeout issue should be fixed for you now; I've changed the URL in my previous update accordingly.

For an estimate, if we're only going to use Kibana, it depends on how much you want to do in there and how much guidance will be needed from us. As Kibana is an external tool, we won't be able to develop any requirements you may have for it, but we could guide you on how to create the charts and reports you need. It will also require adding/formatting some data in the Elasticsearch indexes and creating some documentation to set up and connect both instances. So far, I have spent around 8 hours setting up, configuring and securing the Kibana instance and creating the first chart, but the final estimate will vary depending on what reports are needed and on whether we're going to use only Kibana and only for the collection statistics.

jraddaoui commented 5 years ago

Added a "Digital files per year" chart and fixed the existing counts in the dashboard:

https://kibana.ccarch.artefactual.com/app/kibana#/dashboard/23b53430-1b12-11e9-9314-9f7362acbb75
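
As a rough idea of the query behind a "files per year" chart, here is a sketch of a date histogram aggregation body. The field name `last_modified` is an assumption for illustration, not the actual mapping. Note that recent Elasticsearch versions use `calendar_interval`, while older 6.x releases use `interval` for the same purpose.

```python
def files_per_year_query(date_field="last_modified"):
    """Build a date histogram body with one bucket per calendar year.

    `date_field` is a hypothetical field name chosen for this sketch.
    """
    return {
        "size": 0,  # only aggregation results are needed
        "aggs": {
            "files_per_year": {
                "date_histogram": {
                    "field": date_field,
                    # On Elasticsearch 6.x, use "interval" instead.
                    "calendar_interval": "year",
                    # Render bucket labels as four-digit years.
                    "format": "yyyy",
                }
            }
        },
    }
```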