Open pinarpink opened 5 years ago
After reviewing available options from https://djangopackages.org/grids/g/reporting/, I would say two options are relatively good (still mantained, claiming to be production ready):
Use Django SQL Explorer (https://github.com/groveco/django-sql-explorer) a. Add it to the requirements in setup.py, and settings.py b. Restrict access, so only superuser could access the SQL panel (it'd give very powerful access to data discovery) c. Add another user to the postgres database, it must not have write access d. Construct some example reporting queries and save them for later reuse
Use Django Excel Response (https://github.com/tarkatronic/django-excel-response)
a. Add it to the requirements in setup.py, and settings.py
b. Create some example reporting views, that would return Django's ORM results (like: return Project.objects.filter(something)
)
c. Wrap the views in ExcelResponse
d. Add a simple page available to superusers, that would offer downloading excel/csv reports.
Similar to option no. 2 - generate PDF reports ourselves
I could prepare PoC of either option (or all of them); what do you think?
Hi @jLebioda I prefer to go with option 1. We already have a reporter user (read-only) on the DB.
@neoflex @vildead what are your thoughts?
The first option looks really cool. What is the expected administration/coding overhead? The second option seems to be more light.
I think both options should be pretty straightforward and light; I'll prepare a PoC then
Here are some initial SQL commands: Projects:
WITH
/* get projects */
projects
AS (SELECT id project_id,
acronym,
has_cner,
cner_notes,
has_erp,
erp_notes,
umbrella_project_id
FROM PUBLIC.core_project),
/*get all funding sources as one value per project*/
funding_sources
AS (SELECT project_id,
String_agg(NAME, '; ') funding
FROM PUBLIC.core_fundingsource a
RIGHT OUTER JOIN PUBLIC.core_project_funding_sources b
ON b.fundingsource_id = a.id
GROUP BY project_id),
/*compute number of datasets for each project*/
n_datasets
AS (SELECT project_id,
count(*) number_of_datasets
FROM core_dataset
WHERE project_id IS NOT NULL
GROUP BY project_id),
umbrella_projects
as (SELECT id project_id,
acronym umbrella_project_acronym
FROM PUBLIC.core_project),
/*documents */
ethics_approvals
AS (SELECT DISTINCT
object_id project_id,
't' has_ethics_approval_document
FROM core_document cd
INNER JOIN django_content_type ct
ON cd.content_type_id = ct.id
WHERE domain_type = 'ethics_approval' AND model = 'project'),
/* local custodians*/
custodians
AS (
SELECT
project_id,
string_agg(username, '; ') local_custodians
FROM
core_project_local_custodians pc
INNER JOIN core_user u
ON pc.user_id = u.id
GROUP BY project_id
)
SELECT a.*,
COALESCE(ea.has_ethics_approval_document, 'f') has_ethics_approval_document,
cu.local_custodians,
up.umbrella_project_acronym,
c.funding,
COALESCE(d.number_of_datasets, 0) number_of_datasets
FROM projects a
LEFT JOIN n_datasets d
ON a.project_id = d.project_id
LEFT JOIN funding_sources c
ON a.project_id = c.project_id
LEFT JOIN umbrella_projects up
ON a.umbrella_project_id = up.project_id
LEFT JOIN ethics_approvals ea
ON a.project_id = ea.project_id
LEFT JOIN custodians cu
ON a.project_id = cu.project_id
Datasets
WITH datasets
AS (SELECT a.id AS dataset_id,
title,
project_id,
b.name sensitivity
FROM core_dataset a
LEFT JOIN core_sensitivityclass b
ON b.id = a.sensitivity_id),
/* number of data declarations */
n_declarations
AS (SELECT dataset_id,
Count(*) number_of_datadeclarations
FROM core_datadeclaration
GROUP BY dataset_id),
/* number of shares*/
n_shares
AS (SELECT dataset_id,
Count(*) number_of_shares
FROM core_share
GROUP BY dataset_id),
/* number of storages */
n_locations
AS (SELECT dataset_id,
Count(*) number_of_locations
FROM core_datalocation
GROUP BY dataset_id)
SELECT a.title,
a.sensitivity,
COALESCE(b.number_of_datadeclarations, 0) number_of_datadeclarations,
COALESCE(c.number_of_shares, 0) number_of_shares,
COALESCE(d.number_of_locations, 0) number_of_locations,
a.dataset_id,
a.project_id
FROM datasets a
LEFT JOIN n_declarations b
ON a.dataset_id = b.dataset_id
LEFT JOIN n_shares c
ON a.dataset_id = c.dataset_id
LEFT JOIN n_locations d
ON a.dataset_id = d.dataset_id
Data declarations
/*data declarations*/
WITH declarations
AS (SELECT
dd.title,
da.title as dataset_title,
dd.comments,
dd.end_of_storage_duration,
dd.storage_duration_criteria,
dd.has_special_subjects,
dd.id as datadeclaration_id,
dd.dataset_id,
dd.contract_id,
dd.partner_id
FROM core_datadeclaration dd
INNER JOIN core_dataset da ON dd.dataset_id = da.id
),
/*number of locations*/
n_locations
AS (SELECT datadeclaration_id,
COUNT(datalocation_id) number_of_locations
FROM core_datalocation_data_declarations
GROUP BY datadeclaration_id)
SELECT a.*,
COALESCE(b.number_of_locations,0) number_of_locations
FROM declarations a LEFT JOIN n_locations b USING (datadeclaration_id)
How two first queries from the snippet above look like in the SQL explorer:
Django-excel-response works flawlessly, but choosing the data for export must be done in Python - e.g. to report the projects without ethics approval:
def projects_without_ethics_approval(request):
objs = Project.objects.filter(has_erp=False)
return ExcelResponse(objs)
for exporting the data that are more complex than objects from a model, custom list/dict can be created
^this comes from dummy data
Partially addressed in #163
Here I will only add django-sql-explorer
Dear @jLebioda , is the Django SQL Explorer feature ready? The exports looks great!
We might create a patch/minor to today's release :)
Dear @vildead the preview of the feature is available on a separate branch https://github.com/elixir-luxembourg/daisy/tree/99-add-reporting-sql-exporer It was done some time ago, thus it might need a rebase
@jLebioda this is currently only available in debug mode. Can we discuss in the next DAISY regular meeting whether we can move this to production, then we will develop a few reports.
@jLebioda this is currently only available in debug mode. Can we discuss in the next DAISY regular meeting whether we can move this to production, then we will develop a few reports.
@pinarpink Yes, naturally!
Is your feature request related to a problem? Please describe.
Users of DAISY, data stewards, DPOs need to be able to create reports. E.g.
(1) The set of data that do not have a storage end date set and the storage criteria field is also empty. (2) All data that have been transferred to Partner A, or partners out of the EU
Describe the solution requested
We need a reporting solution, maybe initially suited to power users, who would build the query themselves .
Describe possible alternatives
Pick from below based on prior experience https://djangopackages.org/grids/g/reporting/
Other info context