NCEAS / metadig-engine

MetaDig Engine: multi-dialect metadata assessment engine

Generate 'check-analysis' graphics #245

Open gothub opened 4 years ago

gothub commented 4 years ago

Update the metadata assessment graph generation facility to create graphs that include information for each check in a suite, by incorporating and adapting https://github.com/NCEAS/metadig-dataone-fair/blob/master/FAIR-check-analysis.Rmd.

gothub commented 4 years ago

@mbjones

Currently, the information for each dataset quality assessment is stored in the PostgreSQL 'runs' table. A single run entry contains the XML output from the quality engine, which includes the results of every check in that run.

run_id, metadata_id, suite_id, timestamp, results, status, error, sequence_id
foreign key: identifiers(metadata_id)
unique constraint: (metadata_id, suite_id)

To enable efficient retrieval of the data needed for the 'check analysis' graph, a new table needs to be created that contains the per-check results of each run:

checks:
run_id, check_id, check_name, check_type, check_level, check_status, check_output
foreign key: runs(run_id)
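A minimal sketch of how one run's result XML could be flattened into rows for the proposed checks table. The element names (`run`, `result`, `check`, `status`, `output`) and the sample document are assumptions made for illustration; the actual metadig-engine output may differ (for example, it may use an XML namespace). `CheckRow` and `flatten_run` are hypothetical names, and the check ids, types, and levels are illustrative values.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple

# Hypothetical, simplified run result document. The real metadig-engine
# output may use a namespace and different element names.
SAMPLE_RUN_XML = """
<run>
  <id>run.1</id>
  <result>
    <check>
      <id>check.title.present</id>
      <name>Title present</name>
      <type>Findable</type>
      <level>REQUIRED</level>
    </check>
    <status>SUCCESS</status>
    <output>A title was found.</output>
  </result>
  <result>
    <check>
      <id>check.license.present</id>
      <name>License present</name>
      <type>Reusable</type>
      <level>OPTIONAL</level>
    </check>
    <status>FAILURE</status>
    <output>No license was found.</output>
  </result>
</run>
"""


class CheckRow(NamedTuple):
    """One row of the proposed checks table."""
    run_id: str
    check_id: str
    check_name: str
    check_type: str
    check_level: str
    check_status: str
    check_output: str


def flatten_run(xml_text: str) -> list[CheckRow]:
    """Turn one run result document into one CheckRow per <result> element."""
    root = ET.fromstring(xml_text.strip())
    run_id = root.findtext("id", default="")
    rows = []
    for result in root.findall("result"):
        check = result.find("check")
        if check is None:  # skip results that carry no check description
            continue
        rows.append(CheckRow(
            run_id=run_id,
            check_id=check.findtext("id", default=""),
            check_name=check.findtext("name", default=""),
            check_type=check.findtext("type", default=""),
            check_level=check.findtext("level", default=""),
            check_status=result.findtext("status", default=""),
            check_output=result.findtext("output", default=""),
        ))
    return rows
```

Flattening once, when a run is stored, means later check-analysis queries never need to re-parse the XML.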

Adding this table would allow the per-check data for all quality assessments of a given suite_id, optionally restricted to a single data source (member node, MN), to be retrieved efficiently.
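The sketch below shows the kind of query the checks table would make cheap: a per-check pass count for one suite, optionally restricted to one data source. It uses an in-memory SQLite database as a stand-in for PostgreSQL, trims each table to the columns the query needs, and assumes 'SUCCESS' is the passing status. The suite id, node ids, PIDs, and check ids are illustrative, and `build_demo_db` and `check_summary` are hypothetical helpers.

```python
import sqlite3
from typing import Optional

# SQLite stand-in for the PostgreSQL tables, trimmed to the columns used here.
SCHEMA = """
PRAGMA foreign_keys = ON;
CREATE TABLE identifiers (
    metadata_id TEXT PRIMARY KEY,
    data_source TEXT NOT NULL
);
CREATE TABLE runs (
    run_id      TEXT PRIMARY KEY,
    metadata_id TEXT NOT NULL REFERENCES identifiers(metadata_id),
    suite_id    TEXT NOT NULL,
    UNIQUE (metadata_id, suite_id)
);
CREATE TABLE checks (
    run_id       TEXT NOT NULL REFERENCES runs(run_id),
    check_id     TEXT NOT NULL,
    check_status TEXT NOT NULL
);
"""

# Pass count and total per check for one suite; a NULL data_source means all sources.
SUMMARY_SQL = """
SELECT c.check_id,
       SUM(CASE WHEN c.check_status = 'SUCCESS' THEN 1 ELSE 0 END) AS passed,
       COUNT(*) AS total
FROM checks c
JOIN runs r        ON r.run_id = c.run_id
JOIN identifiers i ON i.metadata_id = r.metadata_id
WHERE r.suite_id = ?
  AND (? IS NULL OR i.data_source = ?)
GROUP BY c.check_id
ORDER BY c.check_id
"""


def build_demo_db() -> sqlite3.Connection:
    """Create the tables and load a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO identifiers VALUES (?, ?)",
                     [("pid.1", "urn:node:ARCTIC"), ("pid.2", "urn:node:KNB")])
    conn.executemany("INSERT INTO runs VALUES (?, ?, ?)",
                     [("run.1", "pid.1", "FAIR-suite-0.3.1"),
                      ("run.2", "pid.2", "FAIR-suite-0.3.1")])
    conn.executemany("INSERT INTO checks VALUES (?, ?, ?)",
                     [("run.1", "check.title.present", "SUCCESS"),
                      ("run.1", "check.license.present", "FAILURE"),
                      ("run.2", "check.title.present", "SUCCESS"),
                      ("run.2", "check.license.present", "SUCCESS")])
    return conn


def check_summary(conn: sqlite3.Connection, suite_id: str,
                  data_source: Optional[str] = None) -> list:
    """Return (check_id, passed, total) tuples for one suite."""
    return conn.execute(SUMMARY_SQL, (suite_id, data_source, data_source)).fetchall()
```

For example, `check_summary(build_demo_db(), "FAIR-suite-0.3.1")` returns `[('check.license.present', 1, 2), ('check.title.present', 2, 2)]`. The `CASE` expression is used instead of summing a boolean so the same SQL also runs on PostgreSQL, with the placeholder style adjusted for the driver in use.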

gothub commented 3 years ago

Currently, the identifiers table contains only the metadata_id and data_source columns.

The other information about DataONE PIDs is currently included in the run output stored in the runs table, to facilitate indexing the run result document into the Solr index. All PID-related fields should be migrated to the identifiers table, so that retrieving the data needed for check analysis does not require fetching and parsing the run result XML from the runs table.

gothub commented 3 years ago

The new configuration for the identifiers table will be:

metadig=> \d identifiers;
                  Table "public.identifiers"
        Column         |           Type           | Modifiers
-----------------------+--------------------------+-----------
 metadata_id           | text                     | not null
 obsoletes             | text                     |
 obsoleted_by          | text                     |
 sequence_id           | text                     |
 format_id             | text                     | not null
 data_source           | text                     | not null
 rights_holder         | text                     | not null
 groups                | text[]                   |
 date_uploaded         | timestamp with time zone | not null
 date_sysmeta_modified | timestamp with time zone | not null
Indexes:
    "metadata_id_pk" PRIMARY KEY, btree (metadata_id)
Referenced by:
    TABLE "runs" CONSTRAINT "runs_metadata_id_fk" FOREIGN KEY (metadata_id) REFERENCES identifiers(metadata_id)
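With the PID fields in the identifiers table, a question such as "which datasets from a given member node are the current version?" becomes a plain SQL predicate rather than an XML-parsing step. This sketch uses SQLite as a stand-in for PostgreSQL, omits the `groups` column (SQLite has no array type), and assumes that a NULL `obsoleted_by` marks the newest version in an obsolescence chain. The sample PIDs, node ids, rights holder, and the `build_demo_db` and `current_versions` helpers are hypothetical.

```python
import sqlite3

# SQLite stand-in for the new PostgreSQL identifiers table.
# The text[] "groups" column is omitted: SQLite has no array type.
SCHEMA = """
CREATE TABLE identifiers (
    metadata_id           TEXT PRIMARY KEY,
    obsoletes             TEXT,
    obsoleted_by          TEXT,
    sequence_id           TEXT,
    format_id             TEXT NOT NULL,
    data_source           TEXT NOT NULL,
    rights_holder         TEXT NOT NULL,
    date_uploaded         TEXT NOT NULL,
    date_sysmeta_modified TEXT NOT NULL
);
"""

EML = "eml://ecoinformatics.org/eml-2.1.1"

# Illustrative rows: pid.1 was replaced by pid.2; pid.3 has a single version.
# Columns: metadata_id, obsoletes, obsoleted_by, sequence_id, format_id,
#          data_source, rights_holder, date_uploaded, date_sysmeta_modified
ROWS = [
    ("pid.1", None, "pid.2", "seq.A", EML, "urn:node:KNB", "CN=example-user",
     "2020-01-01T00:00:00Z", "2020-02-01T00:00:00Z"),
    ("pid.2", "pid.1", None, "seq.A", EML, "urn:node:KNB", "CN=example-user",
     "2020-02-01T00:00:00Z", "2020-02-01T00:00:00Z"),
    ("pid.3", None, None, "seq.B", EML, "urn:node:ARCTIC", "CN=example-user",
     "2020-03-01T00:00:00Z", "2020-03-01T00:00:00Z"),
]


def build_demo_db() -> sqlite3.Connection:
    """Create the identifiers table and load the illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO identifiers VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def current_versions(conn: sqlite3.Connection, data_source: str) -> list[str]:
    """Return PIDs from data_source that have not been obsoleted."""
    rows = conn.execute(
        "SELECT metadata_id FROM identifiers "
        "WHERE data_source = ? AND obsoleted_by IS NULL "
        "ORDER BY metadata_id",
        (data_source,),
    ).fetchall()
    return [row[0] for row in rows]
```

In PostgreSQL itself, the omitted `groups` column could be filtered with `'some-group' = ANY(groups)`.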