Open gothub opened 4 years ago
@mbjones
Currently, the information for each dataset quality assessment is stored in the PostgreSQL 'runs' table. A single run entry contains the XML output from the quality engine, which includes the results of every check in that run.
run_id, metadata_id, suite_id, timestamp, results, status, error, sequence_id
foreign key: identifiers(metadata_id)
unique constraint: (metadata_id, suite_id)
To enable efficient retrieval of the data needed for the 'check analysis' graph, a new table needs to be created that contains the per-check results of each run:
checks:
run_id, check_id, check_name, check_type, check_level, check_status, check_output
foreign key: runs(run_id)
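A minimal sketch of what the DDL for this table might look like. The issue only lists column names, so every column type, the NOT NULL choices, the constraint name and the index are assumptions, not the final schema:

```sql
-- Sketch only: types and nullability are assumed; the issue names the columns but not their types.
CREATE TABLE checks (
    run_id       text NOT NULL,  -- assumed to match the type of runs.run_id
    check_id     text NOT NULL,
    check_name   text,
    check_type   text,
    check_level  text,
    check_status text,
    check_output text,
    -- PostgreSQL requires runs.run_id to be a PRIMARY KEY or UNIQUE for this reference to be accepted.
    CONSTRAINT checks_run_id_fk FOREIGN KEY (run_id) REFERENCES runs (run_id)
);

-- Assumed index: PostgreSQL does not index the referencing side of a foreign key automatically,
-- and lookups by run_id are the expected access path.
CREATE INDEX checks_run_id_idx ON checks (run_id);
```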
Adding this table would make it possible to retrieve the check data for all quality assessments for a given suite id, optionally constrained by data source (MN), efficiently.
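As a rough illustration, the retrieval for the check analysis graph could then be a plain join across the three tables. The suite id and data source literals below are placeholders, and the aggregation is only one possible shape for the graph input:

```sql
-- Sketch: count check outcomes per check for one suite, limited to one data source (MN).
-- 'example-suite-id' and 'urn:node:EXAMPLE' are placeholder values, not real identifiers.
SELECT i.data_source,
       c.check_id,
       c.check_name,
       c.check_type,
       c.check_level,
       c.check_status,
       count(*) AS n_runs
FROM checks c
JOIN runs r        ON r.run_id = c.run_id
JOIN identifiers i ON i.metadata_id = r.metadata_id
WHERE r.suite_id = 'example-suite-id'
  AND i.data_source = 'urn:node:EXAMPLE'   -- drop this predicate to include every data source
GROUP BY i.data_source, c.check_id, c.check_name, c.check_type, c.check_level, c.check_status
ORDER BY c.check_id, c.check_status;
```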
Currently, the identifiers table contains metadata_id, data_source.
The other information regarding DataONE pids is currently included in the run output in the runs table, in order to facilitate indexing of the run result document into the Solr index. All fields related to pids should be migrated to the identifiers table, so that the run result XML in the runs table does not have to be retrieved and parsed to get this information when retrieving the data needed for check analysis processing.
The new configuration for the identifiers table will be:
metadig=> \d identifiers;
                 Table "public.identifiers"
        Column         |           Type           | Modifiers
-----------------------+--------------------------+-----------
 metadata_id           | text                     | not null
 obsoletes             | text                     |
 obsoleted_by          | text                     |
 sequence_id           | text                     |
 format_id             | text                     | not null
 data_source           | text                     | not null
 rights_holder         | text                     | not null
 groups                | text[]                   |
 date_uploaded         | timestamp with time zone | not null
 date_sysmeta_modified | timestamp with time zone | not null
Indexes:
    "metadata_id_pk" PRIMARY KEY, btree (metadata_id)
Referenced by:
    TABLE "runs" CONSTRAINT "runs_metadata_id_fk" FOREIGN KEY (metadata_id) REFERENCES identifiers(metadata_id)
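One possible migration path to this layout, sketched under the assumption that the existing table has only metadata_id and data_source as stated above. The backfill step is deliberately left out, since where the values come from (parsed run XML or DataONE system metadata) is not decided here:

```sql
-- Step 1 (sketch): add the new columns as nullable so existing rows remain valid.
ALTER TABLE identifiers
    ADD COLUMN obsoletes             text,
    ADD COLUMN obsoleted_by          text,
    ADD COLUMN sequence_id           text,
    ADD COLUMN format_id             text,
    ADD COLUMN rights_holder         text,
    ADD COLUMN "groups"              text[],  -- quoted defensively; GROUPS is a (non-reserved) keyword in newer PostgreSQL
    ADD COLUMN date_uploaded         timestamp with time zone,
    ADD COLUMN date_sysmeta_modified timestamp with time zone;

-- Step 2: backfill the new columns for existing rows (not shown; source of the values is an open question).

-- Step 3: run only after the backfill, otherwise SET NOT NULL fails on rows that still contain NULLs.
ALTER TABLE identifiers
    ALTER COLUMN format_id             SET NOT NULL,
    ALTER COLUMN rights_holder         SET NOT NULL,
    ALTER COLUMN date_uploaded         SET NOT NULL,
    ALTER COLUMN date_sysmeta_modified SET NOT NULL;
```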
Update the metadata assessment graph generation facility to create graphs that include information for each check in a suite, by incorporating and adapting https://github.com/NCEAS/metadig-dataone-fair/blob/master/FAIR-check-analysis.Rmd.