Structural changes of IEDC database

stefanpauliuk commented 6 years ago

1) In classification_definitions, switch classification_definition.reserve5 to "CustomFlag": set it to True (1) if the classification was created from a dataset on the fly.

2) In classification_definitions, switch classification_definition.reserve4 to "BijectiveFlag": set it to True (1) if the different attributes provided (if any) have a 1:1 relationship. It is TRUE for elements, for example (H, Hydrogen, 1 are 1:1:1), and FALSE for iso_regions, because its attributes also contain continents, and both China and Mongolia link to Asia.

3) Switch the following columns to UNIQUE:

classification_definitions.id
classification_definitions.classification_name
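
The BijectiveFlag criterion from item 2 could be checked mechanically before setting the flag. A minimal sketch, assuming the classification items are available as equal-length tuples of attributes; the function name `is_bijective` and the sample rows are hypothetical and not part of the IEDC code:

```python
def is_bijective(rows):
    """Return True if no attribute value is shared by two items.

    rows: list of equal-length tuples, one tuple per classification item.
    A repeated value in any column means two items share that attribute,
    so the attributes are not in a 1:1 relationship.
    """
    if not rows:
        return True
    columns = list(zip(*rows))  # transpose: one tuple per attribute column
    return all(len(set(col)) == len(col) for col in columns)


# Hypothetical sample items, following the examples given in item 2.
elements = [("H", "Hydrogen", 1), ("He", "Helium", 2), ("Li", "Lithium", 3)]
iso_regions = [("CN", "China", "Asia"), ("MN", "Mongolia", "Asia")]

print(is_bijective(elements))     # True
print(is_bijective(iso_regions))  # False
```

The second call returns False because "Asia" appears for both China and Mongolia.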

stefanpauliuk commented 6 years ago

UNIQUE constraints were added to db definition: https://github.com/IndEcol/IE_data_commons/tree/master/mySQL_Create
The Bijective and Custom flags were added to classification_definition; cf. the table definition template for the changed order of column labels!

nheeren commented 6 years ago

Should I make pull requests for the .xlsx files or just add my suggestions to this issue here? See also my suggestion https://github.com/IndEcol/IE_data_commons/issues/6.

I suggest adding the following UNIQUE constraints (S.P. added more constraints; all of them are part of the .xlsx master files as of Sep 15, 2018):

Column constraints (each column UNIQUE on its own):

units table: UNIQUE (unitcode), UNIQUE (unit_name) (column constraints)
users table: UNIQUE (username) (column constraint)
licences table: UNIQUE (name) (column constraint)
source_type table: UNIQUE (name) (column constraint)
dimensions table: UNIQUE (name) (column constraint)
categories table: UNIQUE (name) (column constraint)
types table: UNIQUE (name), UNIQUE (symbol) (column constraints)
layers table: UNIQUE (name) (column constraint)
aspects table: UNIQUE (aspect), UNIQUE (index_letter) (column constraints)
provenance table: UNIQUE (name) (column constraint)
project table: UNIQUE (project_name) (column constraint)
datagroups table: UNIQUE (datagroup_name) (column constraint)
classification_definitions table: UNIQUE (classification_name) (column constraint), so that it is not possible to create the same custom classification twice, e.g. origin_process__1_F_steel_SankeyFlows_2008_Global
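
To illustrate what a column constraint does in practice, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for the MySQL database, and the table is reduced to two columns, so it is an assumption-laden illustration rather than the actual IEDC table definition:

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE classification_definitions (
        id INTEGER PRIMARY KEY,
        classification_name TEXT NOT NULL UNIQUE
    )
    """
)

name = "origin_process__1_F_steel_SankeyFlows_2008_Global"
insert_sql = "INSERT INTO classification_definitions (classification_name) VALUES (?)"
conn.execute(insert_sql, (name,))

# A second insert with the same name violates the UNIQUE constraint.
try:
    conn.execute(insert_sql, (name,))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute(
    "SELECT COUNT(*) FROM classification_definitions"
).fetchone()[0]
print(duplicate_rejected, row_count)  # True 1
```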

Table constraints (UNIQUE across a combination of columns):

classification_items table: UNIQUE (classification_id, attribute_1_oto). That would mean that each combination of classification_id and attribute can exist only once, i.e. the same classification item cannot be added twice by accident.
datasets table: UNIQUE (dataset_name, dataset_version)
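
A table constraint differs from a column constraint in that only the combination of values has to be unique. A minimal sketch for the datasets case, again with sqlite3 standing in for MySQL; the reduced table layout, the helper `try_insert`, and the sample values are hypothetical:

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE datasets (
        id INTEGER PRIMARY KEY,
        dataset_name TEXT NOT NULL,
        dataset_version TEXT NOT NULL,
        UNIQUE (dataset_name, dataset_version)
    )
    """
)


def try_insert(name, version):
    """Insert one dataset row; return False if the UNIQUE constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO datasets (dataset_name, dataset_version) VALUES (?, ?)",
            (name, version),
        )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert("example_dataset", "v1.0"),  # first occurrence of the pair
    try_insert("example_dataset", "v1.1"),  # same name, new version: allowed
    try_insert("example_dataset", "v1.0"),  # exact duplicate pair: rejected
]
print(results)  # [True, True, False]
```

One caveat: both MySQL and SQLite allow several rows with NULL in a UNIQUE column, so a composite constraint only guards against duplicates where the columns involved are actually filled in.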

:warning: Instead of adding new comments all the time I will just keep editing this one.

stefanpauliuk commented 6 years ago

I suggest making the following changes to the classification_definitions and classification_items tables (cf. master files):

In the definitions table, add the 'general' column (TRUE if the classification is in general use, e.g., chemical elements) and the 'created_from_dataset' column (TRUE if the classification is defined by the upload script and the classification items are populated from the dataset). For the full description, see the master xlsx file.

In the classification_items table, rename the 'attribute1' to 'attribute4' columns to 'attribute1_oto' to 'attribute4_oto', where 'oto' stands for 'one-to-one', indicating that these four columns are reserved for attributes that form bijective descriptions of the classification items, e.g., chemical element names, atomic numbers, and symbols. Rename the 'attribute5' to 'attribute15' columns to 'attribute5_anc' to 'attribute15_anc' to indicate that these attributes do not need to be 1:1 descriptions of the items but can indicate other relations, such as the aggregation to broader regions or substance groups.
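
The renaming scheme can be written down as a small mapping. The sketch below is only an illustration: the helper name is hypothetical, and the generated statements assume the `RENAME COLUMN` syntax available in MySQL 8.0 and later (older versions would need `ALTER TABLE ... CHANGE` with the full column definition):

```python
def renamed_attribute_columns():
    """Map the old attribute column names to the proposed new ones."""
    mapping = {}
    for i in range(1, 16):
        # Columns 1-4 hold one-to-one attributes, columns 5-15 the others.
        suffix = "oto" if i <= 4 else "anc"
        mapping[f"attribute{i}"] = f"attribute{i}_{suffix}"
    return mapping


mapping = renamed_attribute_columns()

# Assumes MySQL 8.0+ syntax; printed for review, not executed here.
statements = [
    f"ALTER TABLE classification_items RENAME COLUMN {old} TO {new};"
    for old, new in mapping.items()
]
print(statements[0])
print(statements[-1])
```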

nheeren commented 6 years ago

Looks good. No objections from my side. Should we close the issue for now?
