fdschneider / bexis_traits

developing a trait data framework for use in the Biodiversity Exploratories

measurementID - all mandatory fields? #16

Closed: caterinap closed this issue 7 years ago

caterinap commented 7 years ago

measurementID is defined as:

"An auto-generated unique identifier for each entry of data, compiled as an MD5 hash from the following columns: 'specimenID', 'measurementValue_original', 'scientificName_original', 'measurementType_original', 'basisOfRecord', as well as the fields from the classes 'specimen' and 'location'."

Only 'measurementValue_original', 'scientificName_original', 'measurementType_original' and 'basisOfRecord' are mandatory fields; the others are not. Is this a problem? (I guess that specimenID is now called occurrenceID.)

fdschneider commented 7 years ago

Yes, this needs some more precise definition in the glossary.

This is no problem for creating the hash. Empty fields also add information to the string that is hashed (namely, the information that the field was empty).
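A minimal sketch of how such a hash could be built, assuming the fields are joined in a fixed order with a separator before hashing. The column list covers only the five named columns (fields from 'specimen' and 'location' would be appended to it), and the function name `measurement_id` and the `|` separator are hypothetical choices, not the project's actual R implementation. An empty or missing field is encoded as an empty string, so it still occupies its slot in the hashed string:

```python
import hashlib

# Columns named in the definition above; the specimen/location fields
# would be appended here. The order must stay fixed for stable hashes.
HASH_COLUMNS = (
    "specimenID",
    "measurementValue_original",
    "scientificName_original",
    "measurementType_original",
    "basisOfRecord",
)


def measurement_id(record, columns=HASH_COLUMNS, sep="|"):
    """Return an MD5 hex digest (32 characters) built from one record.

    Missing or None values become the empty string, so an empty field
    still contributes its (empty) slot between two separators.
    """
    parts = []
    for col in columns:
        value = record.get(col)
        parts.append("" if value is None else str(value))
    joined = sep.join(parts)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()
```

The separator matters: without it, the values "ab" + "c" and "a" + "bc" would produce the same string and therefore the same hash. A value that itself contains the separator could still cause such a collision, so a real implementation would need to escape or forbid it.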

In my understanding, we need to revise or drop the label for mandatory fields. We were talking about two types of 'mandatory':

  1. mandatory for a standardised trait dataset to be valid for BExIS upload: this should just mean that at least the three core entries are present as standardised entries ('measurementValue', 'scientificName', 'measurementType') and that a measurementID has been generated. (basisOfRecord will be redefined; see #14.)
  2. mandatory for using the R script: this would include the user-provided data ('measurementValue_original', 'scientificName_original', 'measurementType_original', 'measurementUnit_original'). The output of the script includes the columns that are mandatory in sense 1.
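The two senses of "mandatory" can be expressed as two separate checks. The sketch below is illustrative only: the column names come from the list above, while the function names (`missing_fields`, `valid_for_upload`, `valid_script_input`) are hypothetical and not part of the project's R script. A field counts as missing if it is absent or empty:

```python
# Sense 1: required for a standardised dataset to be valid for BExIS upload.
BEXIS_REQUIRED = ["measurementValue", "scientificName",
                  "measurementType", "measurementID"]

# Sense 2: required as user-provided input to the R script.
SCRIPT_REQUIRED = ["measurementValue_original", "scientificName_original",
                   "measurementType_original", "measurementUnit_original"]


def missing_fields(record, required):
    """Return the required fields that are absent or empty in a record."""
    return [f for f in required if record.get(f) in (None, "")]


def valid_for_upload(record):
    return not missing_fields(record, BEXIS_REQUIRED)


def valid_script_input(record):
    return not missing_fields(record, SCRIPT_REQUIRED)
```

A raw user record would pass the script-input check but fail the upload check until the script has added the standardised columns and the measurementID.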

I don't think specimenID/occurrenceID is mandatory in either of those senses.

aostrow commented 7 years ago

I have the impression that we compile a hash value just for the sake of having it. Or do we need it somewhere? And what about all the other _ID columns (besides occurrenceID)? An ID is only really useful for large-scale machine readability. Do we really need it? I think we could also live with (unique) character strings.

fdschneider commented 7 years ago

You are right, such a globally valid measurementID is not needed within BExIS or within a single dataset. I suggested it because a globally unique identifier is necessary for decentralised merging of multiple datasets. But this is for later (and can be done on the data user side at any time). For now, with the data provider in mind, the least demanding and least distracting option might be an optional, user-defined measurementID that is consistent within the dataset and may refer to further information, such as measurement accuracy.
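As an illustration of why merging needs globally unique identifiers, and of how a data user could create them later: dataset-local IDs such as "m1" will collide once several datasets are combined, but prefixing each with a dataset identifier resolves this. This is one possible approach, not something the thread settled on; the function name `merge_with_global_ids` and the `datasetID` column are hypothetical, and the sketch assumes dataset identifiers are themselves unique and contain no ":":

```python
def merge_with_global_ids(datasets):
    """Merge datasets, turning dataset-local measurementIDs into global ones.

    datasets: mapping of a (hypothetical) dataset identifier to a list of
    record dicts. Each merged record gets the ID '<datasetID>:<local ID>'
    and a 'datasetID' column; the input records are not modified.
    """
    merged = []
    for dataset_id, records in datasets.items():
        for rec in records:
            new = dict(rec)  # copy so the original dataset stays unchanged
            new["measurementID"] = f"{dataset_id}:{rec['measurementID']}"
            new["datasetID"] = dataset_id
            merged.append(new)
    return merged
```

Because the prefix is added only at merge time, data providers can keep simple, dataset-specific IDs.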

fdschneider commented 7 years ago

Following Andreas' comment, I'll drop this for now. No need for a global identifier. The user can provide dataset-specific IDs to link multivariate measurements.
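A minimal sketch of how dataset-specific IDs could link multivariate measurements, assuming rows that share a measurementID (for example, body length and body width taken from the same individual) belong to one measurement. The function name `link_measurements` is hypothetical, and the sketch uses the standardised column names from the list above:

```python
from collections import defaultdict


def link_measurements(records):
    """Group records by their dataset-specific measurementID.

    Returns {measurementID: {measurementType: measurementValue}}.
    If the same measurementType appears twice under one ID, the later
    row overwrites the earlier one.
    """
    linked = defaultdict(dict)
    for rec in records:
        linked[rec["measurementID"]][rec["measurementType"]] = rec["measurementValue"]
    return dict(linked)
```

For this purpose the IDs only need to be consistent within the dataset; no global uniqueness or hashing is required.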