ODM2 / YODA-File

The YAML Observation Data Archive & exchange (YODA) File Format
BSD 3-Clause "New" or "Revised" License

All Sampling Features metadata on single line in YODA file #48

Open aufdenkampe opened 8 years ago

aufdenkampe commented 8 years ago

Given that we designed ODM2 Sites and Specimens to be subclasses of the Sampling Feature class, with 1:1 relationships in all relational database implementations (see the ODM2SamplingFeatures diagram), it is straightforward and preferable for the YODA file to have all Site and Specimen metadata appended to their respective Sampling Feature record in a single line.

In other words, rather than have the YODA file look like this:

SamplingFeatures:
  - &SamplingFeatureID0001 {SamplingFeatureUUID:  "572DE772-2AF1-4C44-8C6A-074E82273C10", SamplingFeatureTypeCV:  "Site", SamplingFeatureCode:  "LR_TWDEF_C", SamplingFeatureName:  "Climate Station at TW Daniels Experimental Forest", SamplingFeatureDescription:  "This is a continuous atmospheric monitoring site that is part of the Gradients Along Mountain to Urban Transitions (GAMUT) monitoring network.", SamplingFeatureGeotypeCV:  "Point", FeatureGeometry:  NULL, Elevation_m:  "2629.2", ElevationDatumCV:  "NGVD29"}
Sites:
  - {SamplingFeatureObj:  *SamplingFeatureID0001, SiteTypeCV:  "Atmosphere", Latitude:  41.864805, Longitude:  -111.507494, SpatialReferenceObj:  *SRSID0001}

The YODA file might look like this:

SamplingFeatures:
 - &SamplingFeatureID0001 {SamplingFeatureUUID:  "572DE772-2AF1-4C44-8C6A-074E82273C10", SamplingFeatureTypeCV:  "Site", SamplingFeatureCode:  "LR_TWDEF_C", SamplingFeatureName:  "Climate Station at TW Daniels Experimental Forest", SamplingFeatureDescription:  "This is a continuous atmospheric monitoring site that is part of the Gradients Along Mountain to Urban Transitions (GAMUT) monitoring network.", SamplingFeatureGeotypeCV:  "Point", FeatureGeometry:  NULL, Elevation_m:  "2629.2", ElevationDatumCV:  "NGVD29", SiteTypeCV:  "Atmosphere", Latitude:  41.864805, Longitude:  -111.507494, SpatialReferenceObj:  *SRSID0001}

The code that parses the YODA file would need to recognize SamplingFeatureTypeCV: "Site" in order to expect and properly parse the 4 additional attributes that are required for ODM2. This should be relatively simple.
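As a rough sketch of that type-driven step (not the actual YODA tools or ODM2 API code), the parser could look up the extra attribute names by `SamplingFeatureTypeCV` and split them off the merged record. The function name `split_sampling_feature` and the `EXTRA_FIELDS_BY_TYPE` table are hypothetical; the four Site field names are taken from the example above, and the record is shown as a plain dict, as it might look after the YAML has been loaded.

```python
# Hypothetical lookup table: which extra attributes each feature type carries.
# Only "Site" is filled in, using the four fields from the example record.
EXTRA_FIELDS_BY_TYPE = {
    "Site": ("SiteTypeCV", "Latitude", "Longitude", "SpatialReferenceObj"),
}


def split_sampling_feature(record):
    """Split one merged record into (base fields, subtype fields)."""
    feature_type = record.get("SamplingFeatureTypeCV")
    extra_names = EXTRA_FIELDS_BY_TYPE.get(feature_type, ())

    # Fail early if a record tagged as a Site lacks a required Site field.
    missing = [name for name in extra_names if name not in record]
    if missing:
        raise ValueError(f"{feature_type} record is missing: {missing}")

    subtype = {name: record[name] for name in extra_names}
    base = {key: value for key, value in record.items() if key not in extra_names}
    return base, subtype


# Abbreviated version of the merged record above, as a loaded dict.
merged = {
    "SamplingFeatureUUID": "572DE772-2AF1-4C44-8C6A-074E82273C10",
    "SamplingFeatureTypeCV": "Site",
    "SamplingFeatureCode": "LR_TWDEF_C",
    "SiteTypeCV": "Atmosphere",
    "Latitude": 41.864805,
    "Longitude": -111.507494,
    "SpatialReferenceObj": "SRSID0001",  # stands in for the resolved *SRSID0001 alias
}

base, site = split_sampling_feature(merged)
```

Records whose type has no entry in the table pass through unchanged, so Specimens could be added later with one more table entry.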

The benefits of doing this are:

  1. The Excel macros that generate the YODA files from excel_templates become simpler, because they would more closely mimic the Sampling Features tab in the templates.
  2. The YODA file header becomes much shorter, especially for Specimen Time Series and Specimens templates, and therefore more human-readable.

aufdenkampe commented 8 years ago

If we implement issue #47, then we should also append all the "Related Features" metadata to the end of the YODA file line for each Sampling Feature. This would require slightly more logic in the JSON Schema validation and in the parsing code, for two reasons: there would be nothing equivalent to the SamplingFeatureTypeCV: tag, and there could (and probably should) be future YODA files that need many-to-many cardinality between Related Features and would therefore require a separate block to represent it. However, we could potentially take care of those expectations with the Profile: "TimeSeries" tag at the top of each YODA file.

horsburgh commented 8 years ago

@aufdenkampe - I must not have hit the button when I typed this in last time. I think we need to discuss these changes with programmers. I don't think it is preferable to put this information on one line. We've already developed code for the API around this information being in separate sections. Those sections define the objects. The ease of parsing the objects from the YODA file depends on the separate sections being present in the YODA files. They are also separate objects in the API, and, being separate in the YODA file, they parse easily into those objects without a bunch of conditional programming to figure out what is in that line. Plus - I'm not sure it is more human-readable to overload multiple types of information into really long lines rather than in shorter, easily identifiable sections. We shouldn't do this just to shorten the header.
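To illustrate the point about separate sections, here is a minimal sketch of section-driven parsing. It is not the existing API code; the class names `SamplingFeatureRow` and `SiteRow`, the `SECTION_CLASSES` table, and `parse_sections` are all made up for this example, and the input is a plain dict standing in for a loaded YODA file. The section name alone selects the class, so no per-line conditionals are needed.

```python
from dataclasses import dataclass


@dataclass
class SamplingFeatureRow:  # hypothetical stand-in for the API's object
    SamplingFeatureCode: str
    SamplingFeatureTypeCV: str


@dataclass
class SiteRow:  # hypothetical stand-in for the API's object
    SamplingFeatureObj: dict
    SiteTypeCV: str
    Latitude: float
    Longitude: float


# One class per YODA section; the section name decides how rows are built.
SECTION_CLASSES = {
    "SamplingFeatures": SamplingFeatureRow,
    "Sites": SiteRow,
}


def parse_sections(document):
    """Build objects for every row of every known section."""
    return {
        section: [SECTION_CLASSES[section](**row) for row in rows]
        for section, rows in document.items()
    }


# The same dict appears in both sections, mimicking a YAML anchor and alias.
feature = {"SamplingFeatureCode": "LR_TWDEF_C", "SamplingFeatureTypeCV": "Site"}
document = {
    "SamplingFeatures": [feature],
    "Sites": [
        {
            "SamplingFeatureObj": feature,
            "SiteTypeCV": "Atmosphere",
            "Latitude": 41.864805,
            "Longitude": -111.507494,
        }
    ],
}

parsed = parse_sections(document)
```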

emiliom commented 8 years ago

I wouldn't normally comment on a YODA discussion, but @horsburgh's comments caught my eye (after having quickly scanned Anthony's comments from today). I'm not going to weigh in on the merits of merging SamplingFeatures and Sites in a single line in the YODA file. What I do want to say is that the odm2api maps an excessively literal (if I may say so) representation of tables into objects. This makes it more painful to use (verbose, plodding) than some of us would prefer; e.g., see this issue started by Dave V 3 months ago, and recently voted up by me. I'm not saying odm2api should be or can be refactored in the near term, but it's a broader discussion that those of us working on it will need to have sooner rather than later.

But this probably just reinforces Jeff's message that the decisions being discussed here should be taken only after broader discussions.

aufdenkampe commented 8 years ago

I must admit that I am not very familiar with the ODM2 API, but based on comments by @emiliom, I would have to agree with him strongly. The intent of many of our choices when developing ODM2 was that it could be easily translated into a class-subclass based object-oriented programming framework. Sites and Specimens should not be separate objects in the code from their parent Sampling Feature object. The 1:1 relationships clearly indicate that these are subclasses of the Sampling Features class. The same goes for Results and the tables related to all the various ResultTypes. Any 1:1 relationship between two tables in the ODM2 relational database implementation should be translated into a class-subclass approach in Python. That will massively simplify the information model, coding and YODA file structure. @horsburgh, remember that we decided that the heavy lifting for parsing YODA files into ODM2 should be done by YODA tools and the ODM2 API rather than by the built-in macros of our Excel templates.
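A minimal sketch of what that class-subclass approach might look like in Python, assuming dataclasses and a small subset of the fields from the example record. This is not how the ODM2 API is written today; `SUBCLASS_BY_TYPE` and `build_feature` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class SamplingFeature:
    SamplingFeatureUUID: str
    SamplingFeatureTypeCV: str
    SamplingFeatureCode: str


@dataclass
class Site(SamplingFeature):
    # The 1:1 Sites table becomes extra attributes on a subclass.
    SiteTypeCV: str
    Latitude: float
    Longitude: float


# Hypothetical dispatch table from the type tag to the subclass.
SUBCLASS_BY_TYPE = {"Site": Site}


def build_feature(record):
    """Instantiate the subclass named by SamplingFeatureTypeCV."""
    cls = SUBCLASS_BY_TYPE.get(record["SamplingFeatureTypeCV"], SamplingFeature)
    known = {f.name for f in fields(cls)}
    # Ignore attributes this sketch does not model.
    return cls(**{key: value for key, value in record.items() if key in known})


record = {
    "SamplingFeatureUUID": "572DE772-2AF1-4C44-8C6A-074E82273C10",
    "SamplingFeatureTypeCV": "Site",
    "SamplingFeatureCode": "LR_TWDEF_C",
    "SiteTypeCV": "Atmosphere",
    "Latitude": 41.864805,
    "Longitude": -111.507494,
}

site = build_feature(record)
```

With this shape, a single merged line in the YODA file maps onto a single object, and a `Site` can be used anywhere a `SamplingFeature` is expected.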