Primary Keys in Sites and Specimens

horsburgh commented 10 years ago

It has been proposed to drop SiteID and SpecimenID from the Sites and Specimens entities and instead have SamplingFeatureID serve as the primary key for those entities - and at the same time a foreign key to the SamplingFeatures entity.
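
Here is a minimal sketch of the proposed layout. It assumes SQLite through Python's standard sqlite3 module and a heavily trimmed column set: SamplingFeatureCode, Latitude, Longitude and SpecimenMedium are stand-ins, not the full ODM2 definitions, and the inserted values are illustrative. Each child table declares SamplingFeatureID once, as its primary key and also as a foreign key to SamplingFeatures. As a result, a Site row cannot be inserted for a sampling feature that does not exist.

```python
import sqlite3

# Trimmed, hypothetical column set; real ODM2 tables carry many more columns.
DDL = """
CREATE TABLE SamplingFeatures (
    SamplingFeatureID   INTEGER PRIMARY KEY,
    SamplingFeatureCode TEXT NOT NULL UNIQUE
);
-- SamplingFeatureID is both the child's primary key and its FK to the parent.
CREATE TABLE Sites (
    SamplingFeatureID INTEGER PRIMARY KEY
        REFERENCES SamplingFeatures (SamplingFeatureID),
    Latitude  REAL,
    Longitude REAL
);
CREATE TABLE Specimens (
    SamplingFeatureID INTEGER PRIMARY KEY
        REFERENCES SamplingFeatures (SamplingFeatureID),
    SpecimenMedium TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript(DDL)

conn.execute("INSERT INTO SamplingFeatures VALUES (1, 'SITE-1')")
conn.execute("INSERT INTO Sites VALUES (1, 41.0, -111.0)")  # parent exists: accepted

try:
    conn.execute("INSERT INTO Sites VALUES (2, 41.1, -111.1)")  # no SamplingFeature 2
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)
```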

I really don’t have any objections to this. I have read up on it a little, and it seems to be an implementation issue. I will say that it complicated the script I used to create my ODM2 database from my Little Bear River ODM 1.1 database. There was no such thing as a SamplingFeatureID in ODM 1.1.1, so I ended up making the SamplingFeatureIDs the same as my SiteIDs anyway.
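
The migration approach described here reuses each ODM 1.1.1 SiteID as the new SamplingFeatureID. It can be sketched as two INSERT ... SELECT statements, parent rows first and then child rows. The sketch uses SQLite via Python's sqlite3. The ODM1_Sites table, its columns and its rows are hypothetical stand-ins, and the sketch assumes the target SamplingFeatures table starts out empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Hypothetical, trimmed-down ODM 1.1.1 source table.
CREATE TABLE ODM1_Sites (
    SiteID INTEGER PRIMARY KEY, SiteCode TEXT NOT NULL,
    Latitude REAL, Longitude REAL
);
CREATE TABLE SamplingFeatures (
    SamplingFeatureID   INTEGER PRIMARY KEY,
    SamplingFeatureCode TEXT NOT NULL UNIQUE
);
CREATE TABLE Sites (
    SamplingFeatureID INTEGER PRIMARY KEY
        REFERENCES SamplingFeatures (SamplingFeatureID),
    Latitude REAL, Longitude REAL
);
""")
# Made-up source rows for illustration only.
conn.executemany("INSERT INTO ODM1_Sites VALUES (?, ?, ?, ?)",
                 [(1, "SITE-A", 41.0, -111.0), (2, "SITE-B", 41.1, -111.1)])

# Reuse each SiteID as the SamplingFeatureID: parent rows first...
conn.execute("""INSERT INTO SamplingFeatures (SamplingFeatureID, SamplingFeatureCode)
                SELECT SiteID, SiteCode FROM ODM1_Sites""")
# ...then the Sites rows, whose key now doubles as the foreign key.
conn.execute("""INSERT INTO Sites (SamplingFeatureID, Latitude, Longitude)
                SELECT SiteID, Latitude, Longitude FROM ODM1_Sites""")
conn.commit()

rows = conn.execute("""SELECT s.SamplingFeatureID, f.SamplingFeatureCode
                       FROM Sites s JOIN SamplingFeatures f USING (SamplingFeatureID)
                       ORDER BY 1""").fetchall()
print(rows)  # [(1, 'SITE-A'), (2, 'SITE-B')]
```

The IDs are copied verbatim, so any Specimens migrated later would need SamplingFeatureIDs that do not collide with the reused SiteIDs.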

Those relationships are “identifying relationships,” which means that the existence of a row in the child table (Sites or Specimens) depends on the existence of a row in the parent table (SamplingFeatures). From what I have read, it’s fairly common to give the child table its own primary key that does not include the foreign key to the parent table. However, some argue that the “right” way to capture this formally is to make the parent’s foreign key part of the child’s primary key. For these two tables, it would be the child’s entire primary key. Most of what I read says this is how supertype/subtype relationships “should” be modeled.
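
The two designs behave differently. With a surrogate SiteID, the parent foreign key is just another column. In the identifying (shared-key) design, it is the child's whole primary key. The sketch below uses SQLite via Python's sqlite3 and keeps only the key columns. The table names Sites_surrogate and Sites_shared are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE SamplingFeatures (SamplingFeatureID INTEGER PRIMARY KEY);

-- Design 1: child has its own surrogate key; the parent FK is an ordinary column.
CREATE TABLE Sites_surrogate (
    SiteID INTEGER PRIMARY KEY,
    SamplingFeatureID INTEGER NOT NULL
        REFERENCES SamplingFeatures (SamplingFeatureID)
);

-- Design 2: the parent FK is the child's primary key (identifying relationship).
CREATE TABLE Sites_shared (
    SamplingFeatureID INTEGER PRIMARY KEY
        REFERENCES SamplingFeatures (SamplingFeatureID)
);
INSERT INTO SamplingFeatures VALUES (1);
""")

def accepts_second_child(table: str) -> bool:
    """Attach two child rows to SamplingFeature 1; report whether both inserts succeed."""
    try:
        for _ in range(2):
            conn.execute(f"INSERT INTO {table} (SamplingFeatureID) VALUES (1)")
        return True
    except sqlite3.IntegrityError:
        return False

surrogate_ok = accepts_second_child("Sites_surrogate")  # True: a 1:N pairing slips through
shared_ok = accepts_second_child("Sites_shared")        # False: at most one Site per feature
print(surrogate_ok, shared_ok)
```

The surrogate design can still enforce 1:1 if SamplingFeatureID is also declared UNIQUE. The shared-key design gets the 1-to-(0 or 1) cardinality for free, which is also why it only fits 1:1 parent/child relationships.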

The logical relationship is that the child cannot exist without the parent. There should probably also eventually be a check constraint ensuring that a given SamplingFeatureID appears in ONLY ONE child table: a sampling feature is either a Site OR a Specimen, and it can’t be both.
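
One way to enforce the "either a Site or a Specimen, never both" rule declaratively is to add a discriminator column to the parent and use composite foreign keys. This is a swapped-in technique, not the check constraint proposed here. SQLite (used below through Python's sqlite3) does not allow subqueries inside CHECK constraints, so a single CHECK cannot look into another table. The SamplingFeatureType column and its two values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE SamplingFeatures (
    SamplingFeatureID   INTEGER PRIMARY KEY,
    -- Hypothetical discriminator column.
    SamplingFeatureType TEXT NOT NULL CHECK (SamplingFeatureType IN ('Site', 'Specimen')),
    UNIQUE (SamplingFeatureID, SamplingFeatureType)  -- target for the composite FKs
);
CREATE TABLE Sites (
    SamplingFeatureID   INTEGER PRIMARY KEY,
    SamplingFeatureType TEXT NOT NULL DEFAULT 'Site' CHECK (SamplingFeatureType = 'Site'),
    FOREIGN KEY (SamplingFeatureID, SamplingFeatureType)
        REFERENCES SamplingFeatures (SamplingFeatureID, SamplingFeatureType)
);
CREATE TABLE Specimens (
    SamplingFeatureID   INTEGER PRIMARY KEY,
    SamplingFeatureType TEXT NOT NULL DEFAULT 'Specimen' CHECK (SamplingFeatureType = 'Specimen'),
    FOREIGN KEY (SamplingFeatureID, SamplingFeatureType)
        REFERENCES SamplingFeatures (SamplingFeatureID, SamplingFeatureType)
);
INSERT INTO SamplingFeatures VALUES (1, 'Site'), (2, 'Specimen');
""")

def try_insert(sql: str) -> bool:
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

ok_site = try_insert("INSERT INTO Sites (SamplingFeatureID) VALUES (1)")
ok_specimen = try_insert("INSERT INTO Specimens (SamplingFeatureID) VALUES (2)")
# Feature 1 is typed as a Site in the parent, so it cannot also become a Specimen.
dup_as_specimen = try_insert("INSERT INTO Specimens (SamplingFeatureID) VALUES (1)")
print(ok_site, ok_specimen, dup_as_specimen)  # True True False
```

Each child table pins its copy of the type with a CHECK. The composite foreign key then only matches parent rows of that type, so a given SamplingFeatureID can be referenced by at most one child table.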

I suggest we accept Bruce’s suggestion and get rid of SiteID and SpecimenID.

As a related note: there are other places where this may affect the schema (e.g., Actions). We can only do what is suggested above when there is a 1:1 relationship between the parent entity and the child entity.
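
When the parent/child relationship is 1:N rather than 1:1, the parent's foreign key can still be part of the child's primary key, just not all of it. Below is a minimal SQLite sketch via Python's sqlite3. The SiteVisits table and its VisitNumber column are hypothetical and only illustrate the composite-key pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE SamplingFeatures (SamplingFeatureID INTEGER PRIMARY KEY);
-- Hypothetical 1:N child: many visits per feature, so the parent FK is only
-- part of the primary key, paired with a per-feature sequence number.
CREATE TABLE SiteVisits (
    SamplingFeatureID INTEGER NOT NULL
        REFERENCES SamplingFeatures (SamplingFeatureID),
    VisitNumber INTEGER NOT NULL,
    PRIMARY KEY (SamplingFeatureID, VisitNumber)
);
INSERT INTO SamplingFeatures VALUES (1);
""")

# Two children for the same parent are fine under the composite key.
conn.executemany("INSERT INTO SiteVisits VALUES (?, ?)", [(1, 1), (1, 2)])

try:
    conn.execute("INSERT INTO SiteVisits VALUES (1, 2)")  # same (parent, sequence) again
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

count = conn.execute(
    "SELECT COUNT(*) FROM SiteVisits WHERE SamplingFeatureID = 1").fetchone()[0]
print(count, duplicate_rejected)  # 2 True
```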

aufdenkampe commented 10 years ago

Jeff, thanks for digging into this and exploring both sides of the issue. Bruce is still digging up the white paper he mentioned to me (I cc'ed him on this thread). Regardless, adopting Bruce's suggestion to drop SiteID and SpecimenID seems to have two benefits:

Clear indication that Sites and Specimens are sub-classes of the SamplingFeatures base class.
Built-in constraint that the child (Site or Specimen record) cannot exist without the parent (SamplingFeature record).

Adopting this convention would indeed have implications for some of the more specific ActionTypes that we will develop with additional fields for the Equipment, Sensor and Sample extensions (i.e., Equipment.CalibrationActions, Equipment.MaintenanceActions, Sensors.DeploymentActions, Samples.AnalysisBatchActions). However, these schemas are not yet close to being finalized.

We also had 1-to-(0 or 1) relationships in the Results schema, and might again, depending on how we implement other data types.

On Fri, Feb 7, 2014 at 3:50 PM, Jeff Horsburgh <notifications@github.com> wrote:

Anthony K. Aufdenkampe, Ph.D. Associate Research Scientist - Isotope & Organic Geochemistry Stroud Water Research Center 970 Spencer Road, Avondale, PA 19311 Tel. 610-268-2153 ext. 263; Fax 610-268-0490 Mobile 484-748-0252 http://www.stroudcenter.org/about/aufdenkampe.shtm

emiliom commented 10 years ago

Taken care of in my commit on the SamplingFeatures_em feature branch. I'm not closing this issue until Jeff (and others?) has reviewed the other SamplingFeatures changes in my commit and merged them into master.

horsburgh commented 10 years ago

I have now merged these changes into the master branch. When you are satisfied that the changes have been included correctly, please delete your original feature branch and close this issue. Note that I am opening a separate issue regarding the addition of "SamplingFeatureName" in the "SamplingFeatures" entity.

aufdenkampe commented 10 years ago

These changes have been merged into master correctly, as far as I can tell.

Primary Keys in Sites and Specimens #11