PecanProject / bety

Web-interface to the Biofuel Ecophysiological Traits and Yields Database (used by PEcAn and TERRA REF)
https://www.betydb.org
BSD 3-Clause "New" or "Revised" License
16 stars 38 forks source link

treatments: Should this have a key? #212

Open gsrohde opened 9 years ago

gsrohde commented 9 years ago

The constraints spreadsheet offers (citation_id, site_id, name), but currently neither citation_id nor site_id are columns in this table. The GIST doesn't have a uniqueness constraint on this table (other than the usual primary key constraint on id).

gsrohde commented 9 years ago

Thinking about this again, the bulk upload code assumes that for any citation, the value of treatments.name will be unique across all treatments associated with the citation via the citations_treatments table. To see cases where this does not hold, run

select count(*), name AS "treatment name", format('%s %s', c.author, c.year) AS citation from treatments t join citations_treatments ct on ct.treatment_id = t.id join citations c on c.id = ct.citation_id group by c.author, c.year, name having count(*) > 1 order by author, year;

Often, however, like-named treatments are distinguished by definition, as the following query shows:

select count(*), name AS "treatment name", array_agg(definition) AS "definitions", format('%s %s', c.author, c.year) AS citation from treatments t join citations_treatments ct on ct.treatment_id = t.id join citations c on c.id = ct.citation_id group by c.author, c.year, name having count(*) > 1 order by author, year;

We could perhaps relax the constraint to "the value of the pair (treatments.name, treatments.definition) will be unique across all treatments associated with the citation via the citations_treatments table". This reduces the number of duplicates from 103 to 47 groups.

sites is somewhat more remotely related to treatments, and it is questionable what sort of constraint involving related sites would be applicable to the treatments table.

dlebauer commented 8 years ago

is this a duplicate of #249? See that issue - we can query the site_id, citatoin_id, and treatment_id from the traits or yields table.

But perhaps the constraint should just be that each treatment name be unique within the associated sites it is associated with in either the traits or yields tables.