Open gsrohde opened 9 years ago
Thinking about this again, the bulk upload code assumes that for any citation, the value of treatments.name
will be unique across all treatments associated with the citation via the citations_treatments
table. To see cases where this does not hold, run
select count(*), name AS "treatment name", format('%s %s', c.author, c.year) AS citation from treatments t join citations_treatments ct on ct.treatment_id = t.id join citations c on c.id = ct.citation_id group by c.author, c.year, name having count(*) > 1 order by author, year;
Often, however, like-named treatments are distinguished by definition, as the following query shows:
select count(*), name AS "treatment name", array_agg(definition) AS "definitions", format('%s %s', c.author, c.year) AS citation from treatments t join citations_treatments ct on ct.treatment_id = t.id join citations c on c.id = ct.citation_id group by c.author, c.year, name having count(*) > 1 order by author, year;
We could perhaps relax the constraint to "the value of the pair (treatments.name, treatments.definition)
will be unique across all treatments associated with the citation via the citations_treatments
table". This reduces the number of duplicates from 103 to 47 groups.
sites
is somewhat more remotely related to treatments
, and it is questionable what sort of constraint involving related sites would be applicable to the treatments
table.
is this a duplicate of #249? See that issue - we can query the site_id, citatoin_id, and treatment_id from the traits or yields table.
But perhaps the constraint should just be that each treatment name be unique within the associated sites it is associated with in either the traits or yields tables.
The constraints spreadsheet offers
(citation_id, site_id, name)
, but currently neithercitation_id
norsite_id
are columns in this table. The GIST doesn't have a uniqueness constraint on this table (other than the usual primary key constraint onid
).