Open gsrohde opened 9 years ago
date can be null. Many of these are fertilization rates:
SELECT mgmttype, count(*) AS n
FROM managements
WHERE date IS NULL
GROUP BY mgmttype
ORDER BY n DESC;
and others are rates of irrigation, etc. In these cases the date(s) of fertilization are not known, and the fertilization may have occurred repeatedly over many years. So we can allow NULL dates for managements.
For the duplicates, it is only meaningful to expect that (date, mgmttype) will be unique for the subset of records associated with a single treatment_id.
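A minimal sketch of how that per-treatment check could be run, using only the tables and columns named in this thread (managements, managements_treatments, treatment_id, management_id, date, mgmttype). The column aliases n and management_ids are illustrative, not part of the schema. Note that GROUP BY places all NULL dates in the same group, so managements with unknown dates will show up as duplicates of each other here.

```sql
-- Sketch: list groups of managements that share (treatment_id, date, mgmttype).
-- Aliases n and management_ids are illustrative names only.
SELECT mt.treatment_id,
       m.date,
       m.mgmttype,
       count(*)        AS n,
       array_agg(m.id) AS management_ids
FROM managements_treatments mt
JOIN managements m ON m.id = mt.management_id
GROUP BY mt.treatment_id, m.date, m.mgmttype
HAVING count(*) > 1
ORDER BY n DESC;
```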
Even if you group by not only date and mgmttype but level and units as well, there are plenty of duplicates even among managements associated with a common treatment. Try running
select distinct count(*), array_agg(m.id), treatment_id, date, mgmttype, level, units
from managements_treatments mt
join managements m on m.id = mt.management_id
group by treatment_id, date, mgmttype, level, units
having count(*) > 1;
At least most of the groups of duplicates are only of size 2. Decide if you want to worry about this.
Eliminate NULL date values--Won't do (see below)
NULL date values
yields 200 rows!
Non-uniqueness of (date, mgmttype)
yields 678 groups, some involving as many as two dozen or more rows! This will only increase after NULLs are eliminated. Is this really a viable key, or do we need to include another column?
To see the full rows of all of the duplicates, try running
Adding constraints
The code is
In addition to constraints for implementing the key, we should ensure mgmttype is whitespace-normalized.
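One way such a check might look in PostgreSQL is sketched below. It assumes "whitespace-normalized" means no leading or trailing whitespace and no runs of more than one space between words; the constraint name is hypothetical, and this is not necessarily the constraint that was actually adopted. The pattern '\s+' relies on standard_conforming_strings being on (the PostgreSQL default).

```sql
-- Find existing rows that would violate the constraint, so they can be fixed first.
SELECT id, mgmttype
FROM managements
WHERE mgmttype <> btrim(regexp_replace(mgmttype, '\s+', ' ', 'g'));

-- Hypothetical constraint name; the definition of "normalized" is an assumption.
ALTER TABLE managements
    ADD CONSTRAINT mgmttype_whitespace_normalized
    CHECK (mgmttype = btrim(regexp_replace(mgmttype, '\s+', ' ', 'g')));
```

The ALTER TABLE statement will fail if any existing row violates the check, which is why the SELECT comes first.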