PecanProject / bety

Web-interface to the Biofuel Ecophysiological Traits and Yields Database (used by PEcAn and TERRA REF)
https://www.betydb.org
BSD 3-Clause "New" or "Revised" License
16 stars 38 forks source link

citations value constraints: needed cleanup and decisions #230

Open gsrohde opened 9 years ago

gsrohde commented 9 years ago

author, title

If so:

If so:

If so:

If so:

These will be part of a candidate key, but cleanup is required before a uniqueness constraint can be applied, so we only apply the value constraints now.

year

With two exceptions, the earliest year is 1951. The exception with year 0 looks like a test for. The exception with year 201 looks like a typo.

vol

SELECT * FROM citations WHERE NOT vol > 0;

shows the offending rows. I assume the row with "test" in it can be removed.

The other row is a book (whose title is in the journal column), so perhaps 0 here means vol is inapplicable. Note that in this case the citation is the whole book. Other cases where NULL is inapplicable are when the book is conference proceedings and the citation is thus to a small portion of the book. (In this case pg is usually non-empty.)

Options

  1. Easy option: just allow 0; use CHECK (vol >= 0) instead. Don't worry that no semantic distinction is being made between 0 and NULL (in fact we could go futher and replace all NULLs with 0 and add a NOT NULL constraint). Don't worry that some journal someday starts using 0 as a bona fide volume number.
  2. Easy option 2: Keep the constraint as is, continue to allow NULL, and change the 0 to NULL.
  3. Harder option: Allow both 0 and NULL, but make some semantic distinction in their uses.

    pg

Violators have been fixed.

We will have to fix the Rails interface to change hyphen to n-dash when entries are added or updated.

url, pdf

These are now required to either be bona fide URLs with an http or https schema, be the empty string, or be some parenthesized non-empty string. The later accommodates entries such as "(paper copy available in blue folder)" and "(email:dwng@illinois.edu)".

doi

Violators have been fixed. The Rails interface should validate and/or automatically fix invalid entries.

dlebauer commented 8 years ago

@gsrohde is this ready to be implemented?

gsrohde commented 8 years ago

@dlebauer

All constraints have already been added except for minimum year.

select * from citations where year < 1900;

returns

     id     | author | year |                                         title                                         |                  
       journal                         | vol | pg | url | pdf |         created_at         |         updated_at         | doi |  user_i
d   
------------+--------+------+---------------------------------------------------------------------------------------+------------------
---------------------------------------+-----+----+-----+-----+----------------------------+----------------------------+-----+--------
----
 1000000001 |        | 1800 | FACE EXPERIMENTS                                                                      |                  
                                       |     |    |     |     | 2014-09-25 00:24:42.949729 | 2014-09-25 00:24:42.949729 |     | 1000000
003
 1000000002 |        | 1800 | MACROSYSTEMS SITES                                                                    |                  
                                       |     |    |     |     | 2015-01-07 14:41:39.61514  | 2015-01-07 14:41:39.61514  |     | 1000000
005
        619 | Wang   |  201 | Predicted yields of short-rotation hybrid poplar (Populus spp.) for the contiguous US | Ecological Applic
ations (accepted with minor revisions) |     |    |     |     | 2012-09-20 00:22:20        | 2012-09-20 00:22:20        |     |        
  2
        780 | Osnas  |    0 | test                                                                                  | test             
                                       |   0 |    |     |     | 2015-01-06 21:03:58.869857 | 2015-03-12 15:20:39.276193 |     |        
162
(4 rows)

So two questions: Should some of these be amended or removed and should we have a minimum year constraint (and if so, what?—that's three questions).

dlebauer commented 8 years ago

I cleaned up Wang and Osnas. @mdietze can fix the other two at BU if necessary.

1440 would be a sensible minimum value for publication year because https://en.m.wikipedia.org/wiki/Printing_press