USGCRP / gcis-conventions

Repository for the collection, management, and versioning of the GCIS data management conventions.
https://usgcrp.github.io/gcis-conventions/

Webpage Convention #18

Closed: lomky closed this issue 6 years ago.

lomky commented 6 years ago

A ticket to discuss the conventions for the Webpage object.

Current Webpage conventions:

Webpage Fields:

   Column    |            Type             | Modifiers | Storage  | Stats target |                 Description
-------------+-----------------------------+-----------+----------+--------------+----------------------------------------------
 identifier  | character varying           | not null  | extended |              | A globally unique identifier (UUID).
 url         | character varying           | not null  | extended |              | The URL.
 title       | character varying           |           | extended |              | The title of the webpage.
 access_date | timestamp without time zone |           | plain    |              | The date on which this webpage was accessed.
Indexes:
    "webpage_pkey" PRIMARY KEY, btree (identifier)
    "webpage_url_key" UNIQUE CONSTRAINT, btree (url)
Check constraints:
    "ck_webpage_identifier" CHECK (identifier::text ~ similar_escape('[a-z0-9_-]+'::text, NULL::text))
Triggers:
    audit_trigger_row AFTER INSERT OR DELETE OR UPDATE ON webpage FOR EACH ROW EXECUTE PROCEDURE audit.if_modified_func('true')
    audit_trigger_stm AFTER TRUNCATE ON webpage FOR EACH STATEMENT EXECUTE PROCEDURE audit.if_modified_func('true')
    delpub BEFORE DELETE ON webpage FOR EACH ROW EXECUTE PROCEDURE delete_publication()
    updatepub BEFORE UPDATE ON webpage FOR EACH ROW WHEN (new.identifier::text <> old.identifier::text) EXECUTE PROCEDURE update_publication()
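The `ck_webpage_identifier` check constraint above limits which characters an identifier may contain. As a rough client-side pre-check, here is a minimal Python sketch. It assumes the constraint is meant to allow only lowercase letters, digits, underscores, and hyphens across the whole string. The function names are hypothetical and are not part of GCIS.

```python
import re
import uuid

# Assumed reading of ck_webpage_identifier: the entire identifier must consist
# of lowercase ASCII letters, digits, underscores, and hyphens.
IDENTIFIER_PATTERN = re.compile(r"[a-z0-9_-]+")


def is_valid_webpage_identifier(identifier: str) -> bool:
    """Return True if the identifier matches the assumed allowed character set."""
    # fullmatch requires the pattern to cover the whole string, mirroring the
    # whole-string semantics of SQL SIMILAR TO patterns.
    return IDENTIFIER_PATTERN.fullmatch(identifier) is not None


def new_webpage_identifier() -> str:
    """Mint a random UUID; str() yields lowercase hex digits and hyphens."""
    return str(uuid.uuid4())


if __name__ == "__main__":
    print(is_valid_webpage_identifier("8649f127-816d-48fe-8dc7-13d03e766ffa"))  # True
    print(is_valid_webpage_identifier("8649F127-816D"))  # False (uppercase)
    print(is_valid_webpage_identifier(new_webpage_identifier()))  # True
```

A freshly generated UUID always passes the check, which is consistent with the "fine as UUID" convention for identifiers.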

Provenance Connections:

citedBy and cites

Relationships:

contributors
files
gcmd_keywords
regions

lomky commented 6 years ago

Field breakdown

identifier - fine as UUID
url - the full URL, not including any parameters. Parameters should be recorded on either the Activity or the Reference, as appropriate.
title - the name the Reference gives the URL; otherwise, the title stated on the webpage itself, correct as of the object's creation.
access_date - the access date for this webpage. This field should not be used: because URLs must be unique, one webpage object can be cited with many different access dates. An access date should instead go on either the Reference object (e.g. if this is a referenced publication) or an Activity object (e.g. if this is cited on a Figure).
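The url convention keeps parameters off the webpage record. The following Python sketch shows one way to split an incoming URL with the standard library. It assumes "parameters" means the query string, and it keeps any `#fragment`. The function name is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit


def split_webpage_url(raw_url: str) -> tuple[str, dict[str, list[str]]]:
    """Separate the URL to store on a webpage from its query parameters.

    Assumes "parameters" means the query string; the fragment is kept.
    """
    parts = urlsplit(raw_url)
    # Rebuild the URL with an empty query string.
    base_url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", parts.fragment))
    # Parsed parameters, to be recorded on the Activity or Reference instead.
    params = parse_qs(parts.query)
    return base_url, params


if __name__ == "__main__":
    url, params = split_webpage_url("https://example.org/data/page?year=2017&region=west")
    print(url)     # https://example.org/data/page
    print(params)  # {'year': ['2017'], 'region': ['west']}
```

Because the stored URL has no query string, two citations of the same page with different parameters resolve to a single webpage object. This matches the unique-URL requirement.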

lomky commented 6 years ago

Webpages may be citedBy other publications. They do not use cites, as they are not USGCRP products.
Contributors - webpages often have a host and may have authors, as appropriate.
Files - not used on Webpages.
gcmd_keywords & regions - not yet implemented.
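The relationship and field conventions in this ticket can be expressed as a few simple checks: no cites, no files, and no access_date on the webpage itself. The Python sketch below assumes a hypothetical webpage record represented as a plain dict with optional `cites`, `files`, and `access_date` keys. That record shape is illustrative only and is not the GCIS schema or API.

```python
from typing import Any


def webpage_convention_problems(webpage: dict[str, Any]) -> list[str]:
    """List ways a webpage record departs from the conventions in this ticket.

    The record shape (a dict with optional "cites", "files", and
    "access_date" keys) is hypothetical, not the GCIS schema or API.
    """
    problems = []
    if webpage.get("cites"):
        problems.append("webpages do not use cites; they are not USGCRP products")
    if webpage.get("files"):
        problems.append("files are not used on webpages")
    if webpage.get("access_date") is not None:
        problems.append("access_date belongs on the Reference or Activity")
    return problems


if __name__ == "__main__":
    print(webpage_convention_problems({"url": "https://example.org/page"}))  # []
    print(len(webpage_convention_problems(
        {"cites": ["x"], "files": ["f.png"], "access_date": "2017-01-01"})))  # 3
```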

lomky commented 6 years ago

Nice-to-have improvements:

It would be nice to be able to mark a website as known to be dead or defunct.

Edge Case Conventions:

rasherman commented 6 years ago

This all looks good to me. It can be accepted as is, though it would be good to add something from @amruelama about making sure we don't cite a dataset landing page as a webpage, and how to make that distinction.

amruelama commented 6 years ago

Here are a few examples of webpages that look like datasets (data, in this case):

  1. https://data.globalchange.gov/webpage/8649f127-816d-48fe-8dc7-13d03e766ffa
  2. https://data.globalchange.gov/webpage/3872f1da-ea8b-43dd-94cc-f1ad6231ba43
  3. https://data.globalchange.gov/webpage/85cddc44-daab-4784-bcd5-718df0180b1f
  4. https://data.globalchange.gov/webpage/5626d40a-adb7-457f-8db8-09fa002ad080

I will list the distinctions, drawing on different examples, soon.

lomky commented 6 years ago

I put the content into the document. Leaving this open until we have the language on webpage vs dataset.

lomky commented 6 years ago

@amruelama any progress on webpage vs dataset distinctions?

amruelama commented 6 years ago

Determining whether a dataset has been categorized as a webpage in GCIS will require a manual QA pass; we did this previously in issue #329. Basically, we need to use the webpage's reference as a source and check whether its authors mention using the 'dataset' to derive a table or figure. The easiest first filter is to search the URL for the keyword 'data' or 'dataset', but a thorough QA is still needed to complete the process. After QA, the affected webpages should be converted to datasets (#506). To avoid this misclassification in the future, the same QA could be done before each release, when syncing references of type 'web page' and when adding new child publications of type 'webpage' through the script.
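The keyword filter described above is only a first pass before the manual QA. Here is a minimal Python sketch. It assumes the webpages are already available as a list of dicts with hypothetical `identifier` and `url` keys, for example from an earlier export (the export step is not shown). It produces a list of candidates to review, not a list of confirmed datasets.

```python
def flag_for_dataset_qa(webpages: list[dict[str, str]]) -> list[str]:
    """Return identifiers of webpages whose URL mentions 'data' or 'dataset'.

    This is only a first-pass filter: every hit still needs manual QA against
    the citing reference, and webpages without the keyword can still be datasets.
    """
    candidates = []
    for page in webpages:
        # 'dataset' contains 'data', so one case-insensitive test covers both.
        if "data" in page["url"].lower():
            candidates.append(page["identifier"])
    return candidates


if __name__ == "__main__":
    pages = [
        {"identifier": "a", "url": "https://example.org/data/temperature"},
        {"identifier": "b", "url": "https://example.org/about"},
        {"identifier": "c", "url": "https://example.org/Dataset/precip"},
    ]
    print(flag_for_dataset_qa(pages))  # ['a', 'c']
```

A substring match is deliberately loose: it also catches URLs such as ".../metadata/...". This suits a step whose output is reviewed by hand anyway.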