Relative Score Affect of each Expected Component

lomky commented 5 years ago

This ticket is to talk through the individual cases to look for any exceptions to the general rule for how a missing component should affect the parent object's score.

By default, a missing 'required' component should be scored as 0 by the scoring script.
By default, a missing 'optional' component won't affect the parent's score. Do we give a bump to an object that has an optional component, above and beyond the score of that component (as in, a bonus for including any additional provenance, even if it's low-scoring)?
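
A minimal sketch of the two default rules, assuming component scores on a 0-to-1 scale combined with an unweighted mean. The function name `score_parent`, its arguments, and the choice of a plain mean are all hypothetical and not taken from the evaluator script:

```python
from statistics import mean


def score_parent(required, optional):
    """Average component scores under the two default rules.

    `required` and `optional` map a component name to its score,
    or to None when the component is missing.
    """
    scores = []
    for value in required.values():
        # Default rule 1: a missing required component counts as 0.
        scores.append(0.0 if value is None else value)
    for value in optional.values():
        # Default rule 2: a missing optional component is skipped,
        # so it cannot pull the parent's average down.
        if value is not None:
            scores.append(value)
    # Assumption: an object with nothing to average scores 0.
    return mean(scores) if scores else 0.0
```

For example, an article with a fully scored journal but a missing required file would average 0 and 1, while a missing optional activity would leave the result unchanged.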

Note: Some things are common among all publication types, and we can likely come to a general conclusion for them first and check each for individual exceptions. Namely:

publication
- optional
  - activity (rare on most)
  - contributor
  - GCMD keywords
  - Regions

article:
- optional:
  - activity (rare)
  - contributor
- required:
  - journal
book:
- optional:
  - activity (rare)
  - contributor
chapter:
- required:
  - file
- optional:
  - figure
  - finding
  - table
  - activity (rare)
  - contributor
dataset:
- optional:
  - activity (rare)
  - contributor
figure:
- optional:
  - activity (common)
  - contributors
  - chapter
- required:
  - report
  - file
  - image
  - Contributor: Point of Contact
finding:
- optional:
  - activity (common)
  - contributor
  - chapter
- required:
  - report
image:
- optional:
  - activity (common)
  - contributor
- required:
  - figure
  - file
journal:
- optional:
  - activity (rare)
  - contributor
- required:
  - Contributor: Publisher
report:
- required:
  - file
- optional:
  - figure
  - table
  - finding
  - chapter
  - activity (rare)
  - contributor
scenario:
- optional:
  - activity (rare)
  - contributor
  - file
table:
- optional:
  - activity (common)
  - contributor
  - chapter
- required:
  - report
  - array
array:
- optional:
  - activity (common)
webpage:
- optional:
  - activity (rare)
  - contributor Host
  - contributor

Pass through:

contributor:*
- optional:
  - person
  - organization
reference:
- required:
  - (citing) publication(s)
  - (child) publication

lomky commented 5 years ago

Contributor Note:

For every publication type, we should determine what are the required, allowed, and disallowed role types and determine the scoring for each.

USGCRP/gcis-conventions#31

All Roles:

 role_type_identifier     | num
--------------------------+-------
 author                   | 18426
 point_of_contact         |  1938
 editor                   |   876
 contributing_author      |   422
 publisher                |   390
 contributor              |   373
 lead_author              |   349
 funding_agency           |   347
 distributor              |   244
 host                     |   136
 data_archive             |   107
 convening_lead_author    |    99
 scientist                |    84
 advisor                  |    75
 coordinating_lead_author |    64
 data_producer            |    57
 contributing_agency      |    56
 coordinator              |    33
 primary_author           |    25
 analyst                  |    14
 graphic_artist           |    13
 lead_agency              |    11
 executive_editor         |    11
 principal_author         |     5
 engineer                 |     2
 manager                  |     1

Roles held by organizations without a person:

 role_type_identifier  | num
-----------------------+-----
 author                | 419
 publisher             | 390
 funding_agency        | 347
 distributor           | 244
 host                  | 134
 data_archive          | 107
 contributor           |  91
 contributing_agency   |  56
 data_producer         |  50
 editor                |  14
 lead_agency           |  11
 convening_lead_author |  11
 point_of_contact      |   4
 graphic_artist        |   4
 analyst               |   4
 engineer              |   2
 contributing_author   |   2
 coordinator           |   2
 lead_author           |   1
 manager               |   1
 scientist             |   1

lomky commented 5 years ago

By default, should an object that has optional components get a 'bonus' score on top of the averages it connects to?

We think this might be a nice thing to add as a run-time flag:
- flag to say 'increase the score of the parent object if it has optional component x'
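
One way such a flag could behave, as a rough sketch. The function `apply_optional_bonus`, the `enabled` switch, the bonus size of 0.05, and the cap at 1.0 are all assumptions made for illustration; the thread does not fix any of them:

```python
def apply_optional_bonus(base_score, present_optional, bonus_components,
                         bonus=0.05, enabled=False):
    """Optionally raise a parent's score for each listed optional component it has.

    `present_optional` is the set of optional components the object actually has;
    `bonus_components` is the list of components the flag was asked to reward.
    """
    if not enabled:
        # Flag off: the averaged score is returned untouched.
        return base_score
    hits = sum(1 for name in bonus_components if name in present_optional)
    # Cap at 1.0 so a bonus never pushes a score past the top of the scale.
    return min(1.0, base_score + bonus * hits)
```

Keeping the bonus behind a switch that defaults to off means a normal run still reports the plain averages.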

lomky commented 5 years ago

For the contributor object itself, it only serves to point through to Person and/or Org, and combine those scores, but has no inherent score.
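
A small sketch of that pass-through behaviour. The thread does not say how the Person and Org scores are combined, so the plain mean here is an assumption, as is the name `contributor_score`:

```python
def contributor_score(person_score=None, organization_score=None):
    """Combine the scores of whatever a contributor points to.

    The contributor adds nothing of its own: with neither a person nor an
    organization linked, there is no score to pass through and None is returned.
    """
    linked = [s for s in (person_score, organization_score) if s is not None]
    if not linked:
        return None
    # Assumption: "combine" means an unweighted mean of the linked scores.
    return sum(linked) / len(linked)
```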

lomky commented 5 years ago

General Thoughts on Publication type:

publication
- optional
  - activity (rare on most)
    - no effect on parent score
  - regions
    - no effect on parent score
  - contributor
    - cannot say at a general publication level
- required
  - GCMD keywords
    - affects the parent.
    - weighted down to be less impactful than, say, a figure missing an image
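
To illustrate the weighting idea, here is a sketch using a weighted mean. The weights 3.0 and 1.0 and the names `weighted_score`, `HIGH`, and `LOW` are hypothetical; the thread only says keywords should weigh less than something like a figure's image:

```python
def weighted_score(components):
    """Weighted mean of (score, weight) pairs; 0.0 if there is nothing to weigh."""
    total_weight = sum(weight for _, weight in components)
    if total_weight == 0:
        return 0.0
    return sum(score * weight for score, weight in components) / total_weight


# Hypothetical weights: image and file count three times as much as keywords.
HIGH, LOW = 3.0, 1.0

# Each list is (image, file, keywords); a missing required component scores 0.
figure_missing_image = [(0.0, HIGH), (1.0, HIGH), (1.0, LOW)]
figure_missing_keywords = [(1.0, HIGH), (1.0, HIGH), (0.0, LOW)]

print(weighted_score(figure_missing_image))
print(weighted_score(figure_missing_keywords))
```

With these weights the figure missing its image scores 4/7 and the figure missing only keywords scores 6/7, so both are penalized but the missing image hurts more.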

lomky commented 5 years ago

article:
- required:
  - journal
    - 0, high weight
  - contributor: author
    - 0, high weight
- optional:
  - contributor: point of contact
    - no effect, not all have it
  - activity (rare)
    - no effect
book:
- required:
  - contributor: Publisher
    - 0, high weight
- optional:
  - contributor: Author
    - no effect on parent score
  - contributor: Editor
    - no effect on parent score
  - activity (rare)
    - no effect on parent score
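
The article and book rules could be captured as data for the scoring script to read. This is only a sketch: the `RULES` layout, the `contributor:<role>` key format, the `HIGH_WEIGHT` value, and the helper `missing_required` are hypothetical, not the evaluator's actual configuration:

```python
HIGH_WEIGHT = 3.0  # hypothetical value; the thread only says "high weight"

RULES = {
    "article": {
        # Missing required components score 0 at the listed weight.
        "required": {"journal": HIGH_WEIGHT, "contributor:author": HIGH_WEIGHT},
        # Missing optional components have no effect on the parent.
        "optional": ["contributor:point_of_contact", "activity"],
    },
    "book": {
        "required": {"contributor:publisher": HIGH_WEIGHT},
        "optional": ["contributor:author", "contributor:editor", "activity"],
    },
}


def missing_required(object_type, present):
    """Return the required components of `object_type` absent from `present`."""
    required = RULES[object_type]["required"]
    return sorted(name for name in required if name not in present)
```

Calling `missing_required("article", {"journal"})` would report that the author contributor is still missing.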

rasherman commented 5 years ago

chapter:
- required:
  - files
  - report (might have to be linked for scoring in the opposite direction)
  - references
  - keywords
- optional:
  - findings
  - figures
  - tables
  - contributors
  - activity
  - regions
dataset:
- required:
  - contributor
  - keywords
- optional:
  - lexicon
  - regions
  - activity
figure:
- required:
  - image
  - file
  - contributor (Point of Contact)
  - Report (or indicator)
  - keywords
- optional:
  - references
  - region
  - activity
  - chapter

rasherman commented 5 years ago

finding:
- required:
  - report
  - references
  - keywords
- optional:
  - figure
  - chapter
  - contributor
  - region
  - activity
image:
- required:
  - figure (?)
  - file
  - activity
- optional:
  - contributor
  - keywords
  - regions
journal:
- required:
  - contributor (Publisher)
- optional:
  - keywords
  - regions
  - contributor (other than Publisher)

lomky commented 5 years ago

Thank you all for moving forward! Could you explain the details on a few for me?

dataset
- required:
  - contributor
    - which contributor type?
  - keywords
    - I don't understand this one. Are we giving datasets keywords? Should all datasets always have applicable gcmd keywords?
figure
- optional:
  - references
    - are these really optional? Shouldn't every figure, in the best case, have its references?
finding
- optional:
  - figure
    - I don't understand why a finding could have a figure?

rasherman commented 5 years ago

Dataset contributor: every dataset should have some person and/or organization that produced it, but I don't think we can restrict what type they would be. In some cases it would be a publisher, in some an author...
Dataset keywords: we are not assigning keywords to datasets at this point, but they definitely COULD all have them (keywords were originally designed to be used to describe datasets). It should be a very low weight, but it's always possible in the future that we could have some ingest of keywords assigned to NASA or NOAA dataset catalogs and we should already have it baked in that doing something like that would improve a bunch of scores, because it is a positive change to our system.
Figure references: We were waffling on this point. Probably every figure should have a reference, but I can definitely imagine cases where they wouldn't and it would be fine. For instance, a photograph of something might have an activity or some other sourcing of who took the photograph, but it might not have a citation to a previous publication.
Finding figures: at this point no findings have ever had figures, but isn't it possible that one could? What if in the next report we assign the figure explaining likelihoods to findings, or something else like that?

lomky commented 5 years ago

Thanks! I agree wholeheartedly with 1, 2, and 3. For Finding figures, I can see that as possible, but I don't think I'd worry about future proofing Findings in that way. The same could be said for Finding tables or Finding datasets. So if the situation ever came up, I'd want to update our rating then. Thoughts?

R-Aniekwu commented 5 years ago

Hey Kat. I would share your sentiment if a "Figure" was a required component of a "Finding". Since it is optional, and we agree that there is a possibility that a finding could have a figure component, then does it really matter if we include it in the component score now, as opposed to later? I do not think it does.



lomky commented 5 years ago

Reports
- Required
  - File
  - Contributor (Publisher or Distributor)
  - Contributor (Author or Editor)
- Optional
  - gcmd_keywords
  - figure
  - table
  - finding
  - chapter
  - activity
  - contributors
  - regions
  - references
Scenario
- Required
  - Contributor
- Optional
  - file
Table
- Required
  - array
  - report
  - keywords
- Optional
  - Contributor
  - chapter
  - references
  - regions
  - activity
Array
- required
  - activity
Webpage
- Required
  - contributor Host
- Optional
  - Contributor
  - activity
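
The report entry above requires a contributor from each of two role groups (Publisher or Distributor, and Author or Editor). A sketch of how such "one of" requirements might be checked follows; the names `REPORT_ROLE_GROUPS` and `unmet_role_groups` are hypothetical, and the role strings are taken from the role list earlier in the thread:

```python
# Each set is one requirement: at least one of its roles must be present.
REPORT_ROLE_GROUPS = [
    {"publisher", "distributor"},
    {"author", "editor"},
]


def unmet_role_groups(roles, groups):
    """Return the role groups (as sorted lists) that no contributor role satisfies."""
    held = set(roles)
    return [sorted(group) for group in groups if not (group & held)]
```

A report whose only contributors are an author and a funding agency would satisfy the second group but not the first, so the first group would be scored as a missing required component.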

USGCRP / gcis-provenance-evaluator

Relative Score Affect of each Expected Component #6