USGCRP / gcis-provenance-evaluator

Evaluating the GCIS project's Provenance

Relative Score Effect of Each Expected Component #6

Open lomky opened 5 years ago

lomky commented 5 years ago

This ticket is to talk through the individual cases to look for any exceptions to the general rule for how a missing component should affect the parent object's score.

By default, a missing 'required' component should be scored as 0 by the scoring script.
By default, a missing 'optional' component won't affect the parent's rank. Do we give a bump to an object that has an optional component, above and beyond the score of that component (as in, a bonus for including any additional provenance, even if it's low-scoring)?
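
A minimal sketch of the two default rules, for discussion only. It assumes scores are floats between 0 and 1 and that a parent's score is the plain average of its component scores; neither assumption is settled in this ticket, and `parent_score` is a hypothetical name, not a function in the evaluator.

```python
from statistics import mean
from typing import Optional


def parent_score(
    required: dict[str, Optional[float]],
    optional: dict[str, Optional[float]],
) -> float:
    """Average the component scores of a parent object.

    A score of None means the component is missing.
    Assumption: scores are in [0, 1] and are combined by a plain mean.
    """
    # A missing required component counts as 0.
    scores = [0.0 if s is None else s for s in required.values()]
    # A missing optional component is skipped, so it cannot lower the parent.
    scores += [s for s in optional.values() if s is not None]
    return mean(scores) if scores else 0.0
```

Under these assumptions, a present but low-scoring optional component can pull the parent down: `parent_score({"contributor": 1.0}, {"keyword": 0.5})` gives 0.75, while leaving the keyword out gives 1.0. That is the situation the bonus question is about.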


Note: Some things are common among all publication types, and we can likely come to a general conclusion for them first and check each for individual exceptions. Namely:



Pass through:

lomky commented 5 years ago

Contributor Note:

For every publication type, we should determine which role types are required, allowed, and disallowed, and how each one is scored.
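
One possible way to record that classification per publication type, as a sketch only. The `RoleRules` class, the `check_roles` helper, and the example rule set are all hypothetical; in particular, the roles placed in each bucket below are placeholders for illustration, not an agreed GCIS convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleRules:
    """Role types a publication type requires, allows, or disallows."""
    required: frozenset[str] = frozenset()
    allowed: frozenset[str] = frozenset()
    disallowed: frozenset[str] = frozenset()


def check_roles(rules: RoleRules, roles_present: list[str]) -> dict[str, set[str]]:
    """Compare the roles found on a publication against its rules."""
    present = set(roles_present)
    return {
        "missing_required": set(rules.required) - present,
        "disallowed_present": present & rules.disallowed,
        # Roles nobody has classified yet for this publication type.
        "unclassified": present - rules.required - rules.allowed - rules.disallowed,
    }


# Placeholder rules for one publication type; the bucket contents are made up.
EXAMPLE_RULES = RoleRules(
    required=frozenset({"author"}),
    allowed=frozenset({"editor", "publisher"}),
    disallowed=frozenset({"data_archive"}),
)
```

The "unclassified" bucket is there so that a role type missing from all three lists surfaces for review instead of being silently scored.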

USGCRP/gcis-conventions#31

All Roles:

 role_type_identifier     |  num
--------------------------+-------
 author                   | 18426
 point_of_contact         |  1938
 editor                   |   876
 contributing_author      |   422
 publisher                |   390
 contributor              |   373
 lead_author              |   349
 funding_agency           |   347
 distributor              |   244
 host                     |   136
 data_archive             |   107
 convening_lead_author    |    99
 scientist                |    84
 advisor                  |    75
 coordinating_lead_author |    64
 data_producer            |    57
 contributing_agency      |    56
 coordinator              |    33
 primary_author           |    25
 analyst                  |    14
 graphic_artist           |    13
 lead_agency              |    11
 executive_editor         |    11
 principal_author         |     5
 engineer                 |     2
 manager                  |     1

Roles held by Orgs without a person:

 role_type_identifier  | num
-----------------------+-----
 author                | 419
 publisher             | 390
 funding_agency        | 347
 distributor           | 244
 host                  | 134
 data_archive          | 107
 contributor           |  91
 contributing_agency   |  56
 data_producer         |  50
 editor                |  14
 lead_agency           |  11
 convening_lead_author |  11
 point_of_contact      |   4
 graphic_artist        |   4
 analyst               |   4
 engineer              |   2
 contributing_author   |   2
 coordinator           |   2
 lead_author           |   1
 manager               |   1
 scientist             |   1
lomky commented 5 years ago

By default, should an object that has optional components get a 'bonus' score on top of the averages it connects to?
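
If the answer is yes, one shape the bonus could take is a small flat increment per optional component present, capped at the maximum score. This is a sketch under assumptions: the 0.05 increment, the cap of 1.0, and the name `with_optional_bonus` are all hypothetical and nothing here has been decided.

```python
def with_optional_bonus(
    base_score: float,
    optional_present: int,
    bonus_per_component: float = 0.05,  # hypothetical value
    max_score: float = 1.0,             # assumes scores are capped at 1
) -> float:
    """Add a flat bonus for each optional component present, up to the cap."""
    return min(max_score, base_score + bonus_per_component * optional_present)
```

A flat bonus rewards the presence of additional provenance regardless of how well that provenance scores, which is the "above and beyond" behavior being asked about.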

lomky commented 5 years ago

The contributor object itself only serves to point through to a Person and/or an Org and to combine their scores; it has no inherent score of its own.
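
A sketch of that pass-through, assuming "combine" means a plain average of whichever of the two scores exist. The combining rule is not specified in this ticket, and `contributor_score` is a hypothetical name.

```python
from statistics import mean
from typing import Optional


def contributor_score(
    person_score: Optional[float],
    org_score: Optional[float],
) -> Optional[float]:
    """Pass through the scores of the linked Person and/or Org.

    The contributor adds nothing of its own. Returns None when neither
    a Person nor an Org is linked, leaving that case to the caller.
    """
    linked = [s for s in (person_score, org_score) if s is not None]
    return mean(linked) if linked else None
```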

lomky commented 5 years ago

General Thoughts on Publication type:

lomky commented 5 years ago
rasherman commented 5 years ago
rasherman commented 5 years ago
lomky commented 5 years ago

Thank you all for moving forward! Could you explain the details on a few for me?

rasherman commented 5 years ago
  1. Dataset contributor: every dataset should have some person and/or organization that produced it, but I don't think we can restrict what type they would be. In some cases it would be a publisher, in some an author...
  2. Dataset keywords: we are not assigning keywords to datasets at this point, but they definitely COULD all have them (keywords were originally designed to be used to describe datasets). It should be a very low weight, but it's always possible in the future that we could have some ingest of keywords assigned to NASA or NOAA dataset catalogs and we should already have it baked in that doing something like that would improve a bunch of scores, because it is a positive change to our system.
  3. Figure references: We were waffling on this point. Probably every figure should have a reference, but I can definitely imagine cases where they wouldn't and it would be fine. For instance, a photograph of something might have an activity or some other sourcing of who took the photograph, but it might not have a citation to a previous publication.
  4. Finding figures: at this point no findings have ever had figures, but isn't it possible that one could? What if in the next report we assign the figure explaining likelihoods to findings, or something else like that?
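
To illustrate the "very low weight" idea from point 2, here is a sketch of a weighted average in which each component carries its own weight. The function name and the weights used in the example are hypothetical; only the idea that keywords should weigh very little comes from the discussion.

```python
def weighted_score(components: dict[str, tuple[float, float]]) -> float:
    """Weighted average of component scores.

    `components` maps a component name to a (score, weight) pair.
    """
    total_weight = sum(weight for _, weight in components.values())
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(score * weight for score, weight in components.values())
    return weighted_sum / total_weight


# Hypothetical weights: contributor counts fully, keywords count very little.
without_keywords = weighted_score({"contributor": (1.0, 1.0), "keyword": (0.0, 0.1)})
with_keywords = weighted_score({"contributor": (1.0, 1.0), "keyword": (1.0, 0.1)})
```

With these made-up weights, a dataset lacking keywords loses only a little, and a future keyword ingest would raise many dataset scores slightly, which matches the intent described in point 2.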
lomky commented 5 years ago

Thanks! I agree wholeheartedly with 1, 2, and 3. For Finding figures, I can see that as possible, but I don't think I'd worry about future proofing Findings in that way. The same could be said for Finding tables or Finding datasets. So if the situation ever came up, I'd want to update our rating then. Thoughts?

R-Aniekwu commented 5 years ago

Hey Kat. I would share your sentiment if a "Figure" was a required component of a "Finding". Since it is optional, and we agree that there is a possibility that a finding could have a figure component, then does it really matter if we include it in the component score now, as opposed to later? I do not think it does.

On Thu, Feb 7, 2019 at 10:58 AM Kat Tipton notifications@github.com wrote:

Thanks! I agree wholeheartedly with 1, 2, and 3. For Finding figures, I can see that as possible, but I don't think I'd worry about future proofing Findings in that way. The same could be said for Finding tables or Finding datasets. So if the situation ever came up, I'd want to update our rating then. Thoughts?


-- Reuben T. Aniekwu Research Coordinator | Contractor

U.S. Global Change Research Program 1800 G St. NW, Suite 9100 Washington, D.C. 20006

lomky commented 5 years ago