con / tributors

Pay tribute to your contributors! A tool to automatically update contributor files.
https://con.github.io/tributors/
Apache License 2.0

zenodo: ability to add grant information based on information from orcid #50

Open yarikoptic opened 4 years ago

yarikoptic commented 4 years ago

Zenodo allows annotating a record with the funding grants that supported the project.

For DataLad, for example, we have

```json
"grants": [
  {"id": "10.13039/100000001::1429999"}
],
```

where IIRC 10.13039/100000001 corresponds to NSF and 1429999 is the grant number.
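To make that `<funder DOI>::<grant number>` convention concrete, here is a minimal Python sketch that composes and splits such ids. The helper names are hypothetical, and rejecting funder DOIs that do not start with `10.13039/` is an assumption drawn from the NSF example above, not a documented Zenodo rule.

```python
# Hypothetical helpers for Zenodo-style grant ids: "<funder DOI>::<grant number>".
FUNDER_DOI_PREFIX = "10.13039/"  # assumption: funder DOIs share this prefix, as in the NSF example


def make_grant_id(funder_doi: str, grant_number: str) -> str:
    """Join a funder DOI and a grant number into one grant id."""
    if not funder_doi.startswith(FUNDER_DOI_PREFIX):
        raise ValueError(f"not a funder DOI: {funder_doi!r}")
    if not grant_number:
        raise ValueError("grant number must not be empty")
    return f"{funder_doi}::{grant_number}"


def split_grant_id(grant_id: str) -> tuple[str, str]:
    """Split a grant id back into (funder DOI, grant number)."""
    funder_doi, sep, grant_number = grant_id.partition("::")
    if not sep or not funder_doi or not grant_number:
        raise ValueError(f"malformed grant id: {grant_id!r}")
    return funder_doi, grant_number


# Same shape as the "grants" snippet above.
grants = [{"id": make_grant_id("10.13039/100000001", "1429999")}]
```

Validating at composition time is only a cheap local check; it cannot confirm that Zenodo will accept the grant.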

That funding-agency identifier is a proper DOI (it redirects to Crossref's funder data):

```shell
$> curl -i --head https://doi.org/10.13039/100000001
HTTP/2 302
date: Tue, 04 Aug 2020 19:56:24 GMT
content-type: text/html;charset=utf-8
content-length: 209
set-cookie: __cfduid=d0fac7759c716bcc769a51e389d3b85dc1596570983; expires=Thu, 03-Sep-20 19:56:23 GMT; path=/; domain=.doi.org; HttpOnly; SameSite=Lax; Secure
vary: Accept
location: http://data.crossref.org/fundingdata/funder/10.13039/100000001
expires: Tue, 04 Aug 2020 20:27:10 GMT
cf-cache-status: DYNAMIC
cf-request-id: 045ca4f630000073d90d805200000001
expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
strict-transport-security: max-age=31536000; includeSubDomains; preload
server: cloudflare
cf-ray: 5bdad769ecaa73d9-IAD
```
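The same check can be scripted. This is a hedged Python sketch using only the standard library: `http.client` does not follow redirects, so the `location` header of the 302 can be read directly. The function names are hypothetical, and the `/fundingdata/funder/` test simply mirrors the 2020 output above; doi.org may redirect elsewhere today.

```python
import http.client
from typing import Optional


def funder_doi_location(doi: str, timeout: float = 10.0) -> Optional[str]:
    """HEAD https://doi.org/<doi> and return the redirect target, or None."""
    conn = http.client.HTTPSConnection("doi.org", timeout=timeout)
    try:
        conn.request("HEAD", "/" + doi)
        response = conn.getresponse()
        if 300 <= response.status < 400:
            return response.getheader("location")
        return None
    finally:
        conn.close()


def looks_like_crossref_funder(location: Optional[str]) -> bool:
    """Heuristic based on the redirect seen above: target is Crossref funder data."""
    return bool(location) and "/fundingdata/funder/" in location
```

Usage (needs network access): `looks_like_crossref_funder(funder_doi_location("10.13039/100000001"))`.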

ORCID records also contain grant information; e.g., my ORCID record has

```json
"fundings": {
  "last-modified-date": {
    "value": 1579231189138
  },
  "group": [
    {
      "last-modified-date": {
        "value": 1437078901426
      },
      "external-ids": {
        "external-id": [
          {
            "external-id-type": "grant_number",
            "external-id-value": "1429999",
            "external-id-normalized": null,
            "external-id-normalized-error": null,
            "external-id-url": {
              "value": "http://www.nsf.gov/awardsearch/showAward?AWD_ID=1429999&HistoricalAwards=false"
            },
            "external-id-relationship": "self"
          }
        ]
      },
```

and some more...
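A minimal Python sketch of pulling grant numbers out of that structure. It only relies on the keys visible in the excerpt above; where exactly `fundings` sits in a full ORCID API response is not shown here, so the function (a hypothetical helper) takes whichever object holds that key, and it ignores funder details that the excerpt cuts off.

```python
from typing import Any


def grant_numbers_from_orcid(record: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect grant numbers (and their URLs, if any) from a "fundings" section."""
    found = []
    fundings = record.get("fundings") or {}
    for group in fundings.get("group") or []:
        external_ids = (group.get("external-ids") or {}).get("external-id") or []
        for ext in external_ids:
            if ext.get("external-id-type") != "grant_number":
                continue  # other id types are not grant numbers
            url = (ext.get("external-id-url") or {}).get("value")
            found.append({"grant_number": ext.get("external-id-value"), "url": url})
    return found
```

Note that this yields only the grant number; mapping it to the funder DOI that Zenodo needs is the hard part described next.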

IIRC, it is a PITA, though, to figure out the DOI for the funder and make sure the id is correct (there is no validator for Zenodo, although I was told there is a sandbox where a record could be uploaded), so that Zenodo doesn't just blow up later on.

If, in --interactive mode, tributors (when adding a new contributor, or when explicitly triggered to look for grant information) could offer to add some grants (as found for a specific ORCID record or for all ORCID records, and thus list the people who have a given grant?) or to ignore them (now and forever, since they are not pertinent to the project), that would be a great help, and funders would definitely appreciate it ;) !
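One way such an interactive step could look, as a hedged Python sketch: invert per-person grant lists into "people per grant", then ask once per grant. Nothing here is existing tributors behavior; the function names, the person keys, and the `y`/`i` answers are hypothetical, and `choose` defaults to `input` but can be swapped out for testing.

```python
from collections import defaultdict
from typing import Callable, Iterable


def people_per_grant(grants_by_person: dict[str, list[str]],
                     ignored: Iterable[str] = ()) -> dict[str, list[str]]:
    """Invert {person: [grant ids]} into {grant id: [people]}, dropping ignored grants."""
    skip = set(ignored)
    by_grant: dict[str, list[str]] = defaultdict(list)
    for person, grant_ids in grants_by_person.items():
        for grant_id in grant_ids:
            if grant_id not in skip and person not in by_grant[grant_id]:
                by_grant[grant_id].append(person)
    return dict(by_grant)


def review_grants(by_grant: dict[str, list[str]],
                  choose: Callable[[str], str] = input) -> tuple[list[str], list[str]]:
    """Ask per grant: 'y' adds it, 'i' ignores it for good, anything else skips for now."""
    added, ignored = [], []
    for grant_id, people in sorted(by_grant.items()):
        answer = choose(f"Add grant {grant_id} (listed by {', '.join(people)})? [y/N/i] ")
        answer = answer.strip().lower()
        if answer == "y":
            added.append(grant_id)
        elif answer == "i":
            ignored.append(grant_id)
    return added, ignored
```

The returned `ignored` list is what would need to be persisted somewhere so the question is not asked again.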

vsoch commented 4 years ago

Hmm. I'm thinking that if you look at most of these projects, you don't see grant information there, and it would be more of a PITA to have to skip it each time than to look it up. This isn't something that I think should be added to the user interface.

yarikoptic commented 4 years ago

You don't see it because most of them don't know it is possible, and some others tried and failed.

Re "skip each time": that is what I am talking about here, and also about skipping a person when (not) adding an ORCID. tributors should maintain some record of what to skip (per person, like an ORCID, or per project, like a specific grant).

vsoch commented 4 years ago

We can't do that for ORCID because the lookup is based on the GitHub login, and the Zenodo API / metadata has no understanding of that.

yarikoptic commented 4 years ago

"We can't" what exactly?

vsoch commented 4 years ago

In order to store some parameter like a skip preference in the tributors cache we would need to be able to look up by the GitHub login, which is the index for the tributors file. Since Zenodo has no understanding of a github login there is no way that we can do that.

yarikoptic commented 4 years ago

Re skipping the grant: that one is project-wide and does not need to be associated with any account. There could be a grants-to-skip key (or similar) in .tributors holding a list of them (probably worth restructuring the file to have a dedicated "contributors" key for contributors).
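A hedged Python sketch of that proposed layout: a `contributors` key plus a project-wide `grants-to-skip` list. This is a hypothetical restructuring, not the current .tributors format; the sketch assumes the current file keys contributors by GitHub login at the top level and wraps such content when loading.

```python
import json
from pathlib import Path

# Hypothetical layout:
# {"contributors": {...}, "grants-to-skip": ["<funder DOI>::<grant number>", ...]}


def normalize_tributors(data: dict) -> dict:
    """Bring parsed .tributors content into the proposed layout."""
    if "contributors" not in data:
        # Assumed current layout: top-level keys are GitHub logins.
        data = {"contributors": data}
    data.setdefault("grants-to-skip", [])
    return data


def load_tributors(path: Path) -> dict:
    """Read .tributors if it exists, otherwise start from an empty layout."""
    if not path.exists():
        return normalize_tributors({})
    return normalize_tributors(json.loads(path.read_text()))


def skip_grant(tributors: dict, grant_id: str) -> None:
    """Record a project-wide decision never to offer this grant again."""
    if grant_id not in tributors["grants-to-skip"]:
        tributors["grants-to-skip"].append(grant_id)


def pending_grants(candidates: list[str], tributors: dict) -> list[str]:
    """Drop grants the project has already chosen to skip, keeping order."""
    skip = set(tributors["grants-to-skip"])
    return [g for g in candidates if g not in skip]
```

Because the skip list is keyed by grant id only, it needs no link to any GitHub, ORCID, or Zenodo account.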

What the .tributors file is, and its role, is still not entirely clear to me (I could not even answer Michael when he asked, beyond "it is a GitHub cache") if it doesn't keep some association between the different entities belonging to the same person, e.g. GitHub login (identifies on GitHub), full name (used as an ID for Zenodo records, and also for lookup in ORCID, as could email), ORCID id (for ORCID), etc., as needed to identify a contributor across different locations.
So the question, I guess, is how to establish such an association. From git/GitHub we should be able to get names and emails (git and .mailmap have those!), and names could be matched (again, by best guess, or, if not obvious, interactively among choices), etc. Then .tributors could record any contributor-related settings to use across queries.
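A minimal Python sketch of collecting names and emails from git history. It relies on git's `%aN` and `%aE` pretty-format placeholders, which apply .mailmap; the function names are hypothetical, and lowercasing emails before comparing them is an assumption of this sketch.

```python
import subprocess
from collections import Counter


def parse_authors(log_output: str) -> Counter:
    """Count commits per (name, email) from tab-separated `git log` output."""
    counts = Counter()
    for line in log_output.splitlines():
        if not line.strip():
            continue
        name, _, email = line.partition("\t")
        counts[(name.strip(), email.strip().lower())] += 1  # assumed: case-insensitive emails
    return counts


def git_authors(repo: str = ".") -> Counter:
    """Run `git log` with the .mailmap-aware %aN/%aE placeholders."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--format=%aN%x09%aE"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_authors(out)


def names_by_email(counts: Counter) -> dict[str, set[str]]:
    """Distinct names seen per email: candidates to confirm interactively."""
    result: dict[str, set[str]] = {}
    for name, email in counts:
        result.setdefault(email, set()).add(name)
    return result
```

An email with more than one name in `names_by_email(git_authors())` is exactly the case where an interactive choice would be needed.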

vsoch commented 4 years ago

Name isn’t a good unique identifier: just a difference between First Last and Last First, or a middle name, would throw it off. And look at mine, Vanessasaurus! It would never work. I think the cache is out of scope for the current PR, but I do agree we need some way to create an association between the resources. I couldn’t come up with anything reliable, which is why I settled on the GitHub login, and Zenodo just doesn’t use it. The exception is email: we can cross-reference on that (and we do).

yarikoptic commented 4 years ago

For Zenodo, AFAIK, the name is the identifier (within the project). If there were ever two "Vanessa Sochat"s in a project or on a paper, I guess something would indeed need to be done so as not to throw Zenodo off as well. If you could check whether Zenodo allows adding some "id"-like field, maybe it could be used as an auxiliary disambiguator (only for such cases). Otherwise, IMHO, the name ("Last, First Names") is good enough for Zenodo.
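To show both sides of this exchange, here is a deliberately naive Python sketch that turns "First Middle Last" into the "Last, First Names" form. The helper is hypothetical; treating the final token as the family name is wrong for multi-word family names, and a handle like "Vanessasaurus" passes through unchanged, which is exactly why name alone is a fragile key.

```python
def zenodo_name(full_name: str) -> str:
    """Naively convert "First Middle Last" into "Last, First Middle".

    Names that already contain a comma are assumed to be "Last, First"
    and are only re-spaced. Single tokens (handles) are returned as-is.
    """
    name = " ".join(full_name.split())  # collapse runs of whitespace
    if "," in name:
        last, _, first = name.partition(",")
        first = first.strip()
        return f"{last.strip()}, {first}" if first else last.strip()
    parts = name.split(" ")
    if len(parts) < 2:
        return name  # nothing to reorder, e.g. a GitHub-style handle
    return f"{parts[-1]}, {' '.join(parts[:-1])}"
```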

If, instead of

```json
"glalteva": {
    "name": "glalteva",
    "blog": "https://github.com/glalteva"
},
```

which is GitHub-centric, you had just a list

```json
[
  {
    "name": "Alteva, Gergana",
    "github-id": "glalteva"
  }
]
```

then you could even have the same name in multiple records, etc. Depending on what you sync from and to, you would build the corresponding associative lookup (dict).
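A small Python sketch of building such a lookup on demand from the list. The function is hypothetical; mapping each value to a list of records (rather than a single record) is what lets two people share a name without one overwriting the other.

```python
from collections import defaultdict


def build_index(records: list[dict], key: str) -> dict[str, list[dict]]:
    """Map each value of `key` to every record carrying it; records without it are skipped."""
    index: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        value = record.get(key)
        if value is not None:
            index[value].append(record)
    return dict(index)


records = [
    {"name": "Alteva, Gergana", "github-id": "glalteva"},
    {"name": "Alteva, Gergana"},  # same name, no GitHub id
]
by_github = build_index(records, "github-id")  # e.g. when syncing from GitHub
by_name = build_index(records, "name")         # e.g. when syncing to Zenodo
```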

vsoch commented 4 years ago

That could work! So for a new PR I can refactor the tributors lookup to be a list, and any matching field is fair game for calling it a match. That also means, however, that we are very likely to get duplicates for people: for me, for example, there would be one entry for vsoch, one for Vanessasaurus, and possibly another with a name from Zenodo, but you’d be unlikely to find that name in the wild. Do you think this redundancy is a better solution?
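If "any matching field is a match", the matching can be made transitive with a small union-find, as in this hedged Python sketch. The field names `email` and `orcid` are hypothetical additions to the list format sketched above. A `vsoch` record and a `Vanessasaurus` record collapse into one group as soon as they share, say, an email; records with no overlapping field stay separate, which is the redundancy in question.

```python
def cluster_records(records: list[dict],
                    fields: tuple[str, ...] = ("github-id", "name", "email", "orcid")) -> list[list[dict]]:
    """Group records that share a value in any of `fields`, transitively (union-find)."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen: dict[tuple[str, str], int] = {}  # (field, value) -> first record index
    for i, record in enumerate(records):
        for field in fields:
            value = record.get(field)
            if value is None:
                continue
            j = first_seen.setdefault((field, value), i)
            parent[find(i)] = find(j)  # union i's group into j's group

    groups: dict[int, list[dict]] = {}
    for i, record in enumerate(records):
        groups.setdefault(find(i), []).append(record)
    return list(groups.values())
```

Matching on `name` this way would also merge two different people with the same name, so in practice the set of fields used for merging would need care.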