WCRP-CMIP / CMIP6_CVs

Controlled Vocabularies (CVs) for use in CMIP6
Creative Commons Attribution 4.0 International

institution_id should contain homepage #379

Closed esdoc-system-user closed 1 year ago

esdoc-system-user commented 7 years ago

Institutional homepage should be added to institution_id.

durack1 commented 7 years ago

Thanks @esdoc-system-user, what use case has prompted this suggestion? It would be great to know how this would be used, as currently the only package that we're actively considering is CMOR.

durack1 commented 7 years ago

@esdoc-system-user there doesn't appear to be a valid use case here, so I will close this. Please reopen if required.

durack1 commented 5 years ago

Following work by @glevava we'll update the institution_id template to include additional details following the example:

...
  {
    "code": "CSIRO-ARCCSS-BoM",
    "name": "",
    "postalAddress": "",
    "coordinates": "",
    "homepage": "",
    "consortia": [
      {
        "code": "CSIRO",
        "name": "Commonwealth Scientific and Industrial Research Organisation",
        "postalAddress": "Aspendale, Victoria 3195, Australia",
        "coordinates": "145.0435415,-38.0028413",
        "homepage": "https://www.csiro.au/"
      },
      {
        "code": "ARCCSS",
        "name": "Australian Research Council - Centre of Excellence for Climate System Science",
        "postalAddress": "UNSW Sydney NSW 2052, Australia",
        "coordinates": "151.2277251,-33.9166077",
        "homepage": "https://www.climatescience.org.au/tags/arccss"
      },
      {
        "code": "BoM",
        "name": "Bureau of Meteorology",
        "postalAddress": "Aspendale, Victoria 3195, Australia",
        "coordinates": "144.9486767,-37.8196597",
        "homepage": "http://www.bom.gov.au/"
      }
    ]
  },
...

Any new registrations will require lon/lat details to be added, along with URLs, etc.
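
A minimal sketch of how such a registration check might look, assuming the attribute names from the example above. The function name `check_registration` is hypothetical and is not part of any existing CMIP6_CVs tooling; it only applies to single-institution entries (consortium-level entries in the example leave several fields empty).

```python
def check_registration(entry: dict) -> list:
    """Return a list of problems found in a single-institution entry."""
    problems = []
    # Attribute names are taken from the example in this thread.
    for key in ("code", "name", "postalAddress", "coordinates", "homepage"):
        if not entry.get(key):
            problems.append(f"missing {key}")
    coords = entry.get("coordinates", "")
    if coords:
        try:
            lon, lat = (float(x) for x in coords.split(","))
        except ValueError:
            problems.append("coordinates must be 'lon,lat'")
        else:
            # A swapped lat/lon pair often shows up as an out-of-range latitude.
            if not (-180 <= lon <= 180 and -90 <= lat <= 90):
                problems.append("coordinates out of range (expected lon,lat order)")
    return problems


bom = {
    "code": "BoM",
    "name": "Bureau of Meteorology",
    "postalAddress": "Aspendale, Victoria 3195, Australia",
    "coordinates": "144.9486767,-37.8196597",
    "homepage": "http://www.bom.gov.au/",
}
print(check_registration(bom))
```

The range check only catches swapped pairs when the longitude lies outside ±90, so it is a sanity check rather than a guarantee of correct ordering.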

@taylor13 @momipsl

taylor13 commented 5 years ago

For the record, I've included a little more background information here (from @glevava):

To generate the ESGF map as a KML file, I had to add information to the pyessv-archive for each CMIP6 institution_id.

For instance:
- pyessv consumes the WCRP JSON https://github.com/WCRP-CMIP/CMIP6_CVs/blob/master/CMIP6_institution_id.json where we have the pair
<---
    "AER":"Research and Climate Group, Atmospheric and Environmental Research, 131 Hartwell Avenue, Lexington, MA 02421, USA",
--->
- before recording the term into the pyessv data model we manually refactor the syntax and add useful information about the institution in this fashion (see https://github.com/ES-DOC/pyessv/blob/master/sh/writers/wcrp/cmip6/institution_id.json):
<---
    "code": "AER",
    "name": "Research and Climate Group, Atmospheric and Environmental Research",
    "postalAddress": "131 Hartwell Avenue, Lexington, MA 02421, USA",
    "coordinates": "-71.270184,42.4624917",
    "homepage": "https://www.aer.com/",
    "consortia": []
--->

I manually checked every postal address through the Google API to get the lon/lat coordinates. I also tested each homepage and looked it up if it was not provided.
I also completed some consortia based on their homepage if some partners were missing.

Would you be interested in adding this additional info into the WCRP JSON file? It would be great to have only one file providing all institution information and I'm happy if this is the WCRP one.
Moreover, all other tools (outside of pyessv) can benefit from this.
We could have a new CMIP6_institution_id.json in the same way as the CMIP6_source_id.json:

<---
    "institution_id":{
        "AER":{
            "name": "Research and Climate Group, Atmospheric and Environmental Research",
            "postalAddress": "131 Hartwell Avenue, Lexington, MA 02421, USA",
            "coordinates": "-71.270184,42.4624917",
            "homepage": "https://www.aer.com/",
            "consortia": []
        }
    }
--->

Of course, the names of the attributes could be changed as you want. Do you know if such a new syntax could have side effects on other tools? I think it would be OK for CMOR/PrePARE.
If this sounds like a good idea to you, I can provide the final JSON to be upgraded by Paul on the WCRP repo.

and a subsequent email:

> Regarding the coordinates attribute, how are you collecting this? If I have to revise or register a new institution, it would be useful for me to include this information at the time the details are registered; this will also ensure that if your Google map generation is automated, the new institution info will propagate automagically.

First, I tried the Google geocoding API, which allows you to recover the coordinates from a postal address (several Python libraries implement it), but the Google API isn't free and without a paid plan you can only request one address per day... :/
So I rolled back to the manual way: I enter the postalAddress value into the Google Maps front-end, and the result shows you the coordinates in the URL. For instance, here is what I get for the AER postal address:

https://www.google.com/maps/place/131+Hartwell+Ave,+Lexington,+MA+02421,+%C3%89tats-Unis/@42.4624917,-71.270184,17z/data=!3m1!4b1!4m5!3m4!1s0x89e39c1e38ce7b05:0x270c246beeace54a!8m2!3d42.4624917!4d-71.2679953

You can see the "lat/lon" in bold (here, the pair right after the "@"). Be careful: the coordinates have to be in "lon/lat" order in the JSON (and not "lat/lon"), since that's the convention for the KML.
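
As a rough illustration of the manual step above, the "@lat,lon" pair in a Google Maps place URL can be extracted and swapped into "lon,lat" order. This is only a sketch: the function name `lonlat_from_maps_url` is hypothetical (it is not part of pyessv), the URL below is a shortened form of the AER example, and it assumes the URL contains an `@<lat>,<lon>` segment.

```python
import re

# Hypothetical helper: matches the "@<lat>,<lon>" segment of a Maps URL.
_AT_PAIR = re.compile(r"@(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?)")


def lonlat_from_maps_url(url: str) -> str:
    """Return "lon,lat" (KML order) from the "@lat,lon" segment of a URL."""
    match = _AT_PAIR.search(url)
    if match is None:
        raise ValueError("no '@lat,lon' segment found in URL")
    lat, lon = match.groups()
    # Swap the order: the URL carries lat,lon but the JSON expects lon,lat.
    return f"{lon},{lat}"


# Shortened version of the AER URL quoted in this thread.
url = (
    "https://www.google.com/maps/place/"
    "131+Hartwell+Ave,+Lexington,+MA+02421/@42.4624917,-71.270184,17z"
)
print(lonlat_from_maps_url(url))
```

The values are kept as strings rather than converted to floats, so the digits end up in the JSON exactly as they appear in the URL.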

If the postal address was not precise enough (or maybe wrong), I investigated the homepage of the institution to get the contact address and compared all the info to settle on the most appropriate place.
For consortia I only kept the acronym/code and the homepage if a dedicated one exists (no coordinates, address or full name by definition), but I list all participants within the "consortia" list using the same JSON attributes. This list is empty by default in the case of a single institution (e.g., AER).
Sometimes I widened the definition of "consortia" to add or detail several partners. For instance, CNRM-CERFACS is a "consortia" composed of the CNRM and the CERFACS, each with its own address and coordinates. In the same way, the IPSL can be seen as a "consortia" of 4 partners involved in the CMIP6 exercise. Feel free to do the same for other CMIP6 contributors.

taylor13 commented 5 years ago

Before making the proposed changes, we should check what software relies on CMIP6_institution_id.json. I think modifying it as proposed will break CMOR3, for example. Perhaps, we should keep CMIP6_institution_id.json as is and create CMIP6_institution.json, which would become the primary reference, and we could automatically generate CMIP6_institution_id.json whenever a change was made to the reference (since the id.json file would contain a subset of the information in the reference). This would maintain backward compatibility for all existing software.
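
A minimal sketch of the "generate the id file from the reference" idea, assuming the reference file uses the richer per-institution layout proposed earlier in this thread and that the legacy value is simply the name followed by the postal address. The function name `flatten_institutions` is hypothetical.

```python
import json


def flatten_institutions(reference: dict) -> dict:
    """Build the legacy {code: "name, address"} mapping from richer entries."""
    flat = {}
    for code, entry in reference["institution_id"].items():
        # Skip empty fields so consortium entries without a name still work.
        parts = [entry.get("name", ""), entry.get("postalAddress", "")]
        flat[code] = ", ".join(part for part in parts if part)
    return {"institution_id": flat}


reference = {
    "institution_id": {
        "AER": {
            "name": "Research and Climate Group, Atmospheric and Environmental Research",
            "postalAddress": "131 Hartwell Avenue, Lexington, MA 02421, USA",
            "coordinates": "-71.270184,42.4624917",
            "homepage": "https://www.aer.com/",
            "consortia": [],
        }
    }
}
print(json.dumps(flatten_institutions(reference), indent=4))
```

For the AER entry this reproduces the string currently stored in CMIP6_institution_id.json, which is the backward-compatibility property the proposal relies on.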

durack1 commented 5 years ago

@taylor13 it doesn't make sense to me to have two files that contain the same information. As far as I can ascertain, only ES-DOC and cmip6-cmor-tables are dependent on CMIP6_institution_id.json; we can control changes in cmip6-cmor-tables and will need to implement a change at the same time that CMIP6_institution_id.json is updated.

MartinaSt commented 5 years ago

@taylor13 @durack1 I agree with Karl that the CV is used by several infrastructure components, including the citation service, by the way. As the CV is in JSON format, it is a small technical adjustment. However, if CMOR3 is affected and users have to update their CMOR3 version during post-processing, that would be quite a burden for the modeling centers.

In my opinion, we should keep the main structure of the institution_id JSON at this stage of the project and add the institution homepage, e.g., at the end of the existing address. What do you think?

@davidhassell: Would that be ok for ES-DOC?

davidhassell commented 5 years ago

@MartinaSt do you mean something like: "CAS":"Chinese Academy of Sciences, Beijing 100029, China URL: http://......",? That sort of thing would be fine for ES-DOC.

MartinaSt commented 5 years ago

@davidhassell yes exactly. Not fancy but practical. In your example 'URL:' would be the separator for string splitting. Could you also check "EC-Earth-Consortium"? As I do not need to separate the URL, the suggestion for how to handle this consortium (multi-institute) case should come from you. A snippet with a suggestion of the revised institution_id would be great.

davidhassell commented 5 years ago

If there is no appropriate URL, I suppose just leaving it blank would be OK: "EC-Earth-Consortium":"AEMET, Spain; <snip> Norrkoping, Sweden URL:",
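
A minimal sketch of the 'URL:' string splitting discussed above, covering both a populated and a blank URL. The function name `split_institution` is hypothetical, and `http://example.org/` is a placeholder because the URL in the CAS example is elided.

```python
def split_institution(value: str) -> tuple:
    """Split "description URL: homepage" into (description, homepage)."""
    # rpartition splits on the last "URL:" so earlier text is left untouched.
    description, separator, homepage = value.rpartition("URL:")
    if not separator:
        # No separator at all: treat the whole string as the description.
        return value.strip(), ""
    return description.strip(), homepage.strip()


with_url = "Chinese Academy of Sciences, Beijing 100029, China URL: http://example.org/"
blank_url = "AEMET, Spain URL:"
print(split_institution(with_url))
print(split_institution(blank_url))
```

Strings in the current format, without any 'URL:' marker, pass through unchanged with an empty homepage, so a reader written this way would tolerate both old and new files.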

glevava commented 5 years ago

@taylor13 I agree with @durack1. We should really avoid multiplying CV sources with the same information; this is contrary to the CV rationalization toward which we should move for a better data description. I don't think that changing the CMIP6_institution_id.json structure would impact ESGF tools and services more deeply:

@MartinaSt Regarding the Data Citation service, I don't have any idea how it consumes the WCRP JSON files to deal with the CMIP6 CVs.

Finally, I have two last comments:

durack1 commented 5 years ago

@glevava thanks for the input. Yes, you're right, any changes to the CMIP6_CVs will not impact CMOR. As you noted, we can amend the concatenation of information which is sourced from the CVs nightly and synced into the cmip6-cmor-tables/Tables/CMIP6_CV.json file. This CMIP6_CV.json file will retain the same simple dictionary structure for the CMIP6_institution_id, so CMOR is unaffected. As you note, ES-DOC is also quarantined from changes, as this information is also rewritten. I am curious about the impact on the citation service, and wonder whether a synchronized change across these components would be possible: making changes to the CMIP6_CVs and, the same day, updating CMIP6_CV.json for CMOR as well as the citation information.

Just so we're all on the same page, @glevava had already gone through the process of refactoring and augmenting the information, which is demoed above, see https://github.com/WCRP-CMIP/CMIP6_CVs/issues/379#issuecomment-470198365

taylor13 commented 5 years ago

@glevava thanks for summarizing the potential impacts. I had forgotten that CMOR doesn't read the CVs directly; the input needed by CMOR is created nightly from the CVs, as @durack1 points out.

So the only remaining potential issue, I think, is how the proposed change might affect @MartinaSt ?

davidhassell commented 5 years ago

Just to note that on #705 I said "later" in the year was fine for this. However, I was overlooking the fact that the further_info_URLs already exist - with links to institutes that might not be correct. So, ES-DOC would be happy if this got resolved as soon as possible.

Thanks, David

MartinaSt commented 5 years ago

@glevava As written earlier, I have to change how I read the institution_id file, but the users are not directly affected.

@durack1 : Please create an example institution_id file after the final decision on its new structure, let me and everyone else do our changes and - after the 'ready' from everyone - change the institution_id in the CMIP6_CVs. How does that sound to you?

glevava commented 5 years ago

I like the new structure suggested by @durack1:

    "institution_id":{
        "AER":{
            "name": "Research and Climate Group, Atmospheric and Environmental Research",
            "postalAddress": "131 Hartwell Avenue, Lexington, MA 02421, USA",
            "coordinates": "-71.270184,42.4624917",
            "homepage": "https://www.aer.com/",
            "consortia": []
        }
    }

I would just convert the consortia key into a dictionary of consortium-partner dictionaries. Note that consortium partners don't have a consortia key:

"RTE-RRTMGP-Consortium": {
    "name": "",
    "postalAddress": "c/o AER, 131 Hartwell Avenue, Lexington, MA 02421, USA",
    "coordinates": "",
    "homepage": "",
    "consortia": {
        "AER": {
            "name": "Research and Climate Group, Atmospheric and Environmental Research",
            "postalAddress": "131 Hartwell Avenue, Lexington, MA 02421, USA",
            "coordinates": "-71.270184,42.4624917",
            "homepage": "https://www.aer.com/"
        },
        "UColorado": {
            "name": "University of Colorado",
            "postalAddress": "Boulder, CO 80309, USA",
            "coordinates": "-105.2681304,40.0075851",
            "homepage": "https://www.colorado.edu/"
        }
    }
}
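
For comparison with the list-style "consortia" shown earlier in this thread, here is a minimal sketch of converting that list into the dictionary form above, keyed by each partner's "code". The function name `consortia_list_to_dict` is hypothetical.

```python
import json


def consortia_list_to_dict(partners: list) -> dict:
    """Re-key a list of partner records by their "code" field."""
    converted = {}
    for partner in partners:
        fields = dict(partner)  # copy so the input records are not mutated
        code = fields.pop("code")
        # Partners carry no nested "consortia" key in the dictionary form.
        fields.pop("consortia", None)
        converted[code] = fields
    return converted


partners = [
    {
        "code": "AER",
        "name": "Research and Climate Group, Atmospheric and Environmental Research",
        "postalAddress": "131 Hartwell Avenue, Lexington, MA 02421, USA",
        "coordinates": "-71.270184,42.4624917",
        "homepage": "https://www.aer.com/",
        "consortia": [],
    },
    {
        "code": "UColorado",
        "name": "University of Colorado",
        "postalAddress": "Boulder, CO 80309, USA",
        "coordinates": "-105.2681304,40.0075851",
        "homepage": "https://www.colorado.edu/",
    },
]
print(json.dumps(consortia_list_to_dict(partners), indent=4))
```

Keying by code makes partner lookup direct, at the cost of relying on dictionary insertion order if the original partner ordering matters.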

@durack1 @taylor13 @MartinaSt What do you think?

martinjuckes commented 5 years ago

Have you considered using an ISNI reference? As a standard reference it is more stable than URLs (we have URLs for institutions involved in past CMIP activities on the IPCC DDC site, and they do require maintenance due to institutions changing name etc).

glevava commented 5 years ago

@martinjuckes That's a good idea. I didn't know about ISNI. Do you know if any CMIP6 contributors have already registered an ISNI? I'm not sure the IPSL has one, for instance...

MartinaSt commented 5 years ago

@glevava @martinjuckes Yes, good idea, a PID for the institution is more persistent. There is a new community-driven registry, ROR (Research Organization Registry; https://ror.org/), which I mention because the future of ISNI is not totally clear to my knowledge.

In the case of IPSL, both a ROR and an ISNI are found by the ROR search: https://ror.org/search?query=IPSL

durack1 commented 5 years ago

Hi folks, apologies for the radio silence from me; I was off for a week. @MartinaSt @martinjuckes you note the ISNI and ROR registrations; I have not heard of these before. Are there steps required for institutes to register themselves, or is this done automatically?

As noted by @glevava the format noted above https://github.com/WCRP-CMIP/CMIP6_CVs/issues/379#issuecomment-495500941 is what we were thinking, potentially with some key name tweaks, but containing the same information - does this sound right to you both?

taylor13 commented 5 years ago

I'm not opposed to including an institution PID if the services are mature enough and the information they serve is correct. I note that some CMIP institutions are not included in ROR (e.g., PCMDI), and not all of the institutions listed by ROR have an ISNI. Furthermore, when I click on the ISNI, the information provided is quite limited and sometimes undecipherable. For example, consider the query for CSIRO. Among the responses are "CSIRO Ocean and Atmosphere" and "CSIRO Marine and Atmospheric Research". Only the second one has an ISNI, and when you click on it, one of the items is "Titles" which is set to: "descriptions of new australian skates batoidea rajoidei", which seems like garbage. Is this service really ready for prime time?

Even if we decide to include a PID, I think we should include the URL for the institution's home page, since that is not given by the ISNI.

durack1 commented 5 years ago

@MartinaSt how are you dealing with institution registrations with the DOI/citation services?

glevava commented 5 years ago

I agree with @taylor13. Even if ROR/ISNI seems to be a very good initiative, only a few institutes have a proper ISNI registration to be used with CMIP6. Maybe we can recommend having this for CMIP7 ;)?

MartinaSt commented 5 years ago

I would recommend ROR over ISNI because it is community driven, similar to ORCID. They promised that the entries can be managed by the institutions. But right now ROR is fairly new and still building up. I suggest including it as an optional field in the JSON. That way no one has to deliver an institute identifier, but if they have one, or will have one before the end of CMIP6, it can be added without changing the JSON again.

durack1 commented 4 years ago

As the reformatting of CMIP6_institution_id.json will likely upset downstream users, we have opted to hold off on this change, as it would impact a number of the processing streams, which include CMOR3. Once the CMIP6 dust settles a little, we can revisit this issue.

durack1 commented 1 year ago

Closing, as this has been linked for consideration in https://github.com/WCRP-CMIP/CMIP6Plus_CVs/issues/9