Closed esdoc-system-user closed 1 year ago
Thanks @esdoc-system-user, what use case has prompted this suggestion? It would be great to know how this would be used, as currently the only package that we're actively considering is CMOR
@esdoc-system-user there doesn't appear to be a valid use case here, so will close. Please reopen if required
Following work by @glevava we'll update the institution_id template to include additional details following the example:
...
{
"code": "CSIRO-ARCCSS-BoM",
"name": "",
"postalAddress": "",
"coordinates": "",
"homepage": "",
"consortia": [
{
"code": "CSIRO",
"name": "Commonwealth Scientific and Industrial Research Organisation",
"postalAddress": "Aspendale, Victoria 3195, Australia",
"coordinates": "145.0435415,-38.0028413",
"homepage": "https://www.csiro.au/"
},
{
"code": "ARCCSS",
"name": "Australian Research Council - Centre of Excellence for Climate System Science",
"postalAddress": "UNSW Sydney NSW 2052, Australia",
"coordinates": "151.2277251,-33.9166077",
"homepage": "https://www.climatescience.org.au/tags/arccss"
},
{
"code": "BoM",
"name": "Bureau of Meteorology",
"postalAddress": "Aspendale, Victoria 3195, Australia",
"coordinates": "144.9486767,-37.8196597",
"homepage": "http://www.bom.gov.au/"
}
]
},
...
Any new registrations will require lon/lat details to be added, along with URLs etc
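As a rough illustration of what checking a new registration against this template could look like, here is a minimal Python sketch. The helper names (`parse_coordinates`, `check_entry`) and the `example` entry are hypothetical and not part of any existing tool; the key names are the ones in the template above. Note that the range test only catches a lat/lon swap when the longitude magnitude exceeds 90.

```python
# Hypothetical validation sketch for the template above; not an existing tool.
REQUIRED_KEYS = ("code", "name", "postalAddress", "coordinates", "homepage")


def parse_coordinates(value):
    """Parse a "lon,lat" string into a (lon, lat) tuple of floats.

    Returns None for an empty string, which the template uses for consortia.
    """
    if not value:
        return None
    lon_text, lat_text = value.split(",")
    lon, lat = float(lon_text), float(lat_text)
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError(f"coordinates out of range (expected lon,lat): {value!r}")
    return lon, lat


def check_entry(entry):
    """Return a list of problems found in one institution entry."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in entry]
    try:
        parse_coordinates(entry.get("coordinates", ""))
    except ValueError as error:
        problems.append(str(error))
    # Consortium members use the same attributes, so check them recursively.
    for member in entry.get("consortia", []):
        problems.extend(f"{member.get('code', '?')}: {p}" for p in check_entry(member))
    return problems


# Example entry shortened from the template above (one member only).
example = {
    "code": "CSIRO-ARCCSS-BoM",
    "name": "",
    "postalAddress": "",
    "coordinates": "",
    "homepage": "",
    "consortia": [
        {
            "code": "BoM",
            "name": "Bureau of Meteorology",
            "postalAddress": "Aspendale, Victoria 3195, Australia",
            "coordinates": "144.9486767,-37.8196597",
            "homepage": "http://www.bom.gov.au/",
        }
    ],
}
print(check_entry(example))
```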
@taylor13 @momipsl
For the record, I've included a little more background information here (from @glevava):
In order to generate the ESGF map into a KML file I had to add information into the pyessv-archive
for each CMIP6 institution_id.
For instance :
- pyessv consumes the WCRP JSON https://github.com/WCRP-CMIP/CMIP6_CVs/blob/master/CMIP6_institution_id.json
where we have the pair
<---
"AER":"Research and Climate Group, Atmospheric and Environmental Research, 131 Hartwell
Avenue, Lexington, MA 02421, USA",
--->
- before recording the term into the pyessv data model we manually refactor the syntax and add
useful information about the institution in this fashion
(see https://github.com/ES-DOC/pyessv/blob/master/sh/writers/wcrp/cmip6/institution_id.json) :
<---
"code": "AER",
"name": "Research and Climate Group, Atmospheric and Environmental Research",
"postalAddress": "131 Hartwell Avenue, Lexington, MA 02421, USA",
"coordinates": "-71.270184,42.4624917",
"homepage": "https://www.aer.com/",
"consortia": []
--->
I manually checked all postal addresses through the Google API to get the lon/lat coordinates. I also
tested each homepage and looked one up when none was provided.
I also completed some consortia from their homepages where some partners were missing.
Would you be interested in adding this additional info into the WCRP JSON file? It would be great
to have only one file providing all institution information and I'm happy if this is the WCRP one.
Moreover, all other tools can benefit from this (outside of pyessv).
We could have a new CMIP6_institution_id.json in the same way as the CMIP6_source_id.json:
<----
"institution_id":{
"AER":{
"name": "Research and Climate Group, Atmospheric and Environmental Research",
"postalAddress": "131 Hartwell Avenue, Lexington, MA 02421, USA",
"coordinates": "-71.270184,42.4624917",
"homepage": "https://www.aer.com/",
"consortia": []
}
-->
Of course, the name of the attributes could be changed as you want. Do you know if such a new
syntax could have side effects on other tools? I think it would be ok for CMOR/PrePARE.
If this sounds like a good idea to you, I can provide the final JSON to be upgraded by Paul on the
WCRP repo.
and a subsequent email:
> Regarding the coordinates attribute, how are you collecting this? If I have to revise or register a
new institution, it would be useful for me to include this information at the time the details are
registered, this will also ensure that if your google map generation is automated, the new institution
info will propagate automagically.
First, I tried the Google API geocoding, which allows you to recover the coordinates from a postal
address (several python libraries implement it), but the Google API isn't free and without a paid plan
you can only request one address per day... : /
So I rolled back to the manual way: I enter the postalAddress value into the Google Maps front-end, and the
result shows you the coordinates in the URL. For instance here is what I get for the AER postal address :
https://www.google.com/maps/place/131+Hartwell+Ave,+Lexington,+MA+02421,+%C3%89tats-Unis/@42.4624917,-71.270184,17z/data=!3m1!4b1!4m5!3m4!1s0x89e39c1e38ce7b05:0x270c246beeace54a!8m2!3d42.4624917!4d-71.2679953
You can see the "lat/lon" in bold (the pair following the "@" in the URL). Be careful that the coordinates have to be in "lon/lat" order in the
JSON (and not "lat/lon"), that's the convention for the KML.
If the postal address was not precise enough (or maybe wrong) I investigate the homepage of the
institution to get the contact address and compare all info to focus on the most appropriate place.
For consortia I only kept the acronym/code and the homepage if a dedicated one exists (no
coordinates, address or full name by definition), but I list all participants within the "consortia" list
using the same JSON attributes. This list is empty by default in the case of a unique institution
(e.g., AER)
Sometimes I widened the definition of a "consortia" to add or detail several partners. For instance,
CNRM-CERFACS is a "consortia" composed of the CNRM and the CERFACS with their own address
and coordinates. In the same way the IPSL can be seen as a "consortia" of 4 partners involved in the
CMIP6 exercise. Feel free to do the same for other CMIP6 contributors.
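The manual lookup described in the email above (read the pair after the "@" in the Google Maps URL, then swap it into lon/lat order) can be sketched in Python as follows. This is only an illustration: the function name `lonlat_from_maps_url` is hypothetical, the URL layout is simply what the quoted example shows and is not a documented, stable interface, and the URL in the usage line is a shortened form of the AER one above.

```python
import re

# Matches the "@<lat>,<lon>," segment seen in the Google Maps URL quoted above.
# Assumption: the URL keeps this layout; it is not a guaranteed format.
_AT_PATTERN = re.compile(r"@(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?),")


def lonlat_from_maps_url(url):
    """Return the "lon,lat" string used in the JSON from a Google Maps URL."""
    match = _AT_PATTERN.search(url)
    if match is None:
        raise ValueError("no '@lat,lon,' segment found in URL")
    lat, lon = match.group(1), match.group(2)
    # The URL gives lat first; the JSON (and KML) convention is lon first.
    return f"{lon},{lat}"


# Shortened form of the AER URL quoted above.
aer_url = (
    "https://www.google.com/maps/place/131+Hartwell+Ave,+Lexington,+MA+02421/"
    "@42.4624917,-71.270184,17z"
)
print(lonlat_from_maps_url(aer_url))  # -71.270184,42.4624917
```

The result matches the "coordinates" value recorded for AER in the example entries.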
Before making the proposed changes, we should check what software relies on CMIP6_institution_id.json. I think modifying it as proposed will break CMOR3, for example. Perhaps, we should keep CMIP6_institution_id.json as is and create CMIP6_institution.json, which would become the primary reference, and we could automatically generate CMIP6_institution_id.json whenever a change was made to the reference (since the id.json file would contain a subset of the information in the reference). This would maintain backward compatibility for all existing software.
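To make the backward-compatibility idea concrete, here is a minimal Python sketch of deriving the existing code-to-description mapping from a richer reference. It assumes the reference uses the nested structure proposed earlier in this thread and that both files keep a top-level "institution_id" key; the function name `build_institution_id` is hypothetical and no such script is known to exist.

```python
import json


def build_institution_id(reference):
    """Reduce the richer reference to the existing code -> description mapping."""
    flat = {}
    for code, details in sorted(reference["institution_id"].items()):
        # Rebuild the single string as "<name>, <postalAddress>", skipping blanks.
        parts = [details.get("name", ""), details.get("postalAddress", "")]
        flat[code] = ", ".join(part for part in parts if part)
    return {"institution_id": flat}


# Example reference content, using the AER entry proposed above.
reference = {
    "institution_id": {
        "AER": {
            "name": "Research and Climate Group, Atmospheric and Environmental Research",
            "postalAddress": "131 Hartwell Avenue, Lexington, MA 02421, USA",
            "coordinates": "-71.270184,42.4624917",
            "homepage": "https://www.aer.com/",
            "consortia": [],
        }
    }
}

# Serialized form that would be written to the generated id file.
print(json.dumps(build_institution_id(reference), indent=4))
```

For AER this reproduces the string currently stored in CMIP6_institution_id.json, so software reading the simple mapping would see no change.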
@taylor13 it doesn't make sense to me to have two files that contain the same information. As far as I can ascertain only ES-DOC and cmip6-cmor-tables are dependent on CMIP6_institution_id.json. We can control changes in cmip6-cmor-tables and will need to implement a change at the same time that CMIP6_institution_id.json is updated
@taylor13 @durack1 I agree with Karl that the CV is used by several infrastructure components, including the citation by the way. As the CV is in JSON format, it is a small technical adjustment. However, if CMOR3 is affected and users have to update their CMOR3 version during post-processing, that would be quite a burden for the modeling centers.
In my opinion, we should keep the main structure of the institution_id JSON at this stage of the project and add institution_id e.g. at the end of the existing address. What do you think?
@davidhassell: Would that be ok for ES-DOC?
@MartinaSt do you mean something like: "CAS":"Chinese Academy of Sciences, Beijing 100029, China URL: http://......", ? That sort of thing would be fine for ES-DOC.
@davidhassell yes exactly. Not fancy but practical. In your example 'URL:' would be the separator for string splitting. Could you also check "EC-Earth-Consortium"? As I do not need to separate the URL, the suggestion for how to handle this consortium (multi-institute) case should come from you. A snippet with a suggestion of the revised institution_id would be great.
If there is no appropriate URL, I suppose just leaving it blank would be OK: "EC-Earth-Consortium":"AEMET, Spain; <snip> Norrkoping, Sweden URL:",
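For reference, the 'URL:' separator convention discussed here could be consumed along these lines. This is a minimal Python sketch; the function name `split_institution_string` and the example strings are hypothetical, and it splits on the last occurrence of the separator in case the description itself ever contains it.

```python
def split_institution_string(value, separator="URL:"):
    """Split "<description> URL: <homepage>" into (description, homepage).

    Returns an empty homepage when the separator is absent or nothing follows it.
    """
    description, found, url = value.rpartition(separator)
    if not found:
        # No separator at all: the whole string is the description.
        return value.strip(), ""
    return description.strip(), url.strip()


# Hypothetical values, only to show the three cases.
print(split_institution_string("Example Institute, Example City URL: https://example.org/"))
print(split_institution_string("Example Consortium, Somewhere URL:"))
print(split_institution_string("Example Institute without any separator"))
```

A blank after 'URL:' and a missing 'URL:' both come back as an empty homepage, so existing entries would keep working while URLs are added gradually.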
@taylor13
I agree with @durack1. We should really avoid multiplying CV sources with the same information; this is contrary to the CV rationalization toward which we should move for a better data description.
I don't think that changing the CMIP6_institution_id.json structure would have a deep impact on ESGF tools and services:
- CMIP6_CV.json, which is a "raw" concatenation of most of the WCRP JSON files (including CMIP6_institution_id.json). I think this concatenation process can easily be modified to keep the current structure in the CMIP6_CV.json read by CMOR. No need to have a new CMOR release (fortunately!).
- The pyessv library, which consumes the WCRP JSON and already refactors the current CMIP6_institution_id.json into the structure suggested above (including code, verified postal addresses, checked home pages, consortia members when they exist, etc.). So any service relying on pyessv would be completely insensitive to a new WCRP JSON structure.
- esg.cmip6.ini, which is generated using a python script consuming the WCRP JSON files. I can easily adapt the script to a new structure of the CMIP6_institution_id.json.
@MartinaSt Regarding the Data Citation service, I don't have any idea how it consumes the WCRP JSON files to deal with the CMIP6 CV.
Finally, I have two last comments:
- CMIP6_institution_id.json includes some inaccurate postal addresses and broken URLs, which I ran into when I built the Google Map of the CMIP6 contributors. I corrected them in the new JSON structure I propose to use.
@glevava thanks for the input. Yes you're right, any changes to the CMIP6_CVs will not impact CMOR. As you noted we can amend the concatenation of information which is sourced from the CVs nightly and synced into the cmip6-cmor-tables/Tables/CMIP6_CV.json file. This CMIP6_CV.json file will retain the same simple dictionary structure for the CMIP6_institution_id, so CMOR is unaffected. As you note ES-DOC is also quarantined from changes, as this information is also rewritten. I am curious about the impact on the citation service, and wonder whether a synchronized change across these components would be possible: making changes to the CMIP6_CVs and, the same day, updating CMIP6_CV.json for CMOR as well as the citation information.
Just so we're all on the same page, @glevava had already gone through the process of refactoring and augmenting the information, which is demoed above, see https://github.com/WCRP-CMIP/CMIP6_CVs/issues/379#issuecomment-470198365
@glevava thanks for summarizing the potential impacts. I had forgotten that CMOR doesn't read the CVs directly; instead, the input needed by CMOR is created nightly from the CVs, as @durack1 points out.
So the only remaining potential issue, I think, is how the proposed change might affect @MartinaSt ?
Just to note that on #705 I said "later" in the year was fine for this. However, I was overlooking the fact that the further_info_URLs already exist - with links to institutes that might not be correct. So, ES-DOC would be happy if this got resolved as soon as possible.
Thanks, David
@glevava As written earlier I have to change how I read the institution_id file, but the users are not directly affected.
@durack1 : Please create an example institution_id file after the final decision on its new structure, let me and everyone else do our changes and - after the 'ready' from everyone - change the institution_id in the CMIP6_CVs. How does that sound to you?
I like the new structure suggested by @durack1 :
"institution_id":{
"AER":{
"name": "Research and Climate Group, Atmospheric and Environmental Research",
"postalAddress": "131 Hartwell Avenue, Lexington, MA 02421, USA",
"coordinates": "-71.270184,42.4624917",
"homepage": "https://www.aer.com/",
"consortia": []
}
I would just convert the consortia key into a dictionary of consortia partner dictionaries. Note that consortia partners don't have a consortia key:
"RTE-RRTMGP-Consortium": {
"name": "",
"postalAddress": "c/o AER, 131 Hartwell Avenue, Lexington, MA 02421, USA",
"coordinates": "",
"homepage": "",
"consortia": {
"AER": {
"name": "Research and Climate Group, Atmospheric and Environmental Research",
"postalAddress": "131 Hartwell Avenue, Lexington, MA 02421, USA",
"coordinates": "-71.270184,42.4624917",
"homepage": "https://www.aer.com/"
},
"UColorado": {
"name": "University of Colorado",
"postalAddress": "Boulder, CO 80309, USA",
"coordinates": "-105.2681304,40.0075851",
"homepage": "https://www.colorado.edu/"
}
}
}
@durack1 @taylor13 @MartinaSt What do you think?
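As an illustration of how a consumer such as the map generation might walk this dictionary-of-dictionaries layout, here is a minimal Python sketch. The function name `iter_map_points` is hypothetical, and the sketch assumes "consortia" is either a dictionary of partners or empty (an empty list, as in the single-institution AER example, is also tolerated).

```python
def iter_map_points(institutions):
    """Yield (code, lon, lat) for every entry or consortium partner with coordinates."""
    for code, details in institutions.items():
        coordinates = details.get("coordinates", "")
        if coordinates:
            # Coordinates are stored as "lon,lat", the order KML expects.
            lon, lat = (float(part) for part in coordinates.split(","))
            yield code, lon, lat
        # Partners carry no "consortia" key of their own, which ends the recursion.
        yield from iter_map_points(details.get("consortia") or {})


# Example data copied from the consortium entry above.
institutions = {
    "RTE-RRTMGP-Consortium": {
        "name": "",
        "postalAddress": "c/o AER, 131 Hartwell Avenue, Lexington, MA 02421, USA",
        "coordinates": "",
        "homepage": "",
        "consortia": {
            "AER": {
                "name": "Research and Climate Group, Atmospheric and Environmental Research",
                "postalAddress": "131 Hartwell Avenue, Lexington, MA 02421, USA",
                "coordinates": "-71.270184,42.4624917",
                "homepage": "https://www.aer.com/",
            },
            "UColorado": {
                "name": "University of Colorado",
                "postalAddress": "Boulder, CO 80309, USA",
                "coordinates": "-105.2681304,40.0075851",
                "homepage": "https://www.colorado.edu/",
            },
        },
    }
}

for point in iter_map_points(institutions):
    print(point)
```

The consortium itself has no coordinates, so only its two partners are yielded.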
Have you considered using an ISNI reference? As a standard reference it is more stable than URLs (we have URLs for institutions involved in past CMIP activities on the IPCC DDC site, and they do require maintenance due to institutions changing name etc).
@martinjuckes That's a good idea. I didn't know about ISNI. Do you know if any CMIP6 contributors have already registered an ISNI? I'm not sure that IPSL has one, for instance...
@glevava @martinjuckes Yes, good idea, a PID for the institution is more persistent. There is a new community-driven registry, ROR (Research Organization Registry; https://ror.org/), as the future of ISNI is not totally clear as far as I know.
In case of IPSL there is a ROR and an ISNI found by the ROR search: https://ror.org/search?query=IPSL
Hi folks, apologies for the radio silence from me; I was off for a week. @MartinaSt @martinjuckes you note the ISNI and ROR registrations, which I have not heard of before. Are there steps required for institutes to register themselves, or is this done automatically?
As noted by @glevava the format noted above https://github.com/WCRP-CMIP/CMIP6_CVs/issues/379#issuecomment-495500941 is what we were thinking, potentially with some key name tweaks, but containing the same information - does this sound right to you both?
I'm not opposed to including an institution PID if the services are mature enough and the information they serve is correct. I note that some CMIP institutions are not included in ROR (e.g., PCMDI), and not all of the institutions listed by ROR have an ISNI. Furthermore, when I click on the ISNI, the information provided is quite limited and sometimes undecipherable. For example, consider the query for CSIRO. Among the responses are "CSIRO Ocean and Atmosphere" and "CSIRO Marine and Atmospheric Research". Only the second one has an ISNI, and when you click on it, one of the items is "Titles" which is set to: "descriptions of new australian skates batoidea rajoidei", which seems like garbage. Is this service really ready for prime time?
Even if we decide to include a PID, I think we should include the URL for the institution's home page, since that is not given by the ISNI.
@MartinaSt how are you dealing with institution registrations with the DOI/citation services?
I agree with @taylor13. Even if ROR/ISNI seems to be a very good initiative, only a few institutes have a proper ISNI registration that could be used with CMIP6. Maybe we could recommend this for CMIP7 ;) ?
I would recommend ROR over ISNI because it is community driven, similar to ORCID. They promised that the entries can be managed by the institutions. But right now ROR is fairly new and still being built up. I suggest including it as an optional field in the JSON, so no one has to deliver an institute identifier, but if they have one, or obtain one before the end of CMIP6, it can be added without changing the JSON again.
As the reformatting of CMIP6_institution_id.json will likely upset downstream users, we have opted to hold off on this change, as it would impact a number of the processing streams, including CMOR3. Once the CMIP6 dust settles a little, we can revisit this issue.
Closing, as this has been linked for consideration in https://github.com/WCRP-CMIP/CMIP6Plus_CVs/issues/9
Institutional homepage should be added to institution_id.