Open alanorth opened 4 years ago
Hi Alan, this sounds promising. I am happy to understand this further and seek synergies.
Yeah I think it's really promising. Their data is fantastic! I am trying to convince CGSpace and MELSpace to use it. Let's see how far we get... BTW, on CGSpace we have 5866 unique organizations/affiliations/funders, and 1515 (25.8%) of those match with ROR already.
Have you already checked the ones we have in CLARISA? https://clarisa.cgiar.org/swagger/index.html#/Institutions%20Lists/getAllInstitutionsUsingGET
We are, indeed, in the process of finalizing the alignment with MELSpace and will start with the institutions list of the Agresso of the Alliance between Bioversity and CIAT.
Happy to elaborate further.
@htobon yeah I've looked at CLARISA a few times. I raised some concerns about the data in 2019-10:
name
value in many records
I looked again last week and the issues are still there.
Not to mention, CLARISA only has around 3,500 entries. ROR has 97,000, and their data is MUCH higher quality: it links to permanent identifiers in many other large public datasets, properly supports multilingual names and acronyms, has an open API, and comes with monthly data dumps. I would recommend everyone align / map to ROR at this point. Store a "ror_id" field where the value maps to ROR and keep your own where it doesn't...
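A minimal sketch of that "ror_id where it maps, your own value where it doesn't" idea. The `ror_index` lookup table, the record layout, and the ROR ID shown are hypothetical placeholders for illustration, not anything CGSpace or CLARISA actually uses:

```python
def normalize(name: str) -> str:
    """Case-fold and collapse whitespace so trivially different spellings match."""
    return " ".join(name.split()).casefold()


def attach_ror_ids(local_names, ror_index):
    """Return one record per local name, adding "ror_id" only when a match exists.

    ror_index is a hypothetical dict mapping normalized names to ROR IDs.
    """
    records = []
    for name in local_names:
        record = {"name": name}  # always keep the local value
        ror_id = ror_index.get(normalize(name))
        if ror_id is not None:
            record["ror_id"] = ror_id
        records.append(record)
    return records


# Placeholder index; the ID below is made up for the example.
ror_index = {normalize("Example Research Institute"): "https://ror.org/0example00"}

records = attach_ror_ids(
    ["Example  Research Institute", "Some Local Partner"], ror_index
)
```

Entries without a match simply have no "ror_id" key, so the local list stays complete while the mapped subset grows over time.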
If it helps, the data in ROR is still just the GRID data, and there's some more metadata available from GRID (all CC0 except the GeoNames-associated data, which is CC BY).
Really happy to hear the data is useful for you and you're finding the data high quality, we've put a lot of work into it.
Thanks, @IanCal. ROR is easier to use because of the monthly JSON releases. GRID only releases an RDF file if I'm not mistaken. RDF is much more complex to parse. :P
@alanorth GRID is available in JSON, CSV and RDF in the bulk releases on figshare, and in a variety of formats (json, ttl, nt, etc) if you want to access individual records (pages are machine readable, using either content negotiation or changing the URL, e.g. https://grid.ac/institutes/grid.5335.0.json) :)
We've got a help page for using the figshare api to access all versions (as the collection has a DOI as well as each individual release).
An update on this: as of July 2021, GRID is being retired and ROR will pick up the maintenance and updating of the dataset.
https://ror.org/blog/2021-07-12-ror-grid-the-way-forward/
We should amend the CG Core docs to recommend ROR.
Sorry to jump into your comments -- I'm the new Technical Community Manager for ROR, and I'm happy to answer any questions you might have!
The Research Organization Registry (ROR) is a database of nearly 100,000 organizations, originally seeded from the GRID.ac dataset. Their metadata is updated monthly and includes links to FundRef (Crossref), GRID.ac, Wikidata, Wikipedia, etc., and even has multilingual aliases and acronyms. For example, see this API search for one of the precursor institutes to ILRI:
https://api.ror.org/organizations?affiliation=International+Livestock+Centre+for+Africa
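For anyone who wants to script that lookup, here is a rough sketch using only the Python standard library. The function names are made up for this example, and the response is returned as parsed JSON without assuming anything about its fields:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ROR_API = "https://api.ror.org/organizations"


def affiliation_url(affiliation: str) -> str:
    """Build the affiliation search URL; urlencode turns spaces into '+'."""
    return ROR_API + "?" + urlencode({"affiliation": affiliation})


def search_affiliation(affiliation: str) -> dict:
    """Query the API and return the decoded JSON response (needs network access)."""
    with urlopen(affiliation_url(affiliation), timeout=30) as response:
        return json.load(response)


url = affiliation_url("International Livestock Centre for Africa")
```

`affiliation_url` reproduces the link above; `search_affiliation` is only worth calling for a handful of names, since the dump is faster for bulk checks.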
I have been investigating using this for our sponsors/investors and institutional affiliations and I am really impressed. They provide an API, a monthly ror.json dump, and an OpenRefine reconciliation service. Also there seems to be a community feedback process where we can suggest new organizations, which I suspect will be very valuable to them with all the metadata we've collected in CGSpace, MELSpace, CLARISA, etc.
BTW I've also written a Python script called ror-lookup.py that will validate a text file of organizations against the ror.json dump (faster than the API of course).
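What follows is not the ror-lookup.py script itself, just a rough sketch of the general approach: load the dump once, index every known name, and check each line of the text file against that index. The field names (`id`, `name`, `aliases`, `acronyms`) are an assumption about the dump's JSON layout, and the function names and sample data are made up for illustration.

```python
import json


def build_name_index(records):
    """Map every case-folded name, alias, and acronym to its record's ID."""
    index = {}
    for record in records:
        names = [record.get("name")]
        names += record.get("aliases") or []
        names += record.get("acronyms") or []
        for name in names:
            if name:
                # Keep the first record seen for a given name.
                index.setdefault(name.casefold(), record.get("id"))
    return index


def check_organizations(lines, index):
    """Return (name, matched ID or None) for each non-empty line."""
    results = []
    for line in lines:
        name = line.strip()
        if name:
            results.append((name, index.get(name.casefold())))
    return results


def check_file(orgs_path, dump_path):
    """Check a text file of organization names against a local JSON dump."""
    with open(dump_path, encoding="utf-8") as dump:
        index = build_name_index(json.load(dump))
    with open(orgs_path, encoding="utf-8") as orgs:
        return check_organizations(orgs, index)


# Tiny in-memory stand-in for the dump; the ID is a placeholder.
sample_records = [
    {"id": "https://ror.org/0example00", "name": "Example Research Institute",
     "aliases": [], "acronyms": ["ERI"]},
]
sample_results = check_organizations(
    ["example research institute\n", "ERI\n", "Unknown Org\n"],
    build_name_index(sample_records),
)
```

Exact (case-insensitive) matching like this will miss spelling variants, so unmatched names are candidates for manual review rather than proof that an organization is absent.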
Let me know what you think!