andreaspacher / openeditors

Webscraping data about editors of scientific journals.
https://openeditors.ooir.org/
Creative Commons Zero v1.0 Universal

Some wrong ROR IDs #6

Open psmukhopadhyay opened 3 years ago

psmukhopadhyay commented 3 years ago

While creating a subset of India-specific results from the Open Editors dataset (editors1_ror_and_countries.csv and editors2_ror_and_countries.csv), we noticed a number of wrong ROR IDs.

Two typical cases:

A) Indian Institute of Science, Bangalore: every record for this premier Indian institute carries the wrong ROR ID, https://ror.org/05j873a45. This ID actually belongs to the Indian Institute of Soil Science (IISS, भाकृअनुप-भारतीय मृदा विज्ञान संस्थान; website: http://www.iiss.nic.in/index.html).

B) Christian Medical College Vellore, Vellore, India: every record for this institute carries the wrong ROR ID, https://ror.org/01vj9qy35. This ID actually belongs to Christian Medical College, Ludhiana, a different CMC in another city and state of India (website: http://cmcludhiana.in/medical_college/).

Possible reasons:

An API call to the ROR database (using the affiliation parameter) for Indian Institute of Science illustrates the problem:

https://api.ror.org/organizations?filter=country.country_code:IN&affiliation=Indian+Institute+of+Science

It returns a few results (around 14), with the following data in JSON format:

++++++++++
{"number_of_results":10,"items":[{"substring":"Indian Institute of Science","score":0.92,"matching_type":"COMMON TERMS","chosen":true,"organization":{"id":"https://ror.org/05j873a45","name":"Indian Institute of Soil Science","email_address":null,"ip_addresses":[],"established":1988,"types":["Facility"],"relationships":[{"label":"Indian Council of Agricultural Research","type":"Parent","id":"https://ror.org/04fw54a43"}],"addresses":[{"lat":23.309722,"lng":77.403056,"state":null,"state_code":null,"city":"Bhopal","geonames_city":{"id":1275841,"city":"Bhopal","geonames_admin1":{"name":"Madhya Pradesh","id":1264542,"ascii_name":"Madhya Pradesh","code":"IN.35"},"geonames_admin2":{"name":"Bhopāl","id":1275842,"ascii_name":"Bhopal","code":"IN.35.444"},"license":{"attribution":"Data from geonames.org under a CC-BY 3.0 license","license":"http://creativecommons.org/licenses/by/3.0/"},"nuts_level1":{"name":null,"code":null},"nuts_level2":{"name":null,"code":null},"nuts_level3":{"name":null,"code":null}},"postcode":null,"primary":false,"line":null,"country_geonames_id":1269750}],"links":["http://www.iiss.nic.in/index.html"],"aliases":[],"acronyms":["IISS"],"status":"active","wikipedia_url":"https://en.wikipedia.org/wiki/Indian_Institute_of_Soil_Science","labels":[{"label":"भाकृअनुप-भारतीय मृदा विज्ञान संस्थान","iso639":"hi"}],"country":{"country_name":"India","country_code":"IN"},"external_ids":{"ISNI":{"preferred":null,"all":["0000 0000 9288 3664"]},"Wikidata":{"preferred":null,"all":["Q18125957"]},"GRID":{"preferred":"grid.464869.1","all":"grid.464869.1"}}}},
{"substring":"Indian Institute of Science","score":0.84,"matching_type":"PHRASE","chosen":false,"organization":{"id":"https://ror.org/04dese585","name":"Indian Institute of Science Bangalore","email_address":null,"ip_addresses":[],"established":1909,"types":["Education"],"relationships":[],"addresses":[{"lat":13.021275,"lng":77.565769,"state":null,"state_code":null,"city":"Bengaluru","geonames_city":{"id":1277333,"city":"Bengaluru","geonames_admin1":{"name":"Karnataka","id":1267701,"ascii_name":"Karnataka","code":"IN.19"},"geonames_admin2":{"name":"Bangalore Urban","id":1277331,"ascii_name":"Bangalore Urban","code":"IN.19.572"},"license":{"attribution":"Data from geonames.org under a CC-BY 3.0 license","license":"http://creativecommons.org/licenses/by/3.0/"},"nuts_level1":{"name":null,"code":null},"nuts_level2":{"name":null,"code":null},"nuts_level3":{"name":null,"code":null}},"postcode":null,"primary":false,"line":null,"country_geonames_id":1269750}],"links":["http://www.iisc.ernet.in/"],"aliases":[],"acronyms":["IISc"],"status":"active","wikipedia_url":"http://en.wikipedia.org/wiki/Indian_Institute_of_Science","labels":[{"label":"ఇండియన్ ఇన్ స్టిట్యూట్ ఆఫ్ సైన్స్","iso639":"te"},{"label":"இந்திய அறிவியல் கழகம்","iso639":"ta"},{"label":"ਭਾਰਤੀ ਵਿਗਿਆਨ ਅਦਾਰਾ","iso639":"pa"},{"label":"ഇന്ത്യൻ ഇൻസ്റ്റിറ്റ്യൂട്ട് ഓഫ് സയൻസ്","iso639":"ml"},{"label":"ಭಾರತೀಯ ವಿಜ್ಞಾನ ಸಂಸ್ಥೆ","iso639":"kn"},{"label":"भारतीय विज्ञान संस्थान","iso639":"hi"},{"label":"ભારતીય વિજ્ઞાન સંસ્થા","iso639":"gu"},{"label":"ভারতীয় বিজ্ঞান সংস্থা","iso639":"bn"}],"country":{"country_name":"India","country_code":"IN"},"external_ids":{"ISNI":{"preferred":null,"all":["0000 0001 0482 5067"]},"FundRef":{"preferred":"100007780","all":["100007780","100007871","100008044","100009935"]},"OrgRef":{"preferred":null,"all":["37533"]},"Wikidata":{"preferred":null,"all":["Q948720"]},"GRID":{"preferred":"grid.34980.36","all":"grid.34980.36"}}}},........
++++++++++

The reason for the wrong ROR ID in this case is now easy to see: the first result, Indian Institute of Soil Science, is flagged chosen: true with a score of 0.92, so it was picked up by the process, while the correct organization, Indian Institute of Science Bangalore, scores only 0.84 and is flagged chosen: false. We have also observed that, to be on the safe side, score == 1.0 is a better condition than chosen == true for extracting ROR IDs through the API (though I am not sure whether you also used the API to obtain ROR IDs or fetched them by some other means).
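To make the difference between the two selection rules concrete, here is a minimal Python sketch. It works on an already-parsed response and uses a trimmed copy of the two items quoted above; the function names `pick_chosen` and `pick_by_score` are hypothetical and not part of any ROR client or of the Open Editors code.

```python
from typing import Optional


def pick_chosen(response: dict) -> Optional[str]:
    """Return the ROR ID of the first item flagged chosen, or None."""
    for item in response.get("items", []):
        if item.get("chosen"):
            return item["organization"]["id"]
    return None


def pick_by_score(response: dict, min_score: float = 1.0) -> Optional[str]:
    """Return the ROR ID of the highest-scoring item at or above min_score, or None."""
    candidates = [
        item for item in response.get("items", [])
        if item.get("score", 0.0) >= min_score
    ]
    if not candidates:
        return None
    best = max(candidates, key=lambda item: item["score"])
    return best["organization"]["id"]


# Trimmed copy of the two items from the response quoted in this comment.
sample = {
    "items": [
        {"score": 0.92, "matching_type": "COMMON TERMS", "chosen": True,
         "organization": {"id": "https://ror.org/05j873a45",
                          "name": "Indian Institute of Soil Science"}},
        {"score": 0.84, "matching_type": "PHRASE", "chosen": False,
         "organization": {"id": "https://ror.org/04dese585",
                          "name": "Indian Institute of Science Bangalore"}},
    ]
}

print(pick_chosen(sample))    # the Soil Science ID, i.e. the wrong match
print(pick_by_score(sample))  # None: no item reaches score 1.0
```

Note that the stricter rule does not recover the correct ID for this query; it only declines to assign one, which leaves the record open for a later, more careful lookup instead of storing a wrong ID.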

We initially found 455 records (India-specific only) with wrong ROR IDs among the 8170 records that have ROR IDs (out of 10316 records with India as the affiliated country).

I am attaching a CSV file containing these 455 records. The rorORI column is the ROR ID as available in the dataset, and rorOEM is the corrected ROR ID as fetched for our subset of the data.

no-match-report.csv
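For anyone who wants to re-check the attached file, a comparison of the two columns could look like the sketch below. Only the column names rorORI and rorOEM come from the description above; the `affiliation` column and the single sample row are illustrative stand-ins, and the real file may be laid out differently.

```python
import csv
import io


def count_mismatches(rows) -> int:
    """Count rows whose dataset ROR ID (rorORI) differs from the corrected one (rorOEM)."""
    return sum(
        1 for row in rows
        if row["rorORI"].strip() != row["rorOEM"].strip()
    )


# Illustrative stand-in for no-match-report.csv (the "affiliation" column is assumed).
sample_csv = io.StringIO(
    "affiliation,rorORI,rorOEM\n"
    "Indian Institute of Science,https://ror.org/05j873a45,https://ror.org/04dese585\n"
)
rows = list(csv.DictReader(sample_csv))
print(count_mismatches(rows))

# Share of wrong IDs among India records that have a ROR ID, using the counts reported above.
wrong, with_ror = 455, 8170
print(f"{wrong / with_ror:.1%}")
```

To run it against the real attachment, replace `sample_csv` with `open("no-match-report.csv", newline="", encoding="utf-8")`.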

ml4rrieu commented 2 years ago

I also found some problems with the ROR matching here (Paris, France):

Université de Paris-XII, France, Europe is associated with https://ror.org/05f82e368
University of Paris is associated with https://ror.org/05f82e368

whereas these are two different universities. (There are 13 universities in Paris whose names start with "Univ. of Paris" but end differently.)

(Kudos for the tools and data! Really appreciated.)
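Cases like the Paris one above, where one ROR ID is attached to several different affiliation names, can be surfaced with a simple grouping step. The following is a rough Python sketch; the helper name `shared_ror_ids` and the input format (pairs of name and ROR ID) are assumptions, not something taken from the Open Editors code.

```python
from collections import defaultdict


def shared_ror_ids(pairs):
    """Map each ROR ID to its distinct affiliation names, keeping only IDs with more than one name."""
    names_by_id = defaultdict(set)
    for name, ror_id in pairs:
        names_by_id[ror_id].add(name.strip())
    return {
        ror_id: sorted(names)
        for ror_id, names in names_by_id.items()
        if len(names) > 1
    }


# Name/ID pairs taken from the examples in this thread.
pairs = [
    ("Université de Paris-XII", "https://ror.org/05f82e368"),
    ("University of Paris", "https://ror.org/05f82e368"),
    ("Indian Institute of Science Bangalore", "https://ror.org/04dese585"),
]

for ror_id, names in shared_ror_ids(pairs).items():
    print(ror_id, names)
```

A flagged group is only a candidate for manual review: spelling variants and aliases of the same institution legitimately share one ID, so the output cannot tell real mismatches from harmless duplicates on its own.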

amandafrench commented 2 years ago

Hi @psmukhopadhyay and @ml4rrieu and @andreaspacher -- apologies for belated commenting and out of the blue tagging! I'm the new Technical Community Manager for ROR, and we're beta testing some improvements to our API's ?affiliation matching parameter that I think would help the issues listed here. Also figured it'd be wise to give users a heads up about the forthcoming changes in any case.

One of many changes is that we've removed a lot of false positives. See for instance the difference between the same search on the production server and the staging server, where we're beta testing the changes:

https://api.ror.org/organizations?affiliation=Indian%20Institute%20of%20Science%2C%20Bangalore

https://api.staging.ror.org/organizations?affiliation=Indian%20Institute%20of%20Science%2C%20Bangalore

The request for feedback and link to more documentation and examples is at https://github.com/ror-community/ror-roadmap/discussions/77 -- let us know what you think!

psmukhopadhyay commented 2 years ago

Thanks Amanda.

It is now much improved, and I tested it already the moment you posted this news in @.***

I'll post in the ror forum in case any further issues arise.

Best regards
