JonathanReeve / data-ethics-literature-review

An automated survey of literature and curricula surrounding ethics in data science. WIP.
http://data-ethics.tech
GNU General Public License v3.0

Deduplicate universities with multiple listings in courses.ttl #16

Open JonathanReeve opened 3 years ago

JonathanReeve commented 3 years ago

Some universities are listed two or more times under different names, for example "Massachusetts Institute of Technology" and "MIT."

One thing that might help: with luck, both names resolve to the same Wikidata entity. So if "Massachusetts Institute of Technology" and "MIT" both point to the same Wikidata page, we should be able to detect this and change "MIT" so that it reads "Massachusetts Institute of Technology" instead.

sy2657 commented 3 years ago

I will try to work on this issue.

JonathanReeve commented 3 years ago

Sounds good! The way I would approach this is to write a SPARQL query along these lines (untested):

select distinct ?uniName where {
  ?uni owl:sameAs ?wikidataEntity .
  ?uni ccso:legalName ?uniName .
}

And then just manually edit the file, replacing the names with the best or longest name.
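
To make the "longest name" rule concrete, here is a minimal Python sketch of the grouping step. It assumes the query is extended to also return ?wikidataEntity, so that each result row is a (name, entity) pair. The function name `canonical_names`, the sample rows, and the `wd:EXAMPLE…` identifiers are all hypothetical and not taken from courses.ttl.

```python
from collections import defaultdict

def canonical_names(rows):
    """Map each university name to the longest name sharing its Wikidata entity."""
    names_by_entity = defaultdict(set)
    for name, entity in rows:
        names_by_entity[entity].add(name)

    mapping = {}
    for names in names_by_entity.values():
        # Longest name wins; ties go to the alphabetically last name,
        # which keeps the result deterministic.
        best = max(names, key=lambda n: (len(n), n))
        for name in names:
            mapping[name] = best
    return mapping

# Hypothetical query results: (legal name, Wikidata entity) pairs.
rows = [
    ("MIT", "wd:EXAMPLE1"),
    ("Massachusetts Institute of Technology", "wd:EXAMPLE1"),
    ("Columbia University", "wd:EXAMPLE2"),
]

mapping = canonical_names(rows)
# Keep only the names that would actually need to change in the file.
duplicates = {old: new for old, new in mapping.items() if old != new}
print(duplicates)  # {'MIT': 'Massachusetts Institute of Technology'}
```

The resulting dictionary is only a to-do list of replacements. Whether the longest name really is the "best" one still deserves a manual check before editing the file.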