ec-geolink / design

Design information about the EarthCube Geolink project.

dataone: improve name matching to NSF names #25

Open mbjones opened 9 years ago

amoeba commented 9 years ago

Matching names to NSF names is part of a larger piece of work on generating a set of LOD URIs for people and organizations. An initial attempt at a script to match names from one source to another is at #45.

See #39 for a bit of a description of this issue.

amoeba commented 8 years ago

This is an old issue, so I'm updating it with where things stand.

The NSF awards database does not appear to contain email addresses for people. @narock is this true? I'm basing this off of what I've seen on data.geolink.org and in the RDF dump at http://data.geolink.org/datasets/NsfPeople.ttl.

Within DataOne, our primary rule for matching people relies on being able to match an email address. Given the lack of email addresses in the NSF people dump, we would not be able to make such matches between DataOne people and NSF people. What do we do about this?
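To make the email-based rule concrete, here is a minimal sketch of how such matching could work. It is not the actual DataOne matching code: the record layout (plain dicts with an `email` key) and the function names `normalize_email` and `match_by_email` are assumptions made for illustration only.

```python
def normalize_email(email):
    """Lowercase and strip an email address; return None if it is missing or empty."""
    if not email:
        return None
    cleaned = email.strip().lower()
    return cleaned or None


def match_by_email(dataone_people, nsf_people):
    """Pair records from two sources that share a normalized email address.

    Both arguments are lists of dicts with an optional "email" key
    (a hypothetical layout, not the real DataOne or NSF schema).
    Returns (matches, unmatched), where matches is a list of
    (dataone_record, nsf_record) pairs and unmatched holds the
    DataOne records for which no NSF record shares an email.
    """
    # Index the NSF records by normalized email, skipping records without one.
    nsf_by_email = {}
    for person in nsf_people:
        key = normalize_email(person.get("email"))
        if key is not None:
            nsf_by_email.setdefault(key, []).append(person)

    matches, unmatched = [], []
    for person in dataone_people:
        key = normalize_email(person.get("email"))
        candidates = nsf_by_email.get(key, []) if key is not None else []
        if candidates:
            matches.extend((person, candidate) for candidate in candidates)
        else:
            unmatched.append(person)
    return matches, unmatched
```

With NSF records that carry no `email` value, every DataOne person ends up in `unmatched`, which is the situation described above.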

narock commented 8 years ago

Yes, this is true of the current RDF data. However, I raised this issue on the last GeoLink telecon and everyone agreed that we needed a hasEmailAddress property in the Person class in main.owl. I am currently re-creating the NSF and AGU data to include the email address. Expected to be completed by end of this week. Your harvester should see new data by Friday.

krisnadhi commented 8 years ago

I thought there was a privacy concern regarding exposing emails?

narock commented 8 years ago

That was the original thought. But at our last telecon we decided it wasn't a privacy concern, as data centers, NSF, and AGU already publicly expose email addresses. I believe the consensus was to add hasEmailAddress to main.owl.

krisnadhi commented 8 years ago

Ah okay. Will update the GBO to reflect this then.

krisnadhi commented 8 years ago

Btw, regarding matching people, I believe we (Wright State) will also do it through coreference resolution.