Graphlet-AI / graphlet

PyPI module for Graphlet AI Knowledge Graph Factory
https://graphlet.ai
Apache License 2.0

Create a DBLP labeled training network with SAME_AS edges for training our entity resolution model #10

Closed: rjurney closed this issue 1 year ago

rjurney commented 2 years ago

DBLP Training Data

I need to create a network whose edges include a SAME_AS edge type and a NOT_SAME_AS edge type for entity resolution. This network will serve as training data so that @tanmoyio can proceed with training an entity resolution model in #3.
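One way to picture the training network is as a list of labeled edges between author records. The sketch below assumes the labels arrive as rows of (id_a, id_b, is_match); the `LabeledEdge` class, the `build_training_edges` function, and the `author:N` ids are hypothetical illustrations, not graphlet's actual schema or API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Edge type names come from the issue text; the real graphlet schema may differ.
SAME_AS = "SAME_AS"
NOT_SAME_AS = "NOT_SAME_AS"


@dataclass(frozen=True)
class LabeledEdge:
    """Hypothetical edge between two author records carrying an ER label."""
    src: str
    dst: str
    label: str


def build_training_edges(pairs: Iterable[Tuple[str, str, bool]]) -> List[LabeledEdge]:
    """Turn (id_a, id_b, is_match) label rows into SAME_AS / NOT_SAME_AS edges.

    Self-pairs are skipped, and each unordered pair is kept only once, so a
    label listed as both (a, b) and (b, a) yields a single edge.
    """
    seen = set()
    edges: List[LabeledEdge] = []
    for a, b, is_match in pairs:
        if a == b:
            continue  # a record is trivially the same as itself; no signal
        key = tuple(sorted((a, b)))
        if key in seen:
            continue  # first occurrence of an unordered pair wins
        seen.add(key)
        edges.append(LabeledEdge(key[0], key[1], SAME_AS if is_match else NOT_SAME_AS))
    return edges


if __name__ == "__main__":
    rows = [
        ("author:1", "author:2", True),
        ("author:2", "author:1", True),   # same pair in reverse order
        ("author:1", "author:3", False),
        ("author:4", "author:4", True),   # self-pair, ignored
    ]
    for edge in build_training_edges(rows):
        print(edge)
```

Keeping both positive and negative edges matters because a pairwise matcher needs negative examples as well as positive ones. Note that this sketch silently keeps the first label when a pair is labeled inconsistently; a real pipeline would probably want to flag such conflicts instead.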

DBLP Datasets

DBLP is a database of scholarly research in computer science.

We use two datasets: the DBLP data itself and a set of entity resolution labels for authors.

Note that additional labels, which we haven't parsed yet, are available as XML at:

Collecting and Preparing the Training Data

The DBLP XML and the 50K ER labels are downloaded, parsed, and transformed into a graph by graphlet.dblp.__main__, which is run with:

python -m graphlet.dblp
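The parsing step can be sketched with the standard library's streaming XML parser, since the full DBLP dump is large. This is not the code in graphlet.dblp.__main__: the `iter_records` and `to_edges` functions, the (author, record key) edge shape, and the inline sample are all hypothetical. One caveat: the real dblp.xml uses character entities declared in dblp.dtd, which `xml.etree.ElementTree` does not load, so a real run would need a DTD-aware parser (lxml with DTD loading enabled is a common choice).

```python
import io
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, List, Tuple

# Main DBLP record element names (assumed subset; newer dumps may add more).
RECORD_TAGS = {"article", "inproceedings", "proceedings", "book",
               "incollection", "phdthesis", "mastersthesis", "www"}


def iter_records(source) -> Iterator[Dict[str, object]]:
    """Stream DBLP-style records, yielding each record's key, title and authors.

    iterparse avoids loading the whole file into memory; each finished
    record element is cleared once its fields have been copied out.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in RECORD_TAGS:
            yield {
                "key": elem.get("key"),
                "title": elem.findtext("title"),
                "authors": [a.text for a in elem.findall("author")],
            }
            elem.clear()


def to_edges(records) -> List[Tuple[str, str]]:
    """Hypothetical transform: one (author name, record key) edge per authorship."""
    return [(author, rec["key"]) for rec in records for author in rec["authors"]]


SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<dblp>
  <article key="journals/x/Doe20">
    <author>Jane Doe</author>
    <author>John Roe</author>
    <title>An Example Paper.</title>
  </article>
  <inproceedings key="conf/y/Doe21">
    <author>Jane Doe</author>
    <title>Another Example.</title>
  </inproceedings>
</dblp>"""

if __name__ == "__main__":
    records = list(iter_records(io.BytesIO(SAMPLE)))
    print(to_edges(records))
```

Author names alone are ambiguous in DBLP, which is exactly why the ER labels are needed to decide which author mentions should be joined by SAME_AS edges.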

See the example data at: https://gist.github.com/rjurney/5acad373d485272b5c1f4352b1dd0fc6