DBLP Training Data
I need to create a network with a set of edges that include a SAME_AS edge type and a NOT_SAME_AS edge type for entity resolution, to serve as training data so that @tanmoyio can proceed with training an entity resolution model in #3.
DBLP Datasets
DBLP is a database of scholarly research in computer science.
The datasets we use are the actual DBLP data and a set of labels for entity resolution of authors.
Note that there are additional labels available as XML that we haven't parsed yet at:
Collecting and Preparing the Training Data
The DBLP XML and the 50K ER labels are downloaded, parsed and transformed into a graph via graphlet.dblp.__main__.
See the example data at: https://gist.github.com/rjurney/5acad373d485272b5c1f4352b1dd0fc6
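To make the target shape of the training data concrete, here is a minimal sketch of how labeled author pairs could become SAME_AS and NOT_SAME_AS edges. It is not the code in graphlet.dblp.__main__. The label tuple format, the Edge class, the labels_to_edges function and the author identifiers are all assumptions made for illustration; the actual ER label files and graph schema may look different.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Hypothetical label format: (left_author_id, right_author_id, is_same_person).
# The real DBLP ER label files may be structured differently.
LabeledPair = Tuple[str, str, bool]


@dataclass(frozen=True)
class Edge:
    """A typed edge between two author nodes (illustrative schema only)."""

    src: str
    dst: str
    edge_type: str  # "SAME_AS" or "NOT_SAME_AS"


def labels_to_edges(pairs: Iterable[LabeledPair]) -> List[Edge]:
    """Turn labeled author pairs into SAME_AS / NOT_SAME_AS edges."""
    edges = []
    for left, right, is_same in pairs:
        # A positive label becomes SAME_AS, a negative label NOT_SAME_AS.
        edge_type = "SAME_AS" if is_same else "NOT_SAME_AS"
        edges.append(Edge(src=left, dst=right, edge_type=edge_type))
    return edges


# Made-up identifiers, for illustration only.
example_pairs = [
    ("author/1", "author/2", True),
    ("author/1", "author/3", False),
]
example_edges = labels_to_edges(example_pairs)
```

Keeping the negative pairs as explicit NOT_SAME_AS edges, rather than treating every unlabeled pair as a non-match, gives the model in #3 both positive and negative examples that come from the labels themselves.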