edsu / fondz

fondz is a tool for auto-generating an "archival description" from a bag or series of bags.

entity extraction #9

Open edsu opened 10 years ago

edsu commented 10 years ago

It might be interesting to optionally augment the converted HTML with links to Wikipedia using Wikipedia Miner. Matched entities could then trigger retrieval of more metadata from Freebase and DBpedia.
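
As a rough illustration of the linking step only, here is a minimal sketch that assumes the entity matching has already happened elsewhere (for example via Wikipedia Miner) and produced a mapping from surface form to Wikipedia article title. The function name `link_entities`, the shape of the `entities` mapping, and the use of English Wikipedia URLs are all assumptions for illustration, not part of fondz.

```python
import html
import re
from urllib.parse import quote


def link_entities(text, entities):
    """Return HTML for plain `text` with known entity mentions linked to Wikipedia.

    `entities` is a hypothetical mapping of surface form -> Wikipedia article
    title, produced by whatever entity matcher is in use.
    """
    if not entities:
        return html.escape(text)

    # Try longer names first so "New York City" wins over "New York".
    names = sorted(entities, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")

    out = []
    pos = 0
    for match in pattern.finditer(text):
        # Escape the untouched text between matches.
        out.append(html.escape(text[pos:match.start()]))
        title = entities[match.group(1)].replace(" ", "_")
        url = "https://en.wikipedia.org/wiki/" + quote(title)
        out.append('<a href="{}">{}</a>'.format(url, html.escape(match.group(1))))
        pos = match.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)
```

This works on plain text rather than on existing HTML, so it would need to be applied to text nodes only if the converted HTML already contains markup.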

edsu commented 10 years ago

Or perhaps Stanford NER? See http://ianmilligan.ca/2014/02/06/visualizing-locations-in-the-internet-archive-ca-wide-scrape-sample/ and http://williamjturkel.net/2013/06/30/named-entity-recognition-with-command-line-tools-in-linux/
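
If the command-line route were taken, the tagger output would need to be turned back into entity strings. A minimal sketch, assuming the tagger writes slash-tagged text where each token appears as `token/TAG` and non-entities are tagged `O`; the function name `parse_slash_tags` is hypothetical and nothing here calls Stanford NER itself.

```python
from itertools import groupby


def parse_slash_tags(line):
    """Group consecutive same-tag tokens from `token/TAG` text into entities.

    Returns a list of (entity_text, tag) tuples. Adjacent but distinct
    entities that share a tag are merged, which is a known limitation
    of this simple grouping.
    """
    pairs = []
    for item in line.split():
        # Split on the last slash so tokens containing "/" stay intact.
        token, sep, tag = item.rpartition("/")
        if not sep:
            continue  # skip anything that is not in token/TAG form
        pairs.append((token, tag))

    entities = []
    for tag, group in groupby(pairs, key=lambda pair: pair[1]):
        if tag != "O":
            entities.append((" ".join(token for token, _ in group), tag))
    return entities
```

The resulting `(text, tag)` pairs could then be filtered by tag (for example, locations only) before looking anything up or linking it.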