ArchivesPortalEuropeFoundation / Topic-Detection

Using machine learning approaches for automatic topic detection in a multilingual environment

Set up a script that, given text, returns the identified entities as Wikidata IDs #75

Open fedenanni opened 2 years ago

fedenanni commented 2 years ago

I've started looking into this. The most promising solution seems to be Flair, which covers five languages (EN, DE, FR, ES, NL) for named entity recognition. We could then pass its output to Wikidata to find the best match for each entity. The current blocker is that Flair is incompatible with Torch 1.8, which our pipeline needs (Flair requires Torch 1.7): https://github.com/flairNLP/flair/issues/2137
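To make the "pass the output to Wikidata" step concrete, here is a minimal sketch of the lookup, assuming we use the `wbsearchentities` action of the public Wikidata API and simply take the top hit as the best match. The function names (`build_search_url`, `best_match_id`, `lookup`) and the User-Agent string are made up for illustration and are not what the repository uses.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(mention, language):
    """Build a wbsearchentities query for one entity mention."""
    params = {
        "action": "wbsearchentities",
        "search": mention,
        "language": language,
        "format": "json",
        "limit": 1,  # assumption: the first hit is treated as the best match
    }
    return WIKIDATA_API + "?" + urllib.parse.urlencode(params)


def best_match_id(response):
    """Return the Wikidata ID of the first hit, or None if nothing matched."""
    hits = response.get("search", [])
    return hits[0]["id"] if hits else None


def lookup(mention, language="en"):
    """Query Wikidata over the network and return the best-matching ID."""
    request = urllib.request.Request(
        build_search_url(mention, language),
        headers={"User-Agent": "topic-detection-sketch/0.1"},  # hypothetical
    )
    with urllib.request.urlopen(request, timeout=10) as reply:
        return best_match_id(json.load(reply))
```

Taking the first hit is the simplest possible ranking; it will pick the wrong entity for ambiguous names, so a later version would probably want to use the entity type from the NER step to filter candidates.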

fedenanni commented 2 years ago

The Torch compatibility problem is solved in bcbb5435313bd757e8467fd517fef43e07337bee.

fedenanni commented 2 years ago

We have a first version in 665e997, which takes a string and a language (currently supported: EN, DE, FR, NL) and returns the URLs of the identified entities.
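For readers who have not opened the commit, the shape of such a function could look roughly like the sketch below. This is not the code in 665e997. It assumes the returned URLs are Wikidata entity pages, that the Flair model names in `FLAIR_MODELS` match the installed Flair version, and that the caller supplies a `resolve` callable mapping a mention to a Wikidata ID; `entity_urls`, `flair_mentions`, and `resolve` are hypothetical names.

```python
# Assumed mapping from language code to Flair model name; verify these
# against the Flair version in use.
FLAIR_MODELS = {"en": "ner", "de": "de-ner", "fr": "fr-ner", "nl": "nl-ner"}


def flair_mentions(text, language):
    """Return the entity mentions Flair finds in text (requires flair)."""
    # Imported lazily so the rest of the sketch works without Flair installed.
    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load(FLAIR_MODELS[language])
    sentence = Sentence(text)
    tagger.predict(sentence)
    return [span.text for span in sentence.get_spans("ner")]


def entity_urls(text, language, resolve, find_mentions=flair_mentions):
    """Return Wikidata URLs for the entities mentioned in text.

    resolve(mention, language) must return a Wikidata ID such as "Q90",
    or None when no match is found.
    """
    language = language.lower()
    if language not in FLAIR_MODELS:
        raise ValueError("unsupported language: " + language)
    urls = []
    for mention in find_mentions(text, language):
        entity_id = resolve(mention, language)
        if entity_id is None:
            continue  # no Wikidata match for this mention
        url = "https://www.wikidata.org/wiki/" + entity_id
        if url not in urls:  # keep first-seen order, drop duplicates
            urls.append(url)
    return urls
```

Keeping the NER step and the Wikidata lookup as separate callables makes it possible to test the glue logic without downloading a Flair model or hitting the network. Loading the tagger on every call is slow, so a real implementation would likely cache one tagger per language.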