ArchivesPortalEuropeFoundation / Topic-Detection

Using machine learning approaches for automatic topic detection in a multilingual environment

No results for entity search Napoleon #51

Open kerstarno opened 2 years ago

kerstarno commented 2 years ago

When I search for "Napoleon" with settings "de", "entity", "100", and no checkboxes checked, I only get the message "Mentions of "Napoleon" not found in corpus!".

The same happens with "Napoléon" and "fr" (all other settings as above).

When I search with "Napoleon" and "en" (all other settings as above), however, I get results. The Wikidata and VIAF links also point to the intended entity in this case.

I've also tried "Napoleone" with "it" and "Napoléon" with "es", but neither gives results; both return the same error message.

All tests were done between 9.10 and 9.20am CET on 15 November 2021 in the development environment.

Using current version of Chrome on (an older) Mac (Catalina, 10.15.7).

fedenanni commented 2 years ago

These are the name variations and the identified entity when I search for Napoleon in English: {'Bonaparte', 'Napoleone I', 'Napoléon', 'Napoléon Bonaparte', 'Napoléon Ier', 'Napoleon', 'Buonaparte', 'Napoleone di Buonaparte', 'Napoléon I'} https://en.wikipedia.org/wiki/Napoleon

For German and other languages, the problem seems to be that I hit a redirect page. I get https://de.wikipedia.org/wiki/Napoleon, but the correct page is https://de.wikipedia.org/wiki/Napoleon_Bonaparte. Wikipedia follows this redirect automatically, but the library I use doesn't.

fedenanni commented 2 years ago

I have added (in 7cc0d0a728266c8481e80a6706ddb1143923df42) an extra check for redirects that runs when we hit this error. It might slow things down further, but it should help us find mentions.
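
For readers who want to see what a redirect check of this kind can look like, here is a minimal sketch. It is not the code from the commit above and does not use the library mentioned in this thread; it assumes the public MediaWiki Action API with its `redirects` query parameter. The function names (`build_redirect_query`, `resolve_title`, `fetch_canonical_title`) and the User-Agent string are hypothetical, chosen only for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_redirect_query(title, lang):
    """Build a MediaWiki API URL that asks the server to resolve redirects."""
    params = {"action": "query", "titles": title, "redirects": 1, "format": "json"}
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


def resolve_title(title, api_response):
    """Return the final page title according to a parsed API response.

    The API reports title normalization and redirects as lists of
    {"from": ..., "to": ...} entries; we follow them until no entry applies.
    """
    query = api_response.get("query", {})
    steps = query.get("normalized", []) + query.get("redirects", [])
    mapping = {step["from"]: step["to"] for step in steps}
    seen = set()
    while title in mapping and title not in seen:  # guard against loops
        seen.add(title)
        title = mapping[title]
    return title


def fetch_canonical_title(title, lang):
    """Network call (hypothetical helper): one extra request per lookup."""
    request = Request(
        build_redirect_query(title, lang),
        headers={"User-Agent": "topic-detection-example/0.1"},  # placeholder
    )
    with urlopen(request, timeout=10) as response:
        return resolve_title(title, json.load(response))
```

With this sketch, `fetch_canonical_title("Napoleon", "de")` would be expected to return the title of the page behind the redirect described above. The extra request per lookup is also where the slowdown mentioned in the comment would come from, so caching resolved titles may be worth considering.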