GlobalNamesArchitecture / gnrd

Global Names Discovery
MIT License

GNRD is identifying people's names in text citations #31

Closed (diatomsRcool closed 2 years ago)

diatomsRcool commented 3 years ago
Refer to the GNRD Dryad project:

Name Package
Elsa 78
Paula 202
Gabriella 792
Lisa 792
Alfaro 170
Cullen 202
Plana 340
Idris 340
Barbosa 346
Rana 346
Garreta 802
Cano 802
Yamada 825
Barbosa 890
Theron 917
Berta 923
Moreno 949
Moreno 2
Vizcaino 39

There are more names (I can provide the data package numbers if needed).

dimus commented 3 years ago

This happens because these words are also valid scientific names.

For example, see https://verifier.globalnames.org/api/v1/verifications/Tsukada

The same problem occurs with some geographical entities, such as America.

To avoid these false positives, I think we need to study the context around each word and apply machine learning techniques.
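As a rough illustration of the verifier lookup linked above, here is a minimal Python sketch. Only the URL pattern is taken from this thread; the helper names `verification_url` and `fetch_verification` are made up for the example, and nothing is assumed about the layout of the JSON that comes back.

```python
import json
import urllib.parse
import urllib.request

# Base path copied from the link in this thread.
VERIFIER_BASE = "https://verifier.globalnames.org/api/v1/verifications/"


def verification_url(name: str) -> str:
    """Build the lookup URL for one name string (hypothetical helper)."""
    # Percent-encode the name so spaces and slashes are safe in the path.
    return VERIFIER_BASE + urllib.parse.quote(name, safe="")


def fetch_verification(name: str, timeout: float = 10.0) -> dict:
    """Fetch and decode the verifier response (hypothetical helper).

    The structure of the returned JSON is not assumed here;
    inspect it before relying on any particular field.
    """
    with urllib.request.urlopen(verification_url(name), timeout=timeout) as resp:
        return json.load(resp)


# Building the URLs needs no network access.
for word in ["Tsukada", "America"]:
    print(verification_url(word))
```

Calling `fetch_verification("Tsukada")` and inspecting the result is one way to see why a surname or place name is treated as a candidate scientific name.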

dimus commented 3 years ago

Added https://github.com/gnames/gnfinder/issues/62 to track this in the name-finding algorithm. If I add these names to the grey dictionary, it should help hide such stand-alone names from the results.
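To make the grey dictionary idea concrete, here is a small Python sketch of the filtering step. It is not how gnfinder implements it; the `GREY_WORDS` set (seeded with a few names from this issue) and the function `hide_grey_standalone` are assumptions for illustration only.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical grey list: words that are valid names but are usually
# something else (personal names, places) when they appear alone.
GREY_WORDS: FrozenSet[str] = frozenset(
    {"Elsa", "Paula", "Lisa", "Rana", "Idris", "Moreno"}
)


def hide_grey_standalone(
    found: Iterable[str], grey: FrozenSet[str] = GREY_WORDS
) -> List[str]:
    """Drop single-word results that appear in the grey list.

    Multi-word names (for example a binomial) are kept even when
    their first word is grey, because the extra word is context.
    """
    kept = []
    for name in found:
        words = name.split()
        if len(words) == 1 and words[0] in grey:
            continue  # stand-alone grey word: hide it
        kept.append(name)
    return kept


print(hide_grey_standalone(["Rana", "Rana temporaria", "Idris", "Quercus"]))
```

With this rule a lone "Rana" in a citation is hidden, while "Rana temporaria" still comes through.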