Closed narayanacharya6 closed 3 years ago
Also, here is a list of queries that do not work even after using the "normalized" text. Maybe we need to handle these separately (or maybe just ignore them).
['Artim Šakiri', 'Kazys Škirpa', 'Škoda Auto Museum', 'ʻIʻiwi', 'Kristian Wåhlin', 'Kåre Harila', 'Špania Dolina', 'Themistokli Gërmenji', 'NK GOŠK Gabela', 'Škoda Fabia', 'Baháʼí Faith in Cameroon', 'Švitrigaila', 'Håkon Opdal', 'Pål Varhaug', 'Baháʼí Faith by country', 'İpek Soylu', 'Šance Dam', 'Šardinje', 'Kahoʻolawe', 'Maija Tīruma', 'Šentilj v Slovenskih Goricah', 'Baháʼí Faith', 'Roman Štrba', 'Vladimir Štimac', 'Baháʼí News', 'Tʼazur Company', 'Kwakwakaʼwakw']
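One way to investigate titles like these is to look at which code points they contain and which Unicode normalization forms they already satisfy. The following is a minimal diagnostic sketch using only the Python standard library (`unicodedata.is_normalized` needs Python 3.8+). `describe_title` is a hypothetical helper, not part of the FEVEROUS code, and the example assumes the apostrophe-like character in "Baháʼí" is U+02BC.

```python
import unicodedata

def describe_title(title):
    """Return normalization flags and the non-ASCII code points of a title."""
    # Which normalization forms does the string already satisfy?
    forms = {form: unicodedata.is_normalized(form, title)
             for form in ("NFC", "NFD", "NFKC", "NFKD")}
    # List every non-ASCII character with its code point and Unicode name.
    non_ascii = [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
                 for ch in title if ord(ch) > 127]
    return forms, non_ascii

# Assumption: the title is written with composed letters and U+02BC.
forms, chars = describe_title("Bah\u00e1\u02bc\u00ed Faith")
print(forms)
for code, name in chars:
    print(code, name)
```

Running the same helper on the id as stored in the database (if it can be fetched by other means) and comparing the two outputs may show whether the remaining failures come from normalization or from a different character being used.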
Whoops, just saw the only other issue on the repo that addresses this. Closing.
Sorry for this issue. We will upload a fixed version (+ full training data) very soon.
I've noticed that search queries containing umlauts and other special characters also return nothing, even though a matching entry exists in the database.
For example, if I try to search for
Didier Lourenço
using db.get_doc_json()
it returns nothing. But when I query the database directly using
SELECT * FROM wiki WHERE id = 'Didier Lourenço';
it does return the correct result.
The line in src/database/feverous_db.py that "normalizes" the doc_id is commented out:
https://github.com/Raldir/FEVEROUS/blob/main/src/database/feverous_db.py#L47-L57
Using the "normalized" query seems to work. Any reason why this was commented out?
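One plausible explanation, not confirmed in this thread, is that the query string and the stored id are in different Unicode normalization forms. SQLite's default comparison is byte-wise, so a composed "ç" does not match a decomposed "c" plus combining cedilla. The sketch below reproduces that effect with an in-memory table. It assumes ids are stored in NFD, which is a guess, and the `get_doc_json` here is a simplified stand-in rather than the repository's method.

```python
import sqlite3
import unicodedata

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wiki (id TEXT PRIMARY KEY, data TEXT)")

# Assumption: the database stores ids in decomposed form (NFD).
stored_id = unicodedata.normalize("NFD", "Didier Louren\u00e7o")
conn.execute("INSERT INTO wiki VALUES (?, ?)", (stored_id, '{"page": "example"}'))

def get_doc_json(conn, doc_id, form=None):
    """Look up a page by id, optionally normalizing the id first."""
    if form is not None:
        doc_id = unicodedata.normalize(form, doc_id)
    row = conn.execute(
        "SELECT data FROM wiki WHERE id = ?", (doc_id,)
    ).fetchone()
    return row[0] if row else None

query = "Didier Louren\u00e7o"  # composed form (NFC), as typically typed
raw_result = get_doc_json(conn, query)                     # no normalization
normalized_result = get_doc_json(conn, query, form="NFD")  # normalized lookup
print(raw_result)
print(normalized_result)
```

In this setup the raw lookup finds nothing while the normalized one returns the row, which matches the behaviour described above. Whether NFD is the right form for the real database would need to be checked against how the ids were written.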