Raldir / FEVEROUS

Repository for Fact Extraction and VERification Over Unstructured and Structured information (FEVEROUS), accepted to the NeurIPS 2021 Datasets and Benchmarks track and used for the FEVER Workshop Shared Task at EMNLP 2021.
Apache License 2.0

Querying DB with texts with special symbols does not return results. #3

Closed: narayanacharya6 closed this issue 3 years ago

narayanacharya6 commented 3 years ago

I've noticed that search queries containing umlauts and other special symbols return nothing from the database, even though a matching entry exists.

For example, searching for Didier Lourenço with db.get_doc_json() returns nothing, but querying the database directly with SELECT * FROM wiki WHERE id = 'Didier Lourenço'; does return the correct result.

The line in src/database/feverous_db.py that "normalizes" the doc_id is commented out.

https://github.com/Raldir/FEVEROUS/blob/main/src/database/feverous_db.py#L47-L57

Using the "normalized" query seems to work. Any reason why this was commented out?
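One way this kind of mismatch can arise is a difference in Unicode normalization between the stored title and the query string, since SQLite's default comparison is an exact match. The following is a minimal sketch of that effect, not the actual FEVEROUS code: the in-memory table, the lookup and lookup_normalized helpers, and the choice of NFD as the stored form are all assumptions made for illustration.

```python
import sqlite3
import unicodedata

# Hypothetical stand-in for the FEVEROUS database: one table `wiki`
# keyed by page title. The real schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wiki (id TEXT PRIMARY KEY, data TEXT)")

# Assumption: titles are stored in decomposed form (NFD), i.e. "c" followed
# by a combining cedilla instead of the single code point for "ç".
stored_id = unicodedata.normalize("NFD", "Didier Lourenço")
conn.execute(
    "INSERT INTO wiki VALUES (?, ?)",
    (stored_id, '{"title": "Didier Lourenço"}'),
)


def lookup(doc_id):
    """Exact-match lookup; no normalization is applied to the query."""
    row = conn.execute(
        "SELECT data FROM wiki WHERE id = ?", (doc_id,)
    ).fetchone()
    return row[0] if row else None


def lookup_normalized(doc_id, form="NFD"):
    """Normalize the query to the form the titles are assumed to be stored in."""
    return lookup(unicodedata.normalize(form, doc_id))


# A composed (NFC) query looks identical on screen but does not match.
composed = unicodedata.normalize("NFC", "Didier Lourenço")
raw_result = lookup(composed)                    # no row found
normalized_result = lookup_normalized(composed)  # row found
```

If the database actually stores titles in a different form, the same helper can be called with form="NFC", "NFKC" or "NFKD" to check which one matches.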

narayanacharya6 commented 3 years ago

Also, here is a list of queries that do not work even after using the "normalized" text. Maybe we need to handle these separately (or maybe just ignore them).

['Artim Šakiri', 'Kazys Škirpa', 'Škoda Auto Museum', 'ʻIʻiwi', 'Kristian Wåhlin', 'Kåre Harila', 'Špania Dolina', 'Themistokli Gërmenji', 'NK GOŠK Gabela', 'Škoda Fabia', 'Baháʼí Faith in Cameroon', 'Švitrigaila', 'Håkon Opdal', 'Pål Varhaug', 'Baháʼí Faith by country', 'İpek Soylu', 'Šance Dam', 'Šardinje', 'Kahoʻolawe', 'Maija Tīruma', 'Šentilj v Slovenskih Goricah', 'Baháʼí Faith', 'Roman Štrba', 'Vladimir Štimac', 'Baháʼí News', 'Tʼazur Company', 'Kwakwakaʼwakw']
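The thread does not say why these titles still fail. A possible first step for narrowing it down is to list the code points of a failing title and compare its normalization forms against what is stored in the database. The helpers below (describe_codepoints and normalization_variants) are hypothetical names for this sketch and are not part of the FEVEROUS repository.

```python
import unicodedata


def describe_codepoints(text):
    """Return (code point, Unicode name) pairs for every character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


def normalization_variants(text):
    """Return the text under each of the four standard normalization forms."""
    return {
        form: unicodedata.normalize(form, text)
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    }


# Example with one title from the list above, written with an escape so the
# code point is unambiguous: U+0160 is the precomposed "S with caron".
title = "\u0160koda Fabia"
variants = normalization_variants(title)
for form, value in variants.items():
    print(form, describe_codepoints(value)[:2])
```

Running the same check on the id column values read back from the database would show whether the stored titles use the same code points as the query.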

narayanacharya6 commented 3 years ago

Whoops, just saw the only other issue on the repo that addresses this. Closing.

Raldir commented 3 years ago

Sorry for this issue. We will upload a fixed version (+ full training data) very soon.