DeNederlandscheBank / nqm

A Transformer-based Machine for answering questions on insurance companies
MIT License

Names and/or IDs not correctly translated #15

Closed jm-glowienke closed 3 years ago

jm-glowienke commented 3 years ago

First tests with the IWSLT_en_de transformer model show generally good translation quality.

However, the names in the natural language question are not translated correctly: either names are mixed up or, even worse, a name is translated as an insurance ID. This happened for the query below, where the ID does not even match Aegon. The only positive part is that the query is at least a valid one for an insurance ID, but it is not the one desired.

Question: Where does Aegon N.V. operate?

Translation:

SELECT DISTINCT ?a WHERE { 
  ?x eiopa-Base:hasEUCountryWhereEntityOperates ?a . 
  ?x eiopa-Base:hasInsuranceUndertakingID h0134 .}

Correct:

SELECT DISTINCT ?a WHERE {
    ?s gleif-L1:hasLegalName "Aegon N.V." .
    ?e gleif-Base:identifies ?s .
    ?e eiopa-Base:hasEUCountryWhereEntityOperates ?a .
}
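A quick way to flag this failure automatically is to check whether the name from the question survives as a string literal in the generated query. The snippet below is only a minimal sketch of that idea; the helper names `literal_names` and `name_preserved` are made up for illustration and are not part of the repository.

```python
import re
from typing import List


def literal_names(sparql: str) -> List[str]:
    """Return all double-quoted string literals found in a SPARQL query."""
    return re.findall(r'"([^"]*)"', sparql)


def name_preserved(name: str, sparql: str) -> bool:
    """True if `name` appears verbatim as a string literal in the query."""
    return name in literal_names(sparql)


# The two queries from this issue.
wrong = """SELECT DISTINCT ?a WHERE {
  ?x eiopa-Base:hasEUCountryWhereEntityOperates ?a .
  ?x eiopa-Base:hasInsuranceUndertakingID h0134 .}"""

correct = """SELECT DISTINCT ?a WHERE {
    ?s gleif-L1:hasLegalName "Aegon N.V." .
    ?e gleif-Base:identifies ?s .
    ?e eiopa-Base:hasEUCountryWhereEntityOperates ?a .
}"""

print(name_preserved("Aegon N.V.", wrong))    # False
print(name_preserved("Aegon N.V.", correct))  # True
```

This only catches names that are dropped or replaced; it would not notice a query that keeps the name but uses the wrong predicates.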

Some ideas for fix:

jm-glowienke commented 3 years ago

Some quick testing using 'fairseq_testing.ipynb' shows that this is only a problem for some (possibly unknown) questions.

First implement a testing process; this issue might then become irrelevant.
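One possible shape for such a testing process is a whitespace-insensitive exact-match comparison between generated and reference queries. This is a rough sketch under that assumption, not what 'fairseq_testing.ipynb' actually does; `normalize`, `exact_match_report`, and the second test pair are hypothetical and only there for illustration.

```python
from typing import Iterable, List, Tuple


def normalize(query: str) -> str:
    """Collapse whitespace runs so layout differences do not count as errors."""
    return " ".join(query.split())


def exact_match_report(
    pairs: Iterable[Tuple[str, str, str]]
) -> Tuple[float, List[str]]:
    """Take (question, predicted, reference) triples.

    Returns the exact-match accuracy and the questions that failed.
    """
    pairs = list(pairs)
    failures = [q for q, pred, ref in pairs if normalize(pred) != normalize(ref)]
    accuracy = 1 - len(failures) / len(pairs) if pairs else 0.0
    return accuracy, failures


reference = """SELECT DISTINCT ?a WHERE {
    ?s gleif-L1:hasLegalName "Aegon N.V." .
    ?e gleif-Base:identifies ?s .
    ?e eiopa-Base:hasEUCountryWhereEntityOperates ?a .
}"""

predicted_wrong = """SELECT DISTINCT ?a WHERE {
  ?x eiopa-Base:hasEUCountryWhereEntityOperates ?a .
  ?x eiopa-Base:hasInsuranceUndertakingID h0134 .}"""

# Illustrative case: same query as the reference, only laid out on one line.
predicted_ok = " ".join(reference.split())

accuracy, failures = exact_match_report([
    ("Where does Aegon N.V. operate?", predicted_wrong, reference),
    ("Where does Aegon N.V. operate? (reformatted)", predicted_ok, reference),
])
print(accuracy)   # 0.5
print(failures)   # ['Where does Aegon N.V. operate?']
```

Exact match is strict: two queries that differ only in variable names or triple order would be counted as failures, so comparing query results against the graph may be a better check later on.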

jm-glowienke commented 3 years ago

Tokenization was missing; this prevents names from being recognized when they are in quotation marks. Names are case-sensitive, and the model was only trained on lowercase names --> a capitalized name is replaced with