jm-glowienke closed this issue 3 years ago.
Some quick testing with 'fairseq_testing.ipynb' shows that this is a problem only for some questions (possibly unknown ones).
Next step: implement a proper testing process first; this issue might then become irrelevant.
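A minimal sketch of such a testing process, assuming the model's predicted queries and the correct queries are already available as plain strings (for example, collected in the notebook). The function names are hypothetical, and the check is plain exact match after collapsing whitespace, which is stricter than comparing the results the queries return.

```python
from typing import List, Sequence, Tuple


def normalize_query(query: str) -> str:
    # Collapse runs of whitespace so pure formatting differences are not errors
    return " ".join(query.split())


def exact_match_accuracy(pairs: Sequence[Tuple[str, str]]) -> float:
    """Fraction of (predicted, correct) query pairs that are identical after normalization."""
    if not pairs:
        return 0.0
    hits = sum(normalize_query(pred) == normalize_query(gold) for pred, gold in pairs)
    return hits / len(pairs)


def failing_questions(questions: Sequence[str], predictions: Sequence[str],
                      references: Sequence[str]) -> List[str]:
    """Questions whose predicted query differs from the correct one."""
    return [q for q, pred, gold in zip(questions, predictions, references)
            if normalize_query(pred) != normalize_query(gold)]
```

Listing the failing questions is one way to find out which questions are actually affected.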
Tokenization was missing; this prevented names in quotation marks from being recognized.
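A rough illustration of the missing step, using a simple regex tokenizer as a stand-in (an assumption: the real fix should reuse whatever tokenizer produced the training data, otherwise the tokens still will not match). Quotation marks and other punctuation are split off, so a quoted name becomes a separate token.

```python
import re

# Hypothetical stand-in tokenizer: a word is a run of word characters, and
# every other non-space character (quotes, '?', '.') becomes its own token.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> str:
    """Return the tokens of `text`, separated by single spaces."""
    return " ".join(TOKEN_PATTERN.findall(text))
```

Without this split, the model receives `"Aegon"` with the quotes attached, a form that presumably does not occur in the training data. Note that this simple pattern also splits abbreviations such as N.V. into separate tokens.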
Names are case-sensitive, and the model was only trained on lowercase names --> a capitalized name is replaced with
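One possible workaround, sketched under the assumption that a list of the names seen during training is available: map any case variant of a known name to its lowercase form before translation. `lowercase_known_names` is a hypothetical helper, not part of fairseq.

```python
import re
from typing import Iterable


def lowercase_known_names(question: str, names: Iterable[str]) -> str:
    """Replace any case variant of a known name with its lowercase form.

    `names` is a hypothetical list of entity names seen during training.
    """
    # Longest names first, so a multi-word name is handled before its parts
    for name in sorted(set(names), key=len, reverse=True):
        lowered = name.lower()
        # (?<!\w) and (?!\w) act as word boundaries that also work next to "."
        pattern = re.compile(r"(?<!\w)" + re.escape(name) + r"(?!\w)", re.IGNORECASE)
        question = pattern.sub(lambda _m, low=lowered: low, question)
    return question
```

Lowercasing the entire question is a simpler alternative if the training questions are fully lowercased as well.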
First tests with the IWSLT_en_de transformer model show generally good translation quality.
However, the names in the natural-language question are not translated correctly: either names are mixed up or, even worse, names are translated as an insurance ID. This happened for the query below, and the ID does not even belong to Aegon. The only positive is that the query is at least the correct one for an insurance ID, just not the desired one.
Question:
Where does Aegon N.V. operate?
Translation:
Correct:
Some ideas for a fix:
- More queries, more training?
- Maybe the adaptations to the database help (https://github.com/DeNederlandscheBank/nqm/issues/7), but I don't think so.
- Change names to a name "object", so "Aegon" becomes "name_aegon", and similarly "id_h1234" for IDs.
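The placeholder idea for names could look roughly like this. It is a sketch under assumptions: the list of known names is given, the placeholder scheme (a `name` or `id` prefix plus a lowercase slug) is hypothetical beyond the "name_aegon" and "id_h1234" examples, and the model would have to be trained on questions and queries that contain the placeholders instead of the real names.

```python
import re
from typing import Dict, Iterable, Tuple


def to_placeholder(value: str, prefix: str = "name") -> str:
    """Build a placeholder token, e.g. "Aegon" -> "name_aegon",
    or "H1234" with prefix "id" -> "id_h1234" (hypothetical naming scheme)."""
    slug = re.sub(r"\W+", "_", value.lower()).strip("_")
    return f"{prefix}_{slug}"


def mask_names(question: str, names: Iterable[str]) -> Tuple[str, Dict[str, str]]:
    """Replace every known name in the question with its placeholder.

    Returns the masked question and a placeholder -> original name mapping.
    """
    mapping: Dict[str, str] = {}
    # Longest names first, so "Aegon N.V." wins over a shorter "Aegon"
    for name in sorted(set(names), key=len, reverse=True):
        pattern = re.compile(r"(?<!\w)" + re.escape(name) + r"(?!\w)", re.IGNORECASE)
        if pattern.search(question):
            placeholder = to_placeholder(name)
            mapping[placeholder] = name
            # A function replacement avoids backslash escapes in the replacement string
            question = pattern.sub(lambda _m, p=placeholder: p, question)
    return question, mapping


def unmask(query: str, mapping: Dict[str, str]) -> str:
    """Put the original names back into a generated query."""
    # Longest placeholders first, so "name_aegon" does not clobber "name_aegon_n_v"
    for placeholder in sorted(mapping, key=len, reverse=True):
        query = query.replace(placeholder, mapping[placeholder])
    return query
```

Since the model then only sees a small, fixed set of placeholder tokens, it should be less likely to confuse a company name with an insurance ID; the real names are restored after translation with `unmask`.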