Closed xrotwang closed 5 years ago
The problem seems to be that the IDs in `id_index` are not updated to the newly generated ones.
Ok, the ID index is created after new IDs have been assigned - so that's correct. But the original IDs should incorporate homonym numbers. Otherwise they are not unique, and `id_index` won't be injective. E.g. when processing Palula, there are two lexemes `bíi`. Because they contain non-ASCII characters, a new ID is assigned to each of them (but for both new IDs, `bíi` is kept as the "original ID"):
bíi    LX000404
bíi    LX000405
This results in `id_index` mapping `bíi` to the last new ID assigned to this original ID.
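A minimal sketch of the collision, assuming the index is a plain dict keyed by original ID. The function name `build_index`, the homonym numbers 1 and 2, and the `_<number>` key suffix are all hypothetical and only serve to illustrate the idea; the actual code may build the index differently.

```python
# Hypothetical data: (original ID, homonym number, newly assigned ID).
# The homonym numbers 1 and 2 are assumed for illustration.
entries = [
    ("bíi", 1, "LX000404"),
    ("bíi", 2, "LX000405"),
]


def build_index(entries, use_homonym=False):
    """Map original IDs to new IDs, optionally suffixing the homonym number."""
    index = {}
    for original_id, homonym, new_id in entries:
        # The "_<homonym>" suffix is an assumed key format, not the real one.
        key = f"{original_id}_{homonym}" if use_homonym else original_id
        # A repeated key silently overwrites the earlier value.
        index[key] = new_id
    return index


naive = build_index(entries)
unique = build_index(entries, use_homonym=True)

print(naive)
print(unique)
```

With the bare original ID as key, only one entry survives and it points at the last assigned ID. With the homonym number folded into the key, both lexemes keep their own entry.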
Also, variants with homonym numbers are not added to `id_index` as expected - but I don't know yet why that happens.
Currently, it seems cross-references using homonym numbers do not work - e.g. for Palula.