Open trondtynnol opened 1 month ago
I should note I have fixed the conversion error that caused this XML for rus-sjd, so it is no longer a problem there. I still do suspect it may be something that we should look into if we have the time, but it is not high priority.
sanj.gtdict-02.uit.no (the new server) looked like your screenshot with the empty 2. for me. sanj.oahpa.no looked good. Locally, it also looks good.
The giella-core/dicts/scripts/merge_giella_dicts.py script just merges the <e> elements, without doing any other checks. The nds compile project command just runs merge_giella_dicts.
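To illustrate what merging without checks means here, this is a minimal sketch of merging <e> elements from several dictionary files, with one optional check added. It is not the actual merge_giella_dicts.py: the function name merge_entries, the root element name r, and the emptiness check are assumptions made for illustration, and it uses the standard library's xml.etree.ElementTree to stay self-contained.

```python
import xml.etree.ElementTree as ET


def merge_entries(xml_sources):
    """Collect every <e> element from the given XML strings under one root.

    Hypothetical helper, not the real merge script. Entries that have no
    children and no text are set aside instead of being merged.
    """
    merged_root = ET.Element("r")  # root name "r" is an assumption
    skipped = []
    for source in xml_sources:
        root = ET.fromstring(source)
        for entry in root.iter("e"):
            # An <e> with no child elements and no text carries no information.
            if len(entry) == 0 and not (entry.text or "").strip():
                skipped.append(entry)
                continue
            merged_root.append(entry)
    return merged_root, skipped


first = "<r><e><lg><l>a</l></lg></e><e/></r>"
second = "<r><e><lg><l>b</l></lg></e></r>"
merged, skipped = merge_entries([first, second])
```

A check like this would only catch entries that are completely empty. An entry with an empty <t> node would still pass through.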
I am able to reproduce the error if I insert an <e> that contains nothing into a dictionary. I get a blank page in NDS, and see this error.
The error actually stems from the debugging code. There is a line etree.tostring(e, pretty_print=True, encoding="utf-8"), which is not a Python str, but a bytes. Hence the error about not being able to concatenate bytes to strings.
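The mismatch can be reproduced in isolation. The sketch below is an assumption-laden stand-in, not the NDS debugging code: it uses the standard library's xml.etree.ElementTree instead of lxml (so there is no pretty_print argument), and the message text is made up. The behaviour it shows is the same, since tostring with encoding="utf-8" returns bytes in both libraries.

```python
import xml.etree.ElementTree as ET

# A stand-in entry with an empty <t> node (structure assumed for illustration).
entry = ET.fromstring("<e><mg><tg><t/></tg></mg></e>")

# With encoding="utf-8", tostring returns bytes, not str.
serialized = ET.tostring(entry, encoding="utf-8")

try:
    # Concatenating str and bytes raises TypeError.
    message = "Problem in entry: " + serialized
except TypeError as error:
    message = None
    concat_error = error

# Decoding the bytes first gives a str that can be concatenated.
fixed_message = "Problem in entry: " + serialized.decode("utf-8")
```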
Fixing the issue (by doing a .decode("utf-8") to turn it into a string) makes the code work as intended. The error message is still printed about something being wrong in the .xml, which is okay - it really is missing text in the node.
It looks like this:
I don't really know if that is preferable to just showing a blank screen, or blank entry... This should have been fixed by the dictionary author(s).
Commit 4a90e6d3cf0a91d1623f35316fc272748b6dcf27 fixes the issue in the error reporting, but does not address how empty <t> nodes are displayed in any way. Again, I think this should be on the dictionary authors.
sanj.gtdict-02.uit.no (the new server) looked like your screenshot with the empty 2. for me. sanj.oahpa.no looked good. Locally, it also looks good.
Yeah, I've fixed the source file for rus-sjd and updated on the old server, so that's why it's working now.
The giella-core/dicts/scripts/merge_giella_dicts.py script just merges the <e> elements, without doing any other checks. The nds compile project command just runs merge_giella_dicts.
Yeah, this was a custom script (xlsx to xml), so that was the source of this.
The error actually stems from the debugging code. There is a line etree.tostring(e, pretty_print=True, encoding="utf-8"), which is not a Python str, but a bytes. Hence the error about not being able to concatenate bytes to strings.
Fixing the issue (by doing a .decode("utf-8") to turn it into a string) makes the code work as intended. The error message is still printed about something being wrong in the .xml, which is okay - it really is missing text in the node.
That sounds great.
I don't really know if that is preferable to just showing a blank screen, or blank entry... This should have been fixed by the dictionary author(s).
That would of course be preferable, but we have so many dictionaries with various small mistakes and no maintainer, so it would be best if NDS accepts minor "mistakes". Having an empty t-node is also sometimes needed if you want to display a paradigm for a word for which there is yet no translation due to the way NDS works.
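As a rough sketch of what accepting an empty t-node could look like, the snippet below reads translations from an entry and substitutes a placeholder when a <t> has no text. This is not NDS code: the function name translations, the placeholder parameter, and the entry structure are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET


def translations(entry, placeholder=""):
    """Return the text of each <t> node, tolerating empty nodes.

    Hypothetical helper: an empty <t/> has text None, so it is mapped
    to the placeholder instead of causing an error further down.
    """
    result = []
    for t_node in entry.iter("t"):
        text = (t_node.text or "").strip()
        result.append(text if text else placeholder)
    return result


entry = ET.fromstring("<e><mg><tg><t>mitten</t><t/></tg></mg></e>")
plain = translations(entry)
marked = translations(entry, placeholder="(no translation)")
```

With something along these lines, an entry could still be shown with its paradigm even when no translation exists yet.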
Seems that this could be closed now if you agree, @Phaqui
In sanj, when searching for варежка, at least the first time after a restart, NDS is unable to parse the entry and gives an empty result:
When running this locally, the following error was produced:
Which is a bit hard to read with all the encoded Cyrillic, so here it is decoded:
I have a hunch that the cause is the empty t-node in this entry (probably an error in the conversion script, but NDS should in principle handle empty t-nodes):
On subsequent searches, however, NDS manages to do this search and produces the expected result: