Open GoogleCodeExporter opened 9 years ago
This problem appears to be a bit of a phantom.
I could not reproduce it with the given file (it was probably cleaned of
illegal UTF-8 somewhere in the upload/download process).
It looks like too many UTF-8 error messages were being issued for a single bad
UTF-8 encoding. This is now fixed, but in a different way. It will still produce
one error message for each bad byte in the UTF-8 sequence.
Original comment by theraysm...@gmail.com
on 24 Apr 2014 at 9:17
This issue was closed by revision r1080.
Original comment by theraysm...@gmail.com
on 24 Apr 2014 at 9:18
After more testing I found that the UTF-8 reading error still exists, but only
appears with some fonts.
I therefore suspect some kind of memory corruption, but haven't looked into it
further yet (I plan to soon).
In the meantime I'm attaching a (very minimal) example training_text.txt file,
that fails with the attached 'GFS Didot' font, but succeeds with the 'Linux
Libertine O' font.
Running this command:
text2image --text training_text.txt --outputbase test --font 'GFS Didot' --fonts_dir .
I get this output:
Initializing fontconfig
WARNING: Illegal UTF8 encountered
ERROR: Illegal UTF8 encountered.
Index 0 char = 0xffffff90
Index 1 char = 0xffffffbc
Index 2 char = 0xffffff90
Index 3 char = 0xa
ERROR: Illegal UTF8 encountered.
Index 0 char = 0xffffffbc
Index 1 char = 0xffffff90
Index 2 char = 0xa
ERROR: Illegal UTF8 encountered.
Index 0 char = 0xffffff90
Index 1 char = 0xa
WARNING: Dropped 1 uncovered characters
(process:25315): Pango-WARNING **: Invalid UTF-8 string passed to
pango_layout_set_text()
WARNING: Illegal UTF8 encountered
WARNING: Illegal UTF8 encountered
ERROR: Illegal UTF8 encountered.
Index 0 char = 0xffffffff
Index 1 char = 0xa
WARNING: Illegal UTF8 encountered
WARNING: Illegal UTF8 encountered
ERROR: Illegal UTF8 encountered.
Index 0 char = 0xffffffff
Error in boxaGetExtent: boxa not defined
Error in boxaAddBox: box not defined
Rendered page 0 to file test.tif
Whereas this command:
text2image --text training_text.txt --outputbase test --font 'Linux Libertine O' --fonts_dir /usr/share/fonts/opentype/linux-libertine/
Gives this output:
Initializing fontconfig
Rendered page 0 to file test.tif
I plan to investigate this more soon.
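A note on reading the log above: values such as 0xffffff90 and 0xffffffbc look like single bytes (0x90, 0xbc) that were sign-extended when a signed char was widened to int before being printed in hex. That is an assumption about how the message is formatted, not something checked against the source. A minimal sketch of the effect, with hypothetical helper names HexSignExtended and HexByte:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: format a byte the way a "%x" applied to a signed
// char would, i.e. after sign extension to int.
std::string HexSignExtended(char c) {
  char buf[16];
  // Go through signed char explicitly so the result does not depend on
  // whether plain char is signed on this platform.
  const int widened = static_cast<signed char>(c);
  std::snprintf(buf, sizeof(buf), "0x%x", static_cast<unsigned int>(widened));
  return buf;
}

// Hypothetical helper: format the same byte after converting it to
// unsigned char first, which avoids the sign extension.
std::string HexByte(char c) {
  char buf[16];
  std::snprintf(buf, sizeof(buf), "0x%x",
                static_cast<unsigned int>(static_cast<unsigned char>(c)));
  return buf;
}
```

With a 32-bit int, HexSignExtended('\x90') gives "0xffffff90" while HexByte('\x90') gives "0x90", so the long values in the log are probably ordinary UTF-8 continuation bytes rather than wide garbage.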
Original comment by nick.wh...@durham.ac.uk
on 16 Jun 2014 at 9:07
Attachments:
OK, I found a bug which was causing some bad errors. If HAVE_GETLINE is not
defined, and you have lines longer than BUFSIZ, they are cut short, potentially
in the middle of UTF-8 characters.
Attached is a patch that fixes that by replacing ReadLine() with a much simpler
Read() routine.
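To illustrate the truncation problem described above, here is a rough sketch, not the attached patch. ReadAll and EndsMidCharacter are hypothetical names: the first reads a whole stream with no line-length limit, the second checks whether a chunk (for example one cut at a fixed buffer size) stops partway through a multi-byte UTF-8 character. It does not validate the rest of the string.

```cpp
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Read everything from `in` into one string, so no fixed buffer size
// can cut a line short.
std::string ReadAll(std::istream& in) {
  return std::string(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>());
}

// True if `s` ends partway through a multi-byte UTF-8 sequence.
bool EndsMidCharacter(const std::string& s) {
  std::size_t i = s.size();
  std::size_t trailing = 0;
  // Step back over up to three trailing continuation bytes (10xxxxxx).
  while (i > 0 && trailing < 3 &&
         (static_cast<unsigned char>(s[i - 1]) & 0xC0) == 0x80) {
    --i;
    ++trailing;
  }
  if (i == 0) return trailing > 0;  // continuation bytes with no lead byte
  const unsigned char lead = static_cast<unsigned char>(s[i - 1]);
  std::size_t expected = 0;  // continuation bytes the lead byte calls for
  if (lead < 0x80) expected = 0;
  else if ((lead & 0xE0) == 0xC0) expected = 1;
  else if ((lead & 0xF0) == 0xE0) expected = 2;
  else if ((lead & 0xF8) == 0xF0) expected = 3;
  else return false;  // not a valid lead byte; out of scope for this check
  return trailing < expected;
}
```

For a Greek alpha encoded as the two bytes 0xCE 0xB1, a chunk that ends after 0xCE is reported as cut mid-character, which is the situation a fixed BUFSIZ buffer can produce on long lines.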
However, I still have failures unless my first patch in the initial comment is
applied. The string is read correctly, and then is corrupted by the
DropUncoveredChars() routine. Please do apply it; it isn't a phantom. Can you
still not reproduce it, with the first attached training_text.txt? If not, I
suspect it might depend on compiler optimisation, as it
involves rewriting character strings in place in a way that GCC may make
incorrect assumptions about.
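For comparison, one way to avoid in-place rewriting altogether is to copy the characters that should be kept into a fresh string. This is only a sketch under that assumption and is not the code from either attached patch. KeepCovered, Utf8Length and the `keep` predicate are hypothetical names, and no UTF-8 validation is attempted.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Byte length of the UTF-8 sequence introduced by lead byte `b`. Bytes that
// cannot start a multi-byte sequence are treated as single units.
std::size_t Utf8Length(unsigned char b) {
  if ((b & 0xE0) == 0xC0) return 2;
  if ((b & 0xF0) == 0xE0) return 3;
  if ((b & 0xF8) == 0xF0) return 4;
  return 1;
}

// Build a new string containing only the characters for which `keep`
// returns true. The input is never modified, so there is no overlap
// between the bytes being read and the bytes being written.
std::string KeepCovered(
    const std::string& text,
    const std::function<bool(const std::string&)>& keep) {
  std::string out;
  out.reserve(text.size());
  for (std::size_t i = 0; i < text.size();) {
    std::size_t len = Utf8Length(static_cast<unsigned char>(text[i]));
    if (len > text.size() - i) len = text.size() - i;  // truncated tail
    const std::string ch = text.substr(i, len);
    if (keep(ch)) out += ch;
    i += len;
  }
  return out;
}
```

The extra allocation is negligible for training text, and each multi-byte character is either copied whole or dropped whole.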
Original comment by nick.wh...@durham.ac.uk
on 18 Jun 2014 at 9:24
Attachments:
Original issue reported on code.google.com by
nick.wh...@durham.ac.uk
on 13 Mar 2014 at 8:01
Attachments: