gnu-octave / octave-doctest

Doctests for Octave/Matlab

https://gnu-octave.github.io/packages/doctest/

BSD 3-Clause "New" or "Revised" License

Document mfile encoding issues and workaround #254

Closed by cbm755 1 year ago

cbm755 commented 2 years ago

See #251. I think we may need to document things a bit, probably in help doctest, pointing folks at __mfile_encoding__, dir_encoding, and .oct-config files...

.oct-config files may need some documentation work upstream: I have not yet found a reference to them. Perhaps we can add a little bit about encoding to https://docs.octave.org/latest/Function-Files.html

cbm755 commented 2 years ago

These new docs should @pxref{dir_encoding}.

mmuetzel commented 2 years ago

Regarding recommending .oct-config files to downstream: that is probably a good idea in general, but I'm not sure it would solve all possible issues here. The setting in that file only applies to how .m files are parsed by the interpreter (see the documentation of dir_encoding). Afaict, it doesn't change how files are read with functions like fread, textread, ...

I haven't looked into how doctest gets to the docstrings, but I'd guess that it involves some form of reading those files from disk in text mode. Afaict, that uses the "global" __mfile_encoding__ by default (not the folder-local dir_encoding). You could probably automate that somewhat in doctest by querying dir_encoding on the folder containing the file to be read and passing the respective encoding to fopen. In that case, the encoding specified in .oct-config files would also apply to the tested docstrings semi-automatically.
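A rough sketch of that lookup idea, written in Python purely for illustration (it is not doctest code and does not call Octave). It assumes the .oct-config file holds a line of the form encoding=NAME, as quoted later in this thread, and treats "utf-8" as a stand-in for the global m-file encoding. The names folder_encoding and read_with_folder_encoding are hypothetical.

```python
import os
import tempfile

# Stand-in for Octave's global m-file encoding; "utf-8" is an assumption here.
GLOBAL_ENCODING = "utf-8"


def folder_encoding(folder, default=GLOBAL_ENCODING):
    """Return the encoding named in folder/.oct-config, or the default."""
    config = os.path.join(folder, ".oct-config")
    if not os.path.isfile(config):
        return default
    with open(config, encoding="ascii", errors="replace") as fh:
        for line in fh:
            key, sep, value = line.strip().partition("=")
            # Only the "encoding=NAME" form quoted in this thread is handled.
            if sep and key.strip() == "encoding" and value.strip():
                return value.strip()
    return default


def read_with_folder_encoding(path):
    """Read a text file using the encoding configured for its folder."""
    folder = os.path.dirname(os.path.abspath(path))
    with open(path, encoding=folder_encoding(folder)) as fh:
        return fh.read()


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, ".oct-config"), "w", encoding="ascii") as fh:
        fh.write("encoding=iso-8859-1\n")
    mfile = os.path.join(tmp, "example.m")
    with open(mfile, "wb") as fh:
        fh.write("% Größe\n".encode("iso-8859-1"))
    detected = folder_encoding(tmp)
    text = read_with_folder_encoding(mfile)
    print(detected, ascii(text))
```

In Octave the equivalent would presumably query dir_encoding for the folder instead of parsing the file by hand; the sketch only shows the per-folder fallback logic.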

cbm755 commented 2 years ago

I think in most cases we call the builtin get_help_text.

I just took a quick look and the only time I can see that we do a raw fileread is on texinfo input (like a pure .texinfo file; maybe we can ignore that case for the time being: I can file a new issue). In functions, classes, oct-files, etc. we use builtin functions.

mmuetzel commented 2 years ago

I made a quick test with an .m file encoded in ISO 8859-1 and an .oct-config file that contains encoding=iso-8859-1. get_help_text returned non-ASCII characters correctly converted. So, that seems to work correctly already. 🎉
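To show why such a test file is a meaningful check, here is a small Python sketch (illustration only, not the actual test) of what happens to ISO 8859-1 bytes under different decoding assumptions. The comment text is made up for the example.

```python
# Bytes as they would sit on disk in an ISO 8859-1 encoded .m file.
raw = "% Größe in µm\n".encode("iso-8859-1")

# Assuming UTF-8 fails outright: 0xF6 (the byte for "ö") cannot start
# a UTF-8 sequence.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# A lenient UTF-8 reader would substitute U+FFFD for the bad bytes instead.
lenient = raw.decode("utf-8", errors="replace")

# Decoding with the declared encoding recovers the original text.
recovered = raw.decode("iso-8859-1")

print(utf8_ok)
print("\ufffd" in lenient)
print(recovered == "% Größe in µm\n")
```

A file like this can only yield correct non-ASCII characters if the reader honours the declared encoding, which is what the quick test above observed for get_help_text.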

cbm755 commented 2 years ago

Can you do a quick PR for that? Just put the two files in a subdir of tests, maybe "test/non_utf8_mfile". I can edit, but I think then "make test" should work.

mmuetzel commented 2 years ago

I opened #256 that adds a test.

cbm755 commented 1 year ago

Added to main help in 8495ba4481c167ddb75c4d932f0da8f202a24c9c