Closed cbm755 closed 2 years ago
Re: ppc64le, Wikipedia says:
ppc64le is a pure little-endian mode that has been introduced with the POWER8 as the prime target for technologies provided by the OpenPOWER Foundation, aiming at enabling porting of the x86 Linux-based software with minimal effort.
This seems to be caused by LANG=C.
On my own non-PPC64le machine (x86, Fedora 35, Octave 6.4.0), I can reproduce a similar thing using:
$ export LANG=C
$ octave
>> pkg load doctest
>> doctest doctest
Doctest v0.7.0+: this is Free Software without warranty, see source.
error: regexprep: nothing to repeat at position 9 of expression
error: called from
doctest_collect>parse_texinfo at line 487 column 9
doctest_collect>extract_docstring at line 343 column 26
doctest_collect>collect_targets_function at line 257 column 36
doctest_collect at line 149 column 10
doctest at line 350 column 11
I wonder whether we knew about this before. We have UTF-8 characters in our regexp; e.g., line 487 is:
L = regexprep (L, '^(\s*)(?:⇒|=>|⊣|-\||error→|error->)', '$1', 'once', 'lineanchors');
That does sound like A Bad Thing when the locale is C...
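For what it's worth, the position in the error message lines up with the first non-ASCII character in that pattern. The shell sketch below is not part of doctest, and the variable and function names are made up for illustration. It counts the ASCII bytes in front of the first arrow and shows how that arrow is stored in UTF-8:

```shell
#!/bin/sh
# ASCII prefix of the pattern quoted above, up to (not including) the
# first non-ASCII alternative.
prefix='^(\s*)(?:'

# Number of bytes before the first non-ASCII character.
prefix_len() {
    printf '%s' "$prefix" | wc -c | tr -d ' '
}

# UTF-8 bytes of U+21D2 (the double arrow), written with octal escapes so
# the result does not depend on how this script itself is encoded.
arrow_bytes() {
    printf '\342\207\222' | od -An -tx1 | tr -s ' ' | sed 's/^ //; s/ $//'
}

echo "bytes before first non-ASCII character: $(prefix_len)"
echo "UTF-8 bytes of the arrow: $(arrow_bytes)"
```

Nine ASCII bytes precede the arrow, so offset 9 is where the first multi-byte character starts. One plausible reading is that, under an ASCII charset, those three bytes are no longer decoded as a single character and the regex engine ends up seeing something like a bare quantifier at that offset. That part is an inference and has not been confirmed in this thread.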
I can reproduce this on Fedora 35 and 36. So far, I cannot reproduce it on the Octave containers (based on Ubuntu) using:
docker run -it --rm gnuoctave/octave:7.1.0 bash
export LANG=C
octave
pkg install -forge doctest
doctest doctest
Nor can I reproduce it on Ubuntu 20.04 on a real computer.
But I can reproduce it in an ubuntu:22.04 container:
docker run -it --rm ubuntu:22.04
apt-get update
apt-get install --no-install-recommends octave octave-doctest
octave
pkg load doctest
doctest doctest
(Interestingly, I do not need export LANG=C here...)
All the systems above use Octave <= 6.4.0. I have not reproduced this using 7.1.0 anywhere.
Workaround: run this before using the doctest package:
__mfile_encoding__ ('utf-8')
(It seems to work for me whether I do this before or after pkg load doctest, as long as I do it before doctest doctest.)
Edit, FTR:
docker run -it --rm ubuntu:22.04
apt-get update
apt-get install --no-install-recommends octave octave-doctest
octave
__mfile_encoding__ ('utf-8') # workaround
pkg load doctest
doctest doctest
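To avoid typing the workaround in every session, one option is to put it in an Octave startup file. This is only a sketch. It assumes ~/.octaverc is the startup file you want to change, and the helper name is made up. Note that it changes the default encoding for every Octave session that reads that file, not just doctest runs.

```shell
#!/bin/sh
# Append the workaround line to an Octave startup file unless it is
# already present.
# Usage: add_mfile_encoding_workaround [rcfile]   (default: ~/.octaverc)
add_mfile_encoding_workaround() {
    rcfile=${1:-"$HOME/.octaverc"}
    line="__mfile_encoding__ ('utf-8');"
    # -q: quiet, -x: match whole lines only, -F: treat the pattern as a fixed string
    if ! grep -qxF -e "$line" "$rcfile" 2>/dev/null; then
        printf '%s\n' "$line" >> "$rcfile"
    fi
}
```

Calling add_mfile_encoding_workaround twice leaves a single copy of the line in the file, so it should be safe to run from a provisioning or CI setup step.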
For 7.1.0, it's possible that doing the right thing (DTRT) here means adding a .oct-config file with contents encoding=utf-8. However, I think those files are only supported on Octave >= 7, and I cannot reproduce this error there, so it's hard to tell whether that fixes anything. But it does seem like The Right Thing!
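Creating that file could look like the sketch below. The helper name is made up, and the only content written is the encoding=utf-8 line mentioned above. Whether Octave >= 7 then reads the directory's files as UTF-8 is untested here.

```shell
#!/bin/sh
# Write a .oct-config that declares UTF-8 as the encoding for one directory.
# Usage: write_oct_config [directory]   (default: current directory)
write_oct_config() {
    dir=${1:-.}
    printf 'encoding=utf-8\n' > "$dir/.oct-config"
}
```

For example, write_oct_config inst would place the file next to the package's function files, assuming they live in a directory named inst.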
On Octave 6.4.0 on Fedora 35/36, exporting LANG=C changes Octave's __locale_charset__ from UTF-8 to ANSI_X3.4-1968.
Doing the same on the gnuoctave/octave:7.1.0 container DOES NOT change it:
podman run -it --rm gnuoctave/octave:7.1.0 bash
export LANG=C
octave
__locale_charset__
ans = UTF-8
The same holds for 6.3.0/6.4.0. So that explains (sort of) why I cannot reproduce it on Ubuntu 20.04 (which the 6.x.0 container images are based on).
Maybe LC_ALL is also set?
Bingo! Now I can reproduce it on Ubuntu 20.04 (and the gnu-octave/octave containers based on it):
docker run -it --rm gnuoctave/octave:6.4.0 bash
export LC_ALL=C
octave
>> __locale_charset__
ans = ANSI_X3.4-1968
>> pkg install -forge doctest
>> pkg load doctest
>> doctest doctest
Doctest v0.7.0: this is Free Software without warranty, see source.
error: regexprep: nothing to repeat at position 9 of expression
error: called from
doctest_collect>parse_texinfo at line 480 column 9
doctest_collect>extract_docstring at line 336 column 26
doctest_collect>collect_targets_function at line 250 column 36
doctest_collect at line 142 column 10
doctest at line 349 column 11
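The LANG versus LC_ALL difference can be checked without Octave, since POSIX gives LC_ALL precedence over the LC_* variables and LANG. The sketch below uses made-up helper names and assumes a system with the locale utility and an env that supports -u. It lists the locale variables in the current environment and prints the charset the C library reports for two settings:

```shell
#!/bin/sh
# Report the charset the C library derives from the given locale settings.
# Usage: effective_charmap VAR=value ...
effective_charmap() {
    # Clear inherited settings first so only the given assignments apply.
    env -u LANG -u LC_ALL -u LC_CTYPE "$@" locale charmap
}

# List the locale-related variables currently set in the environment.
show_locale_env() {
    env | grep -E '^(LANG|LANGUAGE|LC_[A-Z_]+)=' || echo "(no locale variables set)"
}

show_locale_env
echo "LANG=C only:            $(effective_charmap LANG=C)"
echo "LANG=C.UTF-8, LC_ALL=C: $(effective_charmap LANG=C.UTF-8 LC_ALL=C)"
```

Both lines should report the same charset, because LC_ALL=C overrides whatever LANG says. By the same rule, if an image already sets LC_ALL to a UTF-8 locale, exporting LANG=C alone would not be expected to change anything.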
You might still need to set __mfile_encoding__ ('utf-8') if you want to make sure that the files that you want to test (as opposed to the sources of doctest) are read as UTF-8. (Even after the changes in #252 are applied.)
That doesn't mean that #252 shouldn't be applied. IIUC, without that change, it would show that error even if the tested files only contained ASCII characters.
Edit: Similarly, you should set __mfile_encoding__ ('CP1252') if you know that the files you'd like to test use that encoding. That would require #252 to work correctly though.
I'm leaning toward that being the user's problem... Unless we want to define that Doctest only reads utf-8 encoded files (I don't think we do).
I think we will want some unit tests that check that CP1252-encoded input works correctly.
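A fixture for such a test could be generated rather than committed, which avoids editors silently re-encoding it. The following is only a sketch with hypothetical file and function names. It writes a function file whose comment contains "é" as the single CP1252 byte 0xE9, and it does not include an actual doctest.

```shell
#!/bin/sh
# Write a small function file whose comment contains "é" as the single
# CP1252 byte 0xE9 (octal 351). File and function names are hypothetical.
# Usage: make_cp1252_fixture [directory]   (default: current directory)
make_cp1252_fixture() {
    dir=${1:-.}
    {
        printf 'function cp1252_fixture ()\n'
        printf '  %% caf\351\n'
        printf 'end\n'
    } > "$dir/cp1252_fixture.m"
}

# Print a file's bytes in hex so the encoding can be checked by eye.
dump_bytes() {
    od -An -tx1 "$1"
}
```

In the hex dump, a lone e9 byte (rather than the UTF-8 sequence c3 a9) confirms that the file is not valid UTF-8, which is the case the unit test would need to cover.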
We also need to document this or at least give some hints in help doctest.
I haven't found it documented in upstream Octave. Maybe .oct-config should be mentioned in help __mfile_encoding__ (although putting it in the help of a hidden function seems not quite right...).
It is mentioned in the documentation of dir_encoding.
I got this doctesting Symbolic on Fedora 36. It did not happen when I built on the upcoming Fedora 37.
This is Fedora's Octave 6.4.0 (versus Fedora 37, which has 7.1.0). The machine was "ppc64le" and I'm not sure what that is... but I recall we had a problem in the past with regexp differences between x86 and arm, so maybe this is similar...
This was with the current release, doctest v0.7.0. It would be nice to test with the current main branch, but I don't have shell access to the machine :(
Upstream: https://koji.fedoraproject.org/koji/taskinfo?taskID=89344353 (not sure how long those logs are kept).