Closed by fosskers 1 week ago
Yes, a `base-string` is an array of `base-char`s, and `base-char`s are one byte, Latin-1 or something. The coercion is done here: https://github.com/clasp-developers/clasp/blob/9585462edb338550ecc59154031d48c25ecaa87f/src/core/pathname.cc#L1349-L1351
This code is pretty old, dating back to when Unicode support was new and Clasp was called BRCL.
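To make the `base-char` point concrete, here is a minimal sketch in portable Common Lisp that lists the characters in a string which would block a coercion to `base-string`. The helper names `non-base-chars` and `base-string-coercible-p` are hypothetical and are not part of Clasp. Which characters count as `base-char` is implementation-defined, so the result for the Japanese path will differ between implementations.

```lisp
;; Hypothetical helpers for illustration; not taken from the Clasp sources.

(defun non-base-chars (string)
  "Return a list of the characters in STRING that are not of type BASE-CHAR."
  (remove-if (lambda (ch) (typep ch 'base-char))
             (coerce string 'list)))

(defun base-string-coercible-p (string)
  "Return true if every character in STRING is a BASE-CHAR."
  (null (non-base-chars string)))

;; Example calls. The standard guarantees that standard characters such as
;; #\/ and #\f are base-chars; the hiragana result depends on the implementation.
(non-base-chars "/foo/bar")
(non-base-chars "/foo/bar/ゆびわ")
```

If the second call returns a non-empty list, any code path that insists on a `base-string` cannot represent that namestring without losing characters.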
The underlying parser in `clasp_parseNamestring` doesn't seem to actually be limited to byte characters at first glance, so maybe this will be easy to fix.
Nope, not that simple. `namestring` improperly treats components as base strings for reasons I don't follow, and the actual OS functions do not seem to work with your ring pathname. Let me try to sort all this out, I hate having secret English-centric assumptions littered around.
> Let me try to sort all this out, I hate having secret English-centric assumptions littered around.
Perhaps an innocent "ASCII will do for now" kind of thought, the lingering effects of which we're seeing here. Thanks for tackling this.
I will test your fix locally as well.
Please do. I think I got it working, but I might have missed some of the more obscure filesystem access functions.
As far as my tests are concerned, your fixes work. Thank you!
Describe the bug
If we inspect the string `"/foo/bar/ゆびわ"`, we see:
Wonderful. However, if we attempt to include a Unicode character in a pathname, like `(parse-namestring "/foo/bar/ゆびわ")`, the debugger opens and we're told:
Somewhat cryptic. However, `base-string` is a hint, and the Clasp docs also mention:
So perhaps somewhere in the depths of parsing the path, characters are assumed to be non-Unicode and a conversion to `base-string` (probably an array of `base-char`?) is attempted.
Expected behavior
It should be possible to include Unicode characters in pathnames, as people in non-English-speaking countries often have Unicode characters in file paths on their computers.
Actual behavior (shown above)
Note also that this occurs for `#p` literals as well (probably powered by `parse-namestring` underneath).
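For anyone who wants to check both entry points locally, the following is a rough sketch that uses only standard Common Lisp. The function names `try-parse` and `try-read-literal` are made up for this example. Each one returns the resulting pathname, or the condition object if parsing signals an error, so the debugger is not entered.

```lisp
;; Hypothetical probe functions; names are illustrative only.

(defun try-parse (namestring)
  "Return the pathname parsed from NAMESTRING, or the condition signaled."
  (handler-case (parse-namestring namestring)
    (error (condition) condition)))

(defun try-read-literal (namestring)
  "Read NAMESTRING as a #p literal; return the pathname or the condition."
  (handler-case (read-from-string (format nil "#p~S" namestring))
    (error (condition) condition)))

;; Example calls; what they return depends on the implementation and version.
(try-parse "/foo/bar/ゆびわ")
(try-read-literal "/foo/bar/ゆびわ")
```

Wrapping the results in `pathnamep` gives a quick yes/no answer for each entry point. This does not exercise the filesystem access functions, which would need separate checks.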