Closed mmatera closed 2 years ago
Having investigated this, it appears to be a boxing issue, not a scanning or parsing issue. The scanner allows three kinds of input:
\[DifferentialD]
\u0001D451
\uf74c
As an M-expression, this is properly turned into an Infix operator.
It is then the formatter's job, not the scanner's or the parser's, to take this correctly tagged node together with the current `$CharacterEncoding` value and turn it into the right symbol. Possibly a function similar to `FromCharacterCode[]` can be used here. The problem with `FromCharacterCode[]` is that we need to convert a named character into the right code based on `$CharacterEncoding`. We can add a special ASCII operator to WMA or standard Unicode if need be. However, we just need to find the right sequence in WMA-speak to get this done.
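To make the idea concrete, here is a minimal Python sketch of choosing an output form for a named character based on the active encoding. The table `NAMED_CHARACTERS`, its field names, and `render_named_character` are hypothetical names invented for illustration. They are not part of mathics-scanner or mathics-core, and the encoding argument stands in for whatever `$CharacterEncoding` holds.

```python
# Hypothetical lookup table: named character -> candidate renderings.
# The keys and field names are illustrative, not an existing Mathics API.
NAMED_CHARACTERS = {
    "DifferentialD": {
        "unicode": "\U0001d451",  # MATHEMATICAL ITALIC SMALL D
        "ascii": "d",             # plain fallback
    },
}


def render_named_character(name: str, encoding: str) -> str:
    """Return a rendering of `name` that `encoding` is able to represent."""
    entry = NAMED_CHARACTERS[name]
    try:
        # Probe whether the Unicode form survives the target encoding.
        entry["unicode"].encode(encoding)
    except UnicodeEncodeError:
        return entry["ascii"]
    return entry["unicode"]


print(render_named_character("DifferentialD", "UTF-8"))  # the Unicode form
print(render_named_character("DifferentialD", "ASCII"))  # prints "d"
```

The point of the sketch is only that the decision happens at output time, using the tagged name plus the encoding, rather than in the scanner or parser.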
After the last release, the behavior of MathicsScanner changed so that named characters in strings are mapped to `unicode-equivalent` instead of `wl-code`, as they were before. After fighting with the formatter code in mathics-core, I think this behavior is wrong. The reason is that the goal of having `unicode-equivalent`
is to provide a readable output, not to have an efficient way to store characters.

The example comes up with `"\[DifferentialD]"`. In 1.2.4, this string was parsed as `"\uf74c"`, which is a WL-specific character with a specific meaning. If the string has a form like `"\[Integral]F[x]\[DifferentialD] x"`, the string can be parsed afterward as the expression `Integrate[F[x], x]`. On the other hand, if we want to produce a printable version, `\[DifferentialD]` could be converted into `d`, or `\u0001D451`, or `\, d`, according to the place we need it.

With the current behavior in master, the
`test/format/test_format.py` tests in mathics-core fail.
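A rough Python sketch of the approach argued for here: keep the WL-specific character in the stored string and convert it only when producing output, picking the form by target. The names `WL_DIFFERENTIAL_D`, `RENDERINGS`, and `to_printable` are hypothetical, and the sketch assumes U+F74C is the WL private-use code point for `\[DifferentialD]`. It is not the mathics-core formatter.

```python
# Assumed WL private-use code point for \[DifferentialD] (an assumption here).
WL_DIFFERENTIAL_D = "\uf74c"

# Hypothetical per-target renderings, taken from the three forms listed above.
RENDERINGS = {
    "ascii": "d",
    "unicode": "\U0001d451",
    "tex": r"\, d",
}


def to_printable(text: str, target: str) -> str:
    """Replace the stored WL character with the form suited to `target`."""
    return text.replace(WL_DIFFERENTIAL_D, RENDERINGS[target])


# The stored string keeps the WL character, so its meaning is not lost.
stored = "F[x]" + WL_DIFFERENTIAL_D + " x"

print(to_printable(stored, "ascii"))  # prints "F[x]d x"
print(to_printable(stored, "tex"))    # prints "F[x]\, d x"
```

Because `stored` is never rewritten, it can still be parsed later with its WL meaning intact, while each output path chooses its own readable form.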