Open andychu opened 1 year ago
So you can see the syntax errors from JSON and Ninja, but not from any of the others. Awk and the C compiler only give warnings.
Also, they don't output the same strings -- sometimes it's [], and sometimes it's \[]
Are the string literals in this language M-extensible?
We simply test them for syntax errors after a special char like \
This is also relevant to YSTR, where we add \xff and \u{012345} escapes
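A minimal Python sketch of that test for a single parser, assuming the standard-library json module as the target. The helper name json_rejects is made up for illustration and is not part of the script linked in this thread. The idea is that a backslash sequence the parser rejects today could later be given a meaning without changing the behavior of any currently valid input.

```python
import json

def json_rejects(literal: str) -> bool:
    """Return True if `literal` (a JSON string literal, quotes included) fails to parse."""
    try:
        json.loads(literal)
    except json.JSONDecodeError:
        return True
    return False

# Raw Python strings pass the backslashes through to the JSON parser untouched.
candidates = [r'"\[]"', r'"\m[]"', r'"\xff"', r'"\u{012345}"', r'"\n"']
for literal in candidates:
    print(literal, "rejected" if json_rejects(literal) else "accepted")
```

Only the last candidate is an escape JSON already defines, so it is the only one that parses. The other four are free for an extension like YSTR to claim.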
Traceback (most recent call last):
  File "_tmp/foo.c", line 2, in <module>
    json.loads('"\[]"')
  File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.6/json/decoder.py", line 355, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid \escape: line 1 column 2 (char 1)
[JSON] YES
---
ninja: error: _tmp/z.ninja:6: bad $-escape (literal $ must be written as $$)
build _tmp/$[ : copy _tmp/ninja-in
^ near here
[Ninja] YES
---
_tmp/z.mk:5: warning: overriding recipe for target '_tmp/make-out'
_tmp/z.mk:2: warning: ignoring old recipe for target '_tmp/make-out'
cp _tmp/make-in _tmp/make-out
[GNU Make] NO, expected syntax error
---
_tmp/foo.c: In function ‘main’:
_tmp/foo.c:4:10: warning: unknown escape sequence: '\m'
printf("\m[]\n");
^~~~~~~~
[C] NO, expected syntax error
---
Running C
[]
m[]
---
\[]
\m[]
[Python] NO, expected syntax error
---
\[]
\m[]
[Shell] NO, expected syntax error
awk: cmd. line:3: warning: escape sequence `\[' treated as plain `['
awk: cmd. line:4: warning: escape sequence `\m' treated as plain `m'
[]
m[]
[Awk] NO, expected syntax error
---
[]
m[]
[JavaScript] NO, expected syntax error
---
Here is a related data language I've been working on:
https://www.oilshell.org/release/latest/doc/qsn.html
The problems solved and relation to shell are laid out in the doc. It's basically cleaned up C string literals (based on Rust) that are more byte-string and utf-8 centric than JSON, which you need for Unix.
QSN is implemented in Oil now. But I started using it more, and what annoyed me is that it's not backward compatible with JSON.
JSON is a "narrow waist" with a lot of inertia.
So I'm working on a second iteration ("YSTR"), which is simple and small, but solves many problems. You could say the tagline is "one (cross-language) string literal syntax to rule them all"
Summary:
\xff (for binary) and \u{012345} (utf-8, no surrogate pairs as in JSON)
\{} \[] \() for code
\m[]
y"foo\n"
\[] or \m[] directly)
The justification for having matchertext in YSTR is basically as a "raw string", as you mention. It can prevent the "leaning toothpick" problem for:
"\\s+"
Also I'd say as an analogy to s-expressions, it can represent recursive structure with concatenation. If you have to add levels of \\ then you're not just concatenating!
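To make the "levels of \\" point concrete, here is a rough Python sketch. It uses json.dumps as a stand-in for "embed this string in one more layer of C-like string literal", which is an assumption chosen for illustration; the helper backslashes_before_s is hypothetical.

```python
import json
import re

def backslashes_before_s(text: str) -> int:
    """Count the run of backslashes directly in front of 's+' in `text`."""
    match = re.search(r"(\\+)s\+", text)
    return len(match.group(1)) if match else 0

pattern = r"\s+"                       # the regex itself: one backslash
counts = [backslashes_before_s(pattern)]
quoted = pattern
for _ in range(3):
    quoted = json.dumps(quoted)        # wrap in one more layer of string quoting
    counts.append(backslashes_before_s(quoted))

print(counts)
```

The count doubles at every level of nesting, so embedding is a rewrite of the inner text rather than a plain concatenation of delimiters around it.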
In shell you would use 'single quoted' strings to avoid \, but they can't represent single quotes.
Shell has 7 or 8 types of string literal to get around that! I posted some comments on today's matchertext lobste.rs thread about what I was thinking (before I read the paper):
https://lobste.rs/s/9ttq0x/matchertext_escape_route_from_language
(I can also suggest some improvements in terminology / presentation if interested, since it appears many people misunderstood it -- I think it's a great idea, though as the paper mentions, there are problems to be ironed out)
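On the single-quote limitation mentioned above, a small Python sketch using the standard-library shlex module shows the usual workaround: the quoted word is stitched together from several adjacent string literals, because a single-quoted shell string has no way to contain a single quote.

```python
import shlex

text = "it's"
quoted = shlex.quote(text)   # POSIX-shell-safe form of `text`
print(quoted)

# Round trip: splitting the quoted form like a shell would gives the text back.
assert shlex.split(quoted) == [text]
```

The result is three pieces glued together: a single-quoted part, a double-quoted single quote, and another single-quoted part.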
Very interesting idea and great paper. I've been working on similar "data languages" as complements to https://www.oilshell.org/
I wrote a shell script that I think demonstrates a practical issue with Section 4.2: C-like Host Languages. That is, basically no languages give syntax errors for the proposed \[] or \m[] (blog post says \m[]). So adding matchertext in the proposed way would technically be a breaking change. Some languages might have an evolution process for minor changes, but I highly doubt a language like JavaScript or C could do this.
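As a hedged illustration of why this is a breaking change, the Python sketch below evaluates the two proposed escapes as ordinary string literals. The helper eval_literal is hypothetical, and the behavior shown is that of the CPython versions I know of, where an unknown escape keeps its backslash and produces at most a warning. Later versions may tighten this.

```python
import warnings

def eval_literal(source: str) -> str:
    """Evaluate a Python string literal, silencing invalid-escape warnings."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return eval(source)

for source in [r'"\[]"', r'"\m[]"']:
    value = eval_literal(source)
    print(source, "->", value, "length", len(value))
```

Existing programs can therefore already contain these sequences with a defined meaning, which is exactly what a new interpretation of \[] or \m[] would change.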
Summary of results:
https://github.com/oilshell/oil/blob/master/demo/matchertext.sh
I'll paste the output of the script in the next comment