GuntherRademacher / rr

RR - Railroad Diagram Generator
Apache License 2.0

No way for strings with both single/double quotes inside #23

Closed mingodad closed 1 year ago

mingodad commented 1 year ago

Looking at the grammar for strings, there is no escape sequence for strings that contain both single and double quotes, but the W3C grammar does have one:

StringLiteral ::= '"' [^"]* '"' | "'" [^']* "'" /* ws: explicit */
StringLiteral      ::=      ('"' (PredefinedEntityRef | CharRef | EscapeQuot | [^"&])* '"') | ("'" (PredefinedEntityRef | CharRef | EscapeApos | [^'&])* "'")   /* ws: explicit */
PredefinedEntityRef    ::=      "&" ("lt" | "gt" | "amp" | "quot" | "apos") ";" /* ws: explicit */
EscapeQuot     ::=      '""'
EscapeApos     ::=      "''"
mingodad commented 1 year ago

While trying to rebuild the parser, I noticed that it doesn't build because of return new StructuredQName("p", "de/bottlecaps/railroad/core/Parser", functionName());}:

--- rr0/src/main/java/de/bottlecaps/railroad/core/Parser.java
+++ rr/parser/Parser.java
@@ -1,4 +1,4 @@
-// This file was generated on Tue Jan 31, 2023 11:09 (UTC+01) by REx v5.56 which is Copyright (c) 1979-2023 by Gunther Rademacher <grd@gmx.net>
+// This file was generated on Tue Jan 31, 2023 10:48 (UTC+01) by REx v5.56 which is Copyright (c) 1979-2023 by Gunther Rademacher <grd@gmx.net>
 // REx command line: Parser.ebnf -java -tree -saxon10 -name de.bottlecaps.railroad.core.Parser

 package de.bottlecaps.railroad.core;
@@ -307,7 +307,7 @@
     abstract Sequence execute(XPathContext context, String input) throws XPathException;

     @Override
-    public StructuredQName getFunctionQName() {return new StructuredQName("p", "de/bottlecaps/railroad/core/Parser", functionName());}
+    public StructuredQName getFunctionQName() {return new StructuredQName("p", "Parser", functionName());}
     @Override
     public SequenceType[] getArgumentTypes() {return new SequenceType[] {SequenceType.SINGLE_STRING};}
     @Override
mingodad commented 1 year ago

After manually replacing "de/bottlecaps/railroad/core/Parser" with "Parser", it builds.

GuntherRademacher commented 1 year ago

You are referring to the XQuery grammar. Yes, XQuery provides escaping for single and double quotes. But that is XQuery, not grammars.

The relevant W3C definition for EBNF is this:

"string"       matches the sequence of characters that appear inside the double quotes.

'string'       matches the sequence of characters that appear inside the single quotes.

RR's grammar syntax here matches that definition. It is also aligned with the syntax of REx. There are no plans to extend it.

If you really need to combine single and double quotes as content in a single terminal, please define a lexical rule for it:

QuotedApostrophe
      ::= '"' "'" '"'
       /* ws:explicit */

Also see #6.
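
For generated grammars, the lexical-rule approach above can be automated by cutting the text into pieces and quoting each piece with whichever delimiter it does not contain. The following is a minimal Java sketch under that assumption. It is not part of RR, and the names EbnfLiteralSplitter, split and toRule are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class EbnfLiteralSplitter {

    /**
     * Splits text into quoted literals such that no literal contains
     * its own delimiter. Concatenating the literals' contents gives
     * back the original text.
     */
    static List<String> split(String text) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            // A piece starting with a double quote must be single-quoted;
            // any other piece is double-quoted.
            char delimiter = text.charAt(start) == '"' ? '\'' : '"';
            int end = start;
            // Extend the piece until the chosen delimiter shows up.
            while (end < text.length() && text.charAt(end) != delimiter) {
                end++;
            }
            parts.add(delimiter + text.substring(start, end) + delimiter);
            start = end;
        }
        if (parts.isEmpty()) {
            parts.add("\"\""); // empty text becomes one empty literal
        }
        return parts;
    }

    /** Builds a lexical rule in the style shown above. */
    static String toRule(String name, String text) {
        return name + " ::= " + String.join(" ", split(text)) + " /* ws: explicit */";
    }

    public static void main(String[] args) {
        // prints: Quote ::= "['" '"]' /* ws: explicit */
        System.out.println(toRule("Quote", "['\"]"));
    }
}
```

Because each piece switches delimiter only when it has to, text without mixed quotes still comes out as a single literal.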

mingodad commented 1 year ago

Thanks for the reply! I reported the issue with rebuilding the parser incorrectly: it does build, but when I try to run it I get:

java -jar rr.war test-dq.ebnf > test-dq.ebnf.xhtml
Static error on line 43 column 39 of basic-interface.xq:
  XPST0017  Cannot find a 1-argument function named Q{Parser}parse-Grammar()
Exception in thread "main" java.lang.reflect.InvocationTargetException
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at de.bottlecaps.fatjar.Loader.main(Loader.java:70)
Caused by: net.sf.saxon.s9api.SaxonApiException: Cannot find a 1-argument function named Q{Parser}parse-Grammar()
    at net.sf.saxon.s9api.XQueryCompiler.compile(XQueryCompiler.java:566)
    at de.bottlecaps.railroad.RailroadGenerator.generate(RailroadGenerator.java:120)
    at de.bottlecaps.railroad.Railroad.main(Railroad.java:239)
    ... 5 more
Caused by: net.sf.saxon.trans.XPathException: Cannot find a 1-argument function named Q{Parser}parse-Grammar()
    at net.sf.saxon.query.UnboundFunctionLibrary.bindUnboundFunctionReferences(UnboundFunctionLibrary.java:178)
    at net.sf.saxon.query.QueryModule.bindUnboundFunctionCalls(QueryModule.java:1178)
    at net.sf.saxon.expr.instruct.Executable.fixupQueryModules(Executable.java:437)
    at net.sf.saxon.query.XQueryParser.makeXQueryExpression(XQueryParser.java:177)
    at net.sf.saxon.query.StaticQueryContext.compileQuery(StaticQueryContext.java:568)
    at net.sf.saxon.s9api.XQueryCompiler.compile(XQueryCompiler.java:562)
    ... 7 more
mingodad commented 1 year ago

I'm trying to generate EBNF grammars from tree-sitter grammars and convert the patterns/regexps to strings in order to show them on railroad diagrams, and they can have single/double quotes inside them, like:

/"(""|[^"])*"/
/([^\s\\.\"\(\)\{\}@\'\\_]|\\[^\sa-zA-Z]|_[^\s;\.\"\(\)\{\}@])[^\s;\.\"\(\)\{\}@]*/
/(([^\s;\.\"\(\)\{\}@\'\\_]|\\[^\sa-zA-Z]|_[^\s;\.\"\(\)\{\}@])[^\s;\.\"\(\)\{\}@]*\.)*([^\s;\.\"\(\)\{\}@\'\\_]|\\[^\sa-zA-Z]|_[^\s;\.\"\(\)\{\}@])[^\s;\.\"\(\)\{\}@]*/
/[^#'"<>{}\[\]()`$|&;\\\s]/
/"([^"\\]|\\.)*"|'([^'\\]|\\.)*'/
/['"]/
/[^;\\'"]/
/[^()#"\\']/
/\\(u\{[0-9A-Fa-f]{4,6}\}|[nrt\"'\\])/
/\\(u\{[^}]*\}|[^nrt\"'\\])/
/([^?# \n\s\f()\[\]'`,\\";]|\\.)([^# \n\s\f()\[\]'`,\\";]|\\.)*/
...
mingodad commented 1 year ago

I'm not sure whether QuotedApostrophe ::= '"' "'" '"' /* ws:explicit */ is enough. Can you provide a working rr/src/main/java/de/bottlecaps/railroad/core/Parser.ebnf with the changes to allow it?

I've tried these changes; it builds and runs but doesn't show the expected output:

StringLiteral         ::= '"' ('""' | [^"#x9#xA#xD])* '"'
                        | "'" ("''" | [^'#x9#xA#xD])* "'"
GuntherRademacher commented 1 year ago

Thanks for the reply! I reported the issue with rebuilding the parser incorrectly: it does build, but when I try to run it I get: ...

Thanks for letting me know. Now fixed with 517c1f934faa6a2d6bb93d9354f3ce1104d42ab5

mingodad commented 1 year ago

I'm testing with this modified grammar, which only adds MixedStringLiteral ::= '"a quoted ''string''"' /* ws: explicit */:

/* extracted from https://www.bottlecaps.de/rr/ui on Tue Jan 31, 2023, 10:03 (UTC+01)
 */

Grammar  ::= Production*
Production
         ::= NCName '::=' ( Choice | Link )
NCName   ::= [http://www.w3.org/TR/xml-names/#NT-NCName]
Choice   ::= SequenceOrDifference ( '|' SequenceOrDifference )*
SequenceOrDifference
         ::= (Item ( '-' Item | Item* ))?
Item     ::= Primary ( '?' | '*' | '+' )*
Primary  ::= NCName | StringLiteral | CharCode | CharClass | '(' Choice ')'
StringLiteral
         ::= '"' [^"]* '"' | "'" [^']* "'"
          /* ws: explicit */
MixedStringLiteral
         ::= '"a quoted ''string''"'
          /* ws: explicit */
CharCode ::= '#x' [0-9a-fA-F]+
          /* ws: explicit */
CharClass
         ::= '[' '^'? ( Char | CharCode | CharRange | CharCodeRange )+ ']'
          /* ws: explicit */
Char     ::= [http://www.w3.org/TR/xml#NT-Char]
CharRange
         ::= Char '-' ( Char - ']' )
          /* ws: explicit */
CharCodeRange
         ::= CharCode '-' CharCode
          /* ws: explicit */
Link     ::= '[' URL ']'
URL      ::= [^#x5D:/?#]+ '://' [^#x5D#]+ ('#' NCName)?
          /* ws: explicit */
Whitespace
         ::= S | Comment
S        ::= #x9 | #xA | #xD | #x20
Comment  ::= '/*' ( [^*] | '*'+ [^*/] )* '*'* '*/'
          /* ws: explicit */

With the current rr it does show the MixedStringLiteral but with my modified parser grammar:

StringLiteral         ::= '"' ('""' | [^"#x9#xA#xD])* '"'
                        | "'" ("''" | [^'#x9#xA#xD])* "'"

The MixedStringLiteral nonterminal simply disappears from the output without any error message.

GuntherRademacher commented 1 year ago

I'm not sure whether QuotedApostrophe ::= '"' "'" '"' /* ws:explicit */ is enough.

It is definitely enough, as far as the syntax is concerned. But the processing logic needs to be adapted as well, i.e. you need to at least unescape these quotes when getting the literal's net content, and also take care of the escaping, when serializing it back to its quoted representation.
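
As an illustration of the two processing steps named here, below is a minimal Java sketch of unescaping doubled quotes when reading a literal and re-escaping them when writing it back. It assumes the doubled-quote convention from the modified StringLiteral rule in this thread. It is not RR code, and the names QuoteEscaping, unquote and quote are hypothetical.

```java
final class QuoteEscaping {

    /**
     * Returns the net content of a quoted literal: strips the outer
     * delimiters and collapses each doubled delimiter into one.
     * It does not check for a lone delimiter inside the literal; that
     * is assumed to have been rejected by the parser already.
     */
    static String unquote(String literal) {
        if (literal.length() < 2) {
            throw new IllegalArgumentException("not a quoted literal: " + literal);
        }
        char delimiter = literal.charAt(0);
        boolean isQuote = delimiter == '"' || delimiter == '\'';
        if (!isQuote || literal.charAt(literal.length() - 1) != delimiter) {
            throw new IllegalArgumentException("not a quoted literal: " + literal);
        }
        String single = String.valueOf(delimiter);
        String inner = literal.substring(1, literal.length() - 1);
        return inner.replace(single + single, single);
    }

    /**
     * Serializes content back to a quoted literal. Prefers a delimiter
     * that does not occur in the content, so escaping is only used
     * when both kinds of quote are present.
     */
    static String quote(String content) {
        char delimiter = '"';
        if (content.indexOf('"') >= 0 && content.indexOf('\'') < 0) {
            delimiter = '\'';
        }
        String single = String.valueOf(delimiter);
        return single + content.replace(single, single + single) + single;
    }

    public static void main(String[] args) {
        String literal = "'\"a quoted ''string''\"'";
        // prints: "a quoted 'string'"
        System.out.println(unquote(literal));
        // prints: "['""]"
        System.out.println(quote("['\"]"));
    }
}
```

The pair is meant to round-trip, so unquote(quote(s)) gives back s for any content, although quote may pick a different delimiter than the one used in the original source text.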

GuntherRademacher commented 1 year ago

The MixedStringLiteral nonterminal simply disappears from the output without any error message.

This is an effect of the Inline literals option.

mingodad commented 1 year ago

Thank you again for all your help! But it's strange that you do not recognize the need for a way to escape single/double quotes so that both of them can appear inside strings (this is normally common functionality in any programming language).