dhall-lang / dhall-lang

Maintainable configuration files
https://dhall-lang.org
BSD 3-Clause "New" or "Revised" License
4.22k stars 173 forks source link

Trouble with the ABNF #395

Closed singpolyma closed 5 years ago

singpolyma commented 5 years ago

I am trying to generate a working Dhall parser from the ABNF as the first step to a new implementation. After not having much success, I decided to try something simpler just to get an understanding of the grammar and use the abnf cabal package as a test. After cabal install --constraint="megaparsec ==5.3.1" abnf here is the driver code:

import Data.String
import Text.ABNF as ABNF
import Data.Text.IO as T

main = do
  abnf <- T.readFile "dhall-lang/standard/dhall.abnf"
  input <- T.getContents
  let Right rules = ABNF.parseABNF "STDIN" abnf
  let Just rule = ABNF.canonicalizeRules (fromString "complete-expression") rules
  case ABNF.parseDocument rule input of
    Left _ -> print "fail"
    Right _ -> print "succeed"

If you save this and try it on all the tests:

for I in dhall-lang/tests/parser/success/*.dhall; do echo "$I"; runhaskell abnf.hs < "$I"; done

It fails on several cases, which my own attempts to implement have also failed on. I understand that the grammar requires backtracking and "use first working alternative" ordering, but it is my understanding that attoparsec (which is what the abnf package generates, and I have also tried generating an attoparsec parser from the grammar myself) has these features.

What am I missing?

Gabriella439 commented 5 years ago

@singpolyma: Can you give an example case that it fails on?

jmitchell commented 5 years ago

@singpolyma, I don't have specific advice (yet) about the parse errors you're getting using the abnf Hackage package, but you may glean some insights from the tree-sitter grammar I'm working on (see #323). It still fails on just a few tests, though so watch out for those.

singpolyma commented 5 years ago

@Gabriel439 tests from master that fail are:

dhall-lang/tests/parser/success/annotationsA.dhall
dhall-lang/tests/parser/success/builtinsA.dhall
dhall-lang/tests/parser/success/collectionImportTypeA.dhall
dhall-lang/tests/parser/success/largeExpressionA.dhall
dhall-lang/tests/parser/success/listA.dhall
dhall-lang/tests/parser/success/operatorsA.dhall
dhall-lang/tests/parser/success/pathsA.dhall
dhall-lang/tests/parser/success/pathTerminationA.dhall
dhall-lang/tests/parser/success/quotedPathsA.dhall
dhall-lang/tests/parser/success/unicodePathsA.dhall
dhall-lang/tests/parser/success/urlsA.dhall
Gabriella439 commented 5 years ago

@singpolyma: What parse error do you get?

Judging from the set of examples you gave, my initial guess is that several of those failures are due to overlap in the parsers for List literals and old-style Optional literals. When you parse the initial [ that they both share the parser might commit to the wrong one.

Note that we plan to eliminate support for old-style Optional literals (possibly in the next release), if that helps simplify things.

singpolyma commented 5 years ago

It's attoparsec, so the errors aren't great:

dhall-lang/tests/parser/success/annotationsA.dhall
Fail "{ foo = ([] : List Natural) # [1, 2, 3] # ([1, 2, 3] : List Natural)\n, bar = [] : Optional Natural\n, baz = [1] : Optional Natural\n} : { foo : List Natural, bar : Optional Natural, baz : Optional Natural }\n" ["Rule","Sum"] "Failed reading: empty"

dhall-lang/tests/parser/success/asTextA.dhall
Fail "://example.com/foo as Text\n" [] "endOfInput"

dhall-lang/tests/parser/success/builtinsA.dhall
Fail "  \955 ( x\n    : { field0 : Bool\n      , field1 : Optional (Optional Bool)\n      , field2 : Natural\n      , field3 : Integer\n      , field4 : Double\n      , field5 : Text\n      , field6 : List (List Bool)\n      }\n    )\n\8594 { field00 = Natural/fold\n  , field01 = Natural/build\n  , field02 = Natural/isZero\n  , field03 = Natural/even\n  , field04 = Natural/odd\n  , field05 = Natural/toInteger\n  , field06 = Natural/show\n  , field07 = Integer/show\n  , field08 = Double/show\n  , field09 = List/build\n  , field10 = List/fold\n  , field11 = List/length\n  , field12 = List/head\n  , field13 = List/last\n  , field14 = List/indexed\n  , field15 = List/reverse\n  , field16 = Optional/fold\n  , field17 = Optional/build\n  , field18 = True\n  , field19 = False\n  }\n" ["Rule","Sum"] "Failed reading: empty"

dhall-lang/tests/parser/success/collectionImportTypeA.dhall
Fail "{ example0 = [] : Optional ./type.dhall\n, example1 = [] : List ./type.dhall\n}\n" ["Rule","Sum"] "Failed reading: empty"

dhall-lang/tests/parser/success/largeExpressionA.dhall
Fail "  \955 ( xs\n    : List\n      { cores             : Natural\n      , host              : Text\n      , key               : Text\n      , mandatoryFeatures : List Text\n      , platforms         :\n          List\n          < AArch64_Linux  : {}\n          | ARMv5tel_Linux : {}\n          | ARMv7l_Linux   : {}\n          | I686_Cygwin    : {}\n          | I686_Linux     : {}\n          | MIPS64el_Linux : {}\n          | PowerPC_Linux  : {}\n          | X86_64_Cygwin  : {}\n          | X86_64_Darwin  : {}\n          | X86_64_FreeBSD : {}\n          | X86_64_Linux   : {}\n          | X86_64_Solaris : {}\n          >\n      , speedFactor       : Natural\n      , supportedFeatures : List Text\n      , user              : Optional Text\n      }\n    )\n\8594 List/fold\n  { cores             : Natural\n  , host              : Text\n  , key               : Text\n  , mandatoryFeatures : List Text\n  , platforms         :\n      List\n      < AArch64_Linux  : {}\n      | ARMv5tel_Linux : {}\n      | ARMv7l_Linux   : {}\n      | I686_Cygwin    : {}\n      | I686_Linux     : {}\n      | MIPS64el_Linux : {}\n      | PowerPC_Linux  : {}\n      | X86_64_Cygwin  : {}\n      | X86_64_Darwin  : {}\n      | X86_64_FreeBSD : {}\n      | X86_64_Linux   : {}\n      | X86_64_Solaris : {}\n      >\n  , speedFactor       : Natural\n  , supportedFeatures : List Text\n  , user              : Optional Text\n  }\n  xs\n  Text\n  (   \955 ( x\n        : { cores             : Natural\n          , host              : Text\n          , key               : Text\n          , mandatoryFeatures : List Text\n          , platforms         :\n              List\n              < AArch64_Linux  : {}\n              | ARMv5tel_Linux : {}\n              | ARMv7l_Linux   : {}\n              | I686_Cygwin    : {}\n              | I686_Linux     : {}\n              | MIPS64el_Linux : {}\n              | PowerPC_Linux  : {}\n              | X86_64_Cygwin  : {}\n              | X86_64_Darwin  : {}\n              | X86_64_FreeBSD : {}\n              | X86_64_Linux   : {}\n              | X86_64_Solaris : {}\n              >\n          , speedFactor       : Natural\n          , supportedFeatures : List Text\n          , user              : Optional Text\n          }\n        )\n    \8594 \955(y : Text)\n    \8594     (     Optional/fold\n                Text\n                x.user\n                Text\n                (\955(user : Text) \8594 user ++ \"@\" ++ x.host ++ \"\")\n                x.host\n            ++  \" \"\n            ++  ( merge\n                  { Empty    = \955(_ : {}) \8594 \"\"\n                  , NonEmpty = \955(result : Text) \8594 result\n                  }\n                  ( List/fold\n                    < AArch64_Linux  : {}\n                    | ARMv5tel_Linux : {}\n                    | ARMv7l_Linux   : {}\n                    | I686_Cygwin    : {}\n                    | I686_Linux     : {}\n                    | MIPS64el_Linux : {}\n                    | PowerPC_Linux  : {}\n                    | X86_64_Cygwin  : {}\n                    | X86_64_Darwin  : {}\n                    | X86_64_FreeBSD : {}\n                    | X86_64_Linux   : {}\n                    | X86_64_Solaris : {}\n                    >\n                    x.platforms\n                    < Empty : {} | NonEmpty : Text >\n                    (   \955 ( element\n                          : < AArch64_Linux  : {}\n                            | ARMv5tel_Linux : {}\n                            | ARMv7l_Linux   : {}\n                            | I686_Cygwin    : {}\n                            | I686_Linux     : {}\n                            | MIPS64el_Linux : {}\n                            | PowerPC_Linux  : {}\n                            | X86_64_Cygwin  : {}\n                            | X86_64_Darwin  : {}\n                            | X86_64_FreeBSD : {}\n                            | X86_64_Linux   : {}\n                            | X86_64_Solaris : {}\n                            >\n                          )\n                      \8594 \955(status : < Empty : {} | NonEmpty : Text >)\n                      \8594 merge\n                        { Empty    =\n                              \955(_ : {})\n                            \8594 < NonEmpty =\n                                  merge\n                                  { AArch64_Linux  = \955(_ : {}) \8594 \"aarch64-linux\"\n                                  , ARMv5tel_Linux =\n                                      \955(_ : {}) \8594 \"armv5tel-linux\"\n                                  , ARMv7l_Linux   = \955(_ : {}) \8594 \"armv7l-linux\"\n                                  , I686_Cygwin    = \955(_ : {}) \8594 \"i686-cygwin\"\n                                  , I686_Linux     = \955(_ : {}) \8594 \"i686-linux\"\n                                  , MIPS64el_Linux =\n                                      \955(_ : {}) \8594 \"mips64el-linux\"\n                                  , PowerPC_Linux  = \955(_ : {}) \8594 \"powerpc-linux\"\n                                  , X86_64_Cygwin  = \955(_ : {}) \8594 \"x86_64-cygwin\"\n                                  , X86_64_Darwin  = \955(_ : {}) \8594 \"x86_64-darwin\"\n                                  , X86_64_FreeBSD =\n                                      \955(_ : {}) \8594 \"x86_64-freebsd\"\n                                  , X86_64_Linux   = \955(_ : {}) \8594 \"x86_64-linux\"\n                                  , X86_64_Solaris =\n                                      \955(_ : {}) \8594 \"x86_64-solaris\"\n                                  }\n                                  element\n                              | Empty    : {}\n                              >\n                        , NonEmpty =\n                              \955(result : Text)\n                            \8594 < NonEmpty =\n                                      ( merge\n                                        { AArch64_Linux  =\n                                            \955(_ : {}) \8594 \"aarch64-linux\"\n                                        , ARMv5tel_Linux =\n                                            \955(_ : {}) \8594 \"armv5tel-linux\"\n                                        , ARMv7l_Linux   =\n                                            \955(_ : {}) \8594 \"armv7l-linux\"\n                                        , I686_Cygwin    =\n                                            \955(_ : {}) \8594 \"i686-cygwin\"\n                                        , I686_Linux     =\n                                            \955(_ : {}) \8594 \"i686-linux\"\n                                        , MIPS64el_Linux =\n                                            \955(_ : {}) \8594 \"mips64el-linux\"\n                                        , PowerPC_Linux  =\n                                            \955(_ : {}) \8594 \"powerpc-linux\"\n                                        , X86_64_Cygwin  =\n                                            \955(_ : {}) \8594 \"x86_64-cygwin\"\n                                        , X86_64_Darwin  =\n                                            \955(_ : {}) \8594 \"x86_64-darwin\"\n                                        , X86_64_FreeBSD =\n                                            \955(_ : {}) \8594 \"x86_64-freebsd\"\n                                        , X86_64_Linux   =\n                                            \955(_ : {}) \8594 \"x86_64-linux\"\n                                        , X86_64_Solaris =\n                                            \955(_ : {}) \8594 \"x86_64-solaris\"\n                                        }\n                                        element\n                                      )\n                                  ++  \",\"\n                                  ++  result\n                              | Empty    : {}\n                              >\n                        }\n                        status\n                        : < Empty : {} | NonEmpty : Text >\n                    )\n                    < Empty = {=} | NonEmpty : Text >\n                  )\n                  : Text\n                )\n            ++  \" \"\n            ++  x.key\n            ++  \" \"\n            ++  Integer/show (Natural/toInteger x.cores)\n            ++  \" \"\n            ++  Integer/show (Natural/toInteger x.speedFactor)\n            ++  \" \"\n            ++  ( merge\n                  { Empty    = \955(_ : {}) \8594 \"\"\n                  , NonEmpty = \955(result : Text) \8594 result\n                  }\n                  ( List/fold\n                    Text\n                    x.supportedFeatures\n                    < Empty : {} | NonEmpty : Text >\n                    (   \955(element : Text)\n                      \8594 \955(status : < Empty : {} | NonEmpty : Text >)\n                      \8594 merge\n                        { Empty    =\n                            \955(_ : {}) \8594 < NonEmpty = element | Empty : {} >\n                        , NonEmpty =\n                              \955(result : Text)\n                            \8594 < NonEmpty = element ++ \",\" ++ result\n                              | Empty    : {}\n                              >\n                        }\n                        status\n                        : < Empty : {} | NonEmpty : Text >\n                    )\n                    < Empty = {=} | NonEmpty : Text >\n                  )\n                  : Text\n                )\n            ++  \" \"\n            ++  ( merge\n                  { Empty    = \955(_ : {}) \8594 \"\"\n                  , NonEmpty = \955(result : Text) \8594 result\n                  }\n                  ( List/fold\n                    Text\n                    x.mandatoryFeatures\n                    < Empty : {} | NonEmpty : Text >\n                    (   \955(element : Text)\n                      \8594 \955(status : < Empty : {} | NonEmpty : Text >)\n                      \8594 merge\n                        { Empty    =\n                            \955(_ : {}) \8594 < NonEmpty = element | Empty : {} >\n                        , NonEmpty =\n                              \955(result : Text)\n                            \8594 < NonEmpty = element ++ \",\" ++ result\n                              | Empty    : {}\n                              >\n                        }\n                        status\n                        : < Empty : {} | NonEmpty : Text >\n                    )\n                    < Empty = {=} | NonEmpty : Text >\n                  )\n                  : Text\n                )\n            ++  \"\\n\"\n          )\n      ++  y\n  )\n  \"\"\n" ["Rule","Sum"] "Failed reading: empty"

dhall-lang/tests/parser/success/listA.dhall
Fail "[ [+1, +2, +3]\n, [+1, +2, +3] : List Integer\n, [] : List Integer\n]\n" ["Rule","Sum"] "Failed reading: empty"

dhall-lang/tests/parser/success/operatorsA.dhall
Fail "  { foo = (False && Natural/even (1 + 2 * 3)) || True == False != True }\n\8743 { bar = [ \"ABC\" ++ \"DEF\" ] # [ \"GHI\" ] } \11005 { baz = True }\n" ["Rule","Sum"] "Failed reading: empty"

dhall-lang/tests/parser/success/parenthesizeUsingA.dhall
Fail "://raw.githubusercontent.com/dhall-lang/Prelude/c79c2bc3c46f129cc5b6d594ce298a381bcae92c/List/replicate using (./a.dhall sha256:16173e984d35ee3ffd8b6b79167df89480e67d1cd03ea5d0fc93689e4d928e61) sha256:b0e3ec1797b32c80c0bcb7e8254b08c7e9e35e75e6b410c7ac21477ab90167ad\n" [] "endOfInput"

dhall-lang/tests/parser/success/pathsA.dhall
Fail "[ /absolute/path\n, ./relative/path\n, ~/home/anchored/path\n, /ipfs/QmQ8w5PLcsNz56dMvRtq54vbuPe9cNnCCUXAQp6xLc6Ccx/Prelude\n]\n" ["Rule","Sum"] "Failed reading: empty"

dhall-lang/tests/parser/success/pathTerminationA.dhall
Fail "-- Verify that certain punctuation marks terminate paths correctly\n  \955(x : ./example)\n\8594 [./example, {bar = <baz = ./example>, qux = ./example}, ./example]\n" ["Rule","Sum"] "Failed reading: empty"

dhall-lang/tests/parser/success/quotedPathsA.dhall
Fail "{ example0 = /\"foo\"/bar/\"baz qux\"\n, example1 = https://example.com/foo/\"bar?baz\"?qux\n}\n" ["Rule","Sum"] "Failed reading: empty"

dhall-lang/tests/parser/success/unicodePathsA.dhall
Fail "./families/\"\31162.dhall\"\n" ["Rule","Sum"] "Failed reading: empty"

dhall-lang/tests/parser/success/urlsA.dhall
Fail "[ http://example.com/someFile.dhall\n, https://john:doe@example.com:8080/foo/bar?qux=0#xyzzy\n, http://prelude.dhall-lang.org/package.dhall\n, https://ipfs.io/ipfs/QmQ8w5PLcsNz56dMvRtq54vbuPe9cNnCCUXAQp6xLc6Ccx/Prelude\n, https://raw.githubusercontent.com/dhall-lang/dhall-haskell/18e4e9a18dc53271146df3ccf5b4177c3552236b/examples/True\n, https://127.0.0.1/index.dhall\n, https://[::]/index.dhall\n, https://[2001:0db8:85a3:0000:0000:8a2e:0370:7334]/tutorial.dhall\n\n  -- Yes, this is legal\n, https://-._~%2C!$&'()*+,;=:@-._~%2C!$&'()*+,;=:/foo?/-._~%2C!$&'()*+,;=:@/?#/-._~%2C!$&'()*+,;=:@/?\n]\n" ["Rule","Sum"] "Failed reading: empty"

dhall-lang/tests/parser/success/whitespaceBuffetA.dhall
Fail "2 -- mixed {- line endings -}\n" [] "endOfInput"

Modifying the grammar as I think you suggest:

-    / open-bracket (empty-collection / non-empty-optional)
+    / open-bracket empty-collection

     ; "x : t"
     / operator-expression (colon expression / "")

-empty-collection = close-bracket colon (List / Optional) import-expression
+empty-collection = close-bracket colon List import-expression

Includes all the same failures as the unmodified grammar, so I don't think that's related.

Gabriella439 commented 5 years ago

@singpolyma: One thing I noticed is that several of the parse errors you pasted have the same context, which is ["Rule","Sum"]. This suggests that the parser is trying to parse something that you've annotated with <?> "Sum" and is not backtracking correctly when that parse fails.

Do you have access to the intermediate Haskell code generated from the ABNF to see what parser that context corresponds to?

Also, I wouldn't trust attoparsec's claims of backtracking perfectly. I've reported one issue where attoparsec did not correctly backtrack here, for example:

https://github.com/bos/attoparsec/issues/76

More generally, I wouldn't recommend using attoparsec when megaparsec is an option. My experience is that megaparsec is a strict improvement over attoparsec in every way (same high-performance primitives, better backtracking, and better error messages). However, it sounds like you don't have a choice if you use the abnf package to generate the parser.

singpolyma commented 5 years ago

Since I'm just trying to debug, I've now switched to megaparsec. Very similar failures so far, which indicates that I'm definitely missing some (probably backtracking-related) case. My current version wraps every sequence in the parser in try -- I've attempted adding try in a few other places, but it never affects the test cases passing or not any other place I've tried.

I've pasted my generator (in ruby, sorry) and the generated megaparsec haskell code here: https://paste.sr.ht/%7Esingpolyma/6cf7a00f5b5bc240157ba3172cfe8c879fdfb45d

And here are the full test failures:

dhall-lang/tests/parser/success/annotationsA.dhall
Left (ParseErrorBundle {bundleErrors = TrivialError 40 (Just (Tokens ('#' :| ""))) (fromList [Tokens ('\t' :| ""),Tokens ('\n' :| ""),Tokens ('\r' :| ""),Tokens (' ' :| ""),Tokens ('!' :| ""),Tokens ('&' :| ""),Tokens ('*' :| ""),Tokens ('+' :| ""),Tokens (',' :| ""),Tokens ('-' :| ""),Tokens ('.' :| ""),Tokens ('/' :| ""),Tokens (':' :| ""),Tokens ('=' :| ""),Tokens ('?' :| ""),Tokens ('S' :| ""),Tokens ('{' :| ""),Tokens ('|' :| ""),Tokens ('}' :| ""),Tokens ('\8743' :| ""),Tokens ('\10835' :| ""),Tokens ('\11005' :| "")]) :| [], bundlePosState = PosState {pstateInput = "{ foo = ([] : List Natural) # [1, 2, 3] # ([1, 2, 3] : List Natural)\n, bar = [] : Optional Natural\n, baz = [1] : Optional Natural\n} : { foo : List Natural, bar : Optional Natural, baz : Optional Natural }\n", pstateOffset = 0, pstateSourcePos = SourcePos {sourceName = "STDIN", sourceLine = Pos 1, sourceColumn = Pos 1}, pstateTabWidth = Pos 8, pstateLinePrefix = ""}})
--
dhall-lang/tests/parser/success/asTextA.dhall
Left (ParseErrorBundle {bundleErrors = TrivialError 5 (Just (Tokens (':' :| ""))) (fromList [Tokens ('\t' :| ""),Tokens ('\n' :| ""),Tokens ('\r' :| ""),Tokens (' ' :| ""),Tokens ('!' :| ""),Tokens ('#' :| ""),Tokens ('&' :| ""),Tokens ('*' :| ""),Tokens ('+' :| ""),Tokens ('-' :| ""),Tokens ('.' :| ""),Tokens ('/' :| ""),Tokens ('=' :| ""),Tokens ('?' :| ""),Tokens ('@' :| ""),Tokens ('_' :| ""),Tokens ('{' :| ""),Tokens ('|' :| ""),Tokens ('\8743' :| ""),Tokens ('\10835' :| ""),Tokens ('\11005' :| ""),EndOfInput]) :| [], bundlePosState = PosState {pstateInput = "https://example.com/foo as Text\n", pstateOffset = 0, pstateSourcePos = SourcePos {sourceName = "STDIN", sourceLine = Pos 1, sourceColumn = Pos 1}, pstateTabWidth = Pos 8, pstateLinePrefix = ""}})
--
dhall-lang/tests/parser/success/builtinsA.dhall
Left (ParseErrorBundle {bundleErrors = TrivialError 56 (Just (Tokens ('(' :| ""))) (fromList [Tokens ('\t' :| ""),Tokens ('\n' :| ""),Tokens ('\r' :| ""),Tokens (' ' :| ""),Tokens ('!' :| ""),Tokens ('#' :| ""),Tokens ('&' :| ""),Tokens ('*' :| ""),Tokens ('+' :| ""),Tokens (',' :| ""),Tokens ('-' :| ""),Tokens ('.' :| ""),Tokens ('/' :| ""),Tokens (':' :| ""),Tokens ('=' :| ""),Tokens ('?' :| ""),Tokens ('S' :| ""),Tokens ('{' :| ""),Tokens ('|' :| ""),Tokens ('}' :| ""),Tokens ('\8743' :| ""),Tokens ('\10835' :| ""),Tokens ('\11005' :| "")]) :| [], bundlePosState = PosState {pstateInput = "  \955 ( x\n    : { field0 : Bool\n      , field1 : Optional (Optional Bool)\n      , field2 : Natural\n      , field3 : Integer\n      , field4 : Double\n      , field5 : Text\n      , field6 : List (List Bool)\n      }\n    )\n\8594 { field00 = Natural/fold\n  , field01 = Natural/build\n  , field02 = Natural/isZero\n  , field03 = Natural/even\n  , field04 = Natural/odd\n  , field05 = Natural/toInteger\n  , field06 = Natural/show\n  , field07 = Integer/show\n  , field08 = Double/show\n  , field09 = List/build\n  , field10 = List/fold\n  , field11 = List/length\n  , field12 = List/head\n  , field13 = List/last\n  , field14 = List/indexed\n  , field15 = List/reverse\n  , field16 = Optional/fold\n  , field17 = Optional/build\n  , field18 = True\n  , field19 = False\n  }\n", pstateOffset = 0, pstateSourcePos = SourcePos {sourceName = "STDIN", sourceLine = Pos 1, sourceColumn = Pos 1}, pstateTabWidth = Pos 8, pstateLinePrefix = ""}})
--
dhall-lang/tests/parser/success/collectionImportTypeA.dhall
Left (ParseErrorBundle {bundleErrors = TrivialError 2 (Just (Tokens ('e' :| ""))) (fromList [Tokens ('\t' :| ""),Tokens ('\n' :| ""),Tokens ('\r' :| ""),Tokens (' ' :| ""),Tokens ('-' :| ""),Tokens ('S' :| ""),Tokens ('{' :| ""),Tokens ('}' :| "")]) :| [], bundlePosState = PosState {pstateInput = "{ example0 = [] : Optional ./type.dhall\n, example1 = [] : List ./type.dhall\n}\n", pstateOffset = 0, pstateSourcePos = SourcePos {sourceName = "STDIN", sourceLine = Pos 1, sourceColumn = Pos 1}, pstateTabWidth = Pos 8, pstateLinePrefix = ""}})
--
dhall-lang/tests/parser/success/largeExpressionA.dhall
Left (ParseErrorBundle {bundleErrors = TrivialError 26 (Just (Tokens ('{' :| ""))) (fromList [Tokens ('\t' :| ""),Tokens ('\n' :| ""),Tokens ('\r' :| ""),Tokens (' ' :| ""),Tokens ('!' :| ""),Tokens ('#' :| ""),Tokens ('&' :| ""),Tokens (')' :| ""),Tokens ('*' :| ""),Tokens ('+' :| ""),Tokens ('-' :| ""),Tokens ('.' :| ""),Tokens ('/' :| ""),Tokens (':' :| ""),Tokens ('=' :| ""),Tokens ('?' :| ""),Tokens ('{' :| ""),Tokens ('|' :| ""),Tokens ('\8743' :| ""),Tokens ('\10835' :| ""),Tokens ('\11005' :| "")]) :| [], bundlePosState = PosState {pstateInput = "  \955 ( xs\n    : List\n      { cores             : Natural\n      , host              : Text\n      , key               : Text\n      , mandatoryFeatures : List Text\n      , platforms         :\n          List\n          < AArch64_Linux  : {}\n          | ARMv5tel_Linux : {}\n          | ARMv7l_Linux   : {}\n          | I686_Cygwin    : {}\n          | I686_Linux     : {}\n          | MIPS64el_Linux : {}\n          | PowerPC_Linux  : {}\n          | X86_64_Cygwin  : {}\n          | X86_64_Darwin  : {}\n          | X86_64_FreeBSD : {}\n          | X86_64_Linux   : {}\n          | X86_64_Solaris : {}\n          >\n      , speedFactor       : Natural\n      , supportedFeatures : List Text\n      , user              : Optional Text\n      }\n    )\n\8594 List/fold\n  { cores             : Natural\n  , host              : Text\n  , key               : Text\n  , mandatoryFeatures : List Text\n  , platforms         :\n      List\n      < AArch64_Linux  : {}\n      | ARMv5tel_Linux : {}\n      | ARMv7l_Linux   : {}\n      | I686_Cygwin    : {}\n      | I686_Linux     : {}\n      | MIPS64el_Linux : {}\n      | PowerPC_Linux  : {}\n      | X86_64_Cygwin  : {}\n      | X86_64_Darwin  : {}\n      | X86_64_FreeBSD : {}\n      | X86_64_Linux   : {}\n      | X86_64_Solaris : {}\n      >\n  , speedFactor       : Natural\n  , supportedFeatures : List Text\n  , user              : Optional Text\n  }\n  xs\n  Text\n  (   \955 ( x\n        : { cores             : Natural\n          , host              : Text\n          , key               : Text\n          , mandatoryFeatures : List Text\n          , platforms         :\n              List\n              < AArch64_Linux  : {}\n              | ARMv5tel_Linux : {}\n              | ARMv7l_Linux   : {}\n              | I686_Cygwin    : {}\n              | I686_Linux     : {}\n              | MIPS64el_Linux : {}\n              | PowerPC_Linux  : {}\n              | X86_64_Cygwin  : {}\n              | X86_64_Darwin  : {}\n              | X86_64_FreeBSD : {}\n              | X86_64_Linux   : {}\n              | X86_64_Solaris : {}\n              >\n          , speedFactor       : Natural\n          , supportedFeatures : List Text\n          , user              : Optional Text\n          }\n        )\n    \8594 \955(y : Text)\n    \8594     (     Optional/fold\n                Text\n                x.user\n                Text\n                (\955(user : Text) \8594 user ++ \"@\" ++ x.host ++ \"\")\n                x.host\n            ++  \" \"\n            ++  ( merge\n                  { Empty    = \955(_ : {}) \8594 \"\"\n                  , NonEmpty = \955(result : Text) \8594 result\n                  }\n                  ( List/fold\n                    < AArch64_Linux  : {}\n                    | ARMv5tel_Linux : {}\n                    | ARMv7l_Linux   : {}\n                    | I686_Cygwin    : {}\n                    | I686_Linux     : {}\n                    | MIPS64el_Linux : {}\n                    | PowerPC_Linux  : {}\n                    | X86_64_Cygwin  : {}\n                    | X86_64_Darwin  : {}\n                    | X86_64_FreeBSD : {}\n                    | X86_64_Linux   : {}\n                    | X86_64_Solaris : {}\n                    >\n                    x.platforms\n                    < Empty : {} | NonEmpty : Text >\n                    (   \955 ( element\n                          : < AArch64_Linux  : {}\n                            | ARMv5tel_Linux : {}\n                            | ARMv7l_Linux   : {}\n                            | I686_Cygwin    : {}\n                            | I686_Linux     : {}\n                            | MIPS64el_Linux : {}\n                            | PowerPC_Linux  : {}\n                            | X86_64_Cygwin  : {}\n                            | X86_64_Darwin  : {}\n                            | X86_64_FreeBSD : {}\n                            | X86_64_Linux   : {}\n                            | X86_64_Solaris : {}\n                            >\n                          )\n                      \8594 \955(status : < Empty : {} | NonEmpty : Text >)\n                      \8594 merge\n                        { Empty    =\n                              \955(_ : {})\n                            \8594 < NonEmpty =\n                                  merge\n                                  { AArch64_Linux  = \955(_ : {}) \8594 \"aarch64-linux\"\n                                  , ARMv5tel_Linux =\n                                      \955(_ : {}) \8594 \"armv5tel-linux\"\n                                  , ARMv7l_Linux   = \955(_ : {}) \8594 \"armv7l-linux\"\n                                  , I686_Cygwin    = \955(_ : {}) \8594 \"i686-cygwin\"\n                                  , I686_Linux     = \955(_ : {}) \8594 \"i686-linux\"\n                                  , MIPS64el_Linux =\n                                      \955(_ : {}) \8594 \"mips64el-linux\"\n                                  , PowerPC_Linux  = \955(_ : {}) \8594 \"powerpc-linux\"\n                                  , X86_64_Cygwin  = \955(_ : {}) \8594 \"x86_64-cygwin\"\n                                  , X86_64_Darwin  = \955(_ : {}) \8594 \"x86_64-darwin\"\n                                  , X86_64_FreeBSD =\n                                      \955(_ : {}) \8594 \"x86_64-freebsd\"\n                                  , X86_64_Linux   = \955(_ : {}) \8594 \"x86_64-linux\"\n                                  , X86_64_Solaris =\n                                      \955(_ : {}) \8594 \"x86_64-solaris\"\n                                  }\n                                  element\n                              | Empty    : {}\n                              >\n                        , NonEmpty =\n                              \955(result : Text)\n                            \8594 < NonEmpty =\n                                      ( merge\n                                        { AArch64_Linux  =\n                                            \955(_ : {}) \8594 \"aarch64-linux\"\n                                        , ARMv5tel_Linux =\n                                            \955(_ : {}) \8594 \"armv5tel-linux\"\n                                        , ARMv7l_Linux   =\n                                            \955(_ : {}) \8594 \"armv7l-linux\"\n                                        , I686_Cygwin    =\n                                            \955(_ : {}) \8594 \"i686-cygwin\"\n                                        , I686_Linux     =\n                                            \955(_ : {}) \8594 \"i686-linux\"\n                                        , MIPS64el_Linux =\n                                            \955(_ : {}) \8594 \"mips64el-linux\"\n                                        , PowerPC_Linux  =\n                                            \955(_ : {}) \8594 \"powerpc-linux\"\n                                        , X86_64_Cygwin  =\n                                            \955(_ : {}) \8594 \"x86_64-cygwin\"\n                                        , X86_64_Darwin  =\n                                            \955(_ : {}) \8594 \"x86_64-darwin\"\n                                        , X86_64_FreeBSD =\n                                            \955(_ : {}) \8594 \"x86_64-freebsd\"\n                                        , X86_64_Linux   =\n                                            \955(_ : {}) \8594 \"x86_64-linux\"\n                                        , X86_64_Solaris =\n                                            \955(_ : {}) \8594 \"x86_64-solaris\"\n                                        }\n                                        element\n                                      )\n                                  ++  \",\"\n                                  ++  result\n                              | Empty    : {}\n                              >\n                        }\n                        status\n                        : < Empty : {} | NonEmpty : Text >\n                    )\n                    < Empty = {=} | NonEmpty : Text >\n                  )\n                  : Text\n                )\n            ++  \" \"\n            ++  x.key\n            ++  \" \"\n            ++  Integer/show (Natural/toInteger x.cores)\n            ++  \" \"\n            ++  Integer/show (Natural/toInteger x.speedFactor)\n            ++  \" \"\n            ++  ( merge\n                  { Empty    = \955(_ : {}) \8594 \"\"\n                  , NonEmpty = \955(result : Text) \8594 result\n                  }\n                  ( List/fold\n                    Text\n                    x.supportedFeatures\n                    < Empty : {} | NonEmpty : Text >\n                    (   \955(element : Text)\n                      \8594 \955(status : < Empty : {} | NonEmpty : Text >)\n                      \8594 merge\n                        { Empty    =\n                            \955(_ : {}) \8594 < NonEmpty = element | Empty : {} >\n                        , NonEmpty =\n                              \955(result : Text)\n                            \8594 < NonEmpty = element ++ \",\" ++ result\n                              | Empty    : {}\n                              >\n                        }\n                        status\n                        : < Empty : {} | NonEmpty : Text >\n                    )\n                    < Empty = {=} | NonEmpty : Text >\n                  )\n                  : Text\n                )\n            ++  \" \"\n            ++  ( merge\n                  { Empty    = \955(_ : {}) \8594 \"\"\n                  , NonEmpty = \955(result : Text) \8594 result\n                  }\n                  ( List/fold\n                    Text\n                    x.mandatoryFeatures\n                    < Empty : {} | NonEmpty : Text >\n                    (   \955(element : Text)\n                      \8594 \955(status : < Empty : {} | NonEmpty : Text >)\n                      \8594 merge\n                        { Empty    =\n                            \955(_ : {}) \8594 < NonEmpty = element | Empty : {} >\n                        , NonEmpty =\n                              \955(result : Text)\n                            \8594 < NonEmpty = element ++ \",\" ++ result\n                              | Empty    : {}\n                              >\n                        }\n                        status\n                        : < Empty : {} | NonEmpty : Text >\n                    )\n                    < Empty = {=} | NonEmpty : Text >\n                  )\n                  : Text\n                )\n            ++  \"\\n\"\n          )\n      ++  y\n  )\n  \"\"\n", pstateOffset = 0, pstateSourcePos = SourcePos {sourceName = "STDIN", sourceLine = Pos 1, sourceColumn = Pos 1}, pstateTabWidth = Pos 8, pstateLinePrefix = ""}})
--
dhall-lang/tests/parser/success/listA.dhall
Left (ParseErrorBundle {bundleErrors = TrivialError 37 (Just (Tokens ('I' :| ""))) (fromList [Tokens ('\t' :| ""),Tokens ('\n' :| ""),Tokens ('\r' :| ""),Tokens (' ' :| ""),Tokens ('!' :| ""),Tokens ('#' :| ""),Tokens ('&' :| ""),Tokens ('*' :| ""),Tokens ('+' :| ""),Tokens (',' :| ""),Tokens ('-' :| ""),Tokens ('.' :| ""),Tokens ('/' :| ""),Tokens (':' :| ""),Tokens ('=' :| ""),Tokens ('?' :| ""),Tokens ('S' :| ""),Tokens (']' :| ""),Tokens ('{' :| ""),Tokens ('|' :| ""),Tokens ('\8743' :| ""),Tokens ('\10835' :| ""),Tokens ('\11005' :| "")]) :| [], bundlePosState = PosState {pstateInput = "[ [+1, +2, +3]\n, [+1, +2, +3] : List Integer\n, [] : List Integer\n]\n", pstateOffset = 0, pstateSourcePos = SourcePos {sourceName = "STDIN", sourceLine = Pos 1, sourceColumn = Pos 1}, pstateTabWidth = Pos 8, pstateLinePrefix = ""}})
--
dhall-lang/tests/parser/success/operatorsA.dhall
Left (ParseErrorBundle {bundleErrors = TrivialError 4 (Just (Tokens ('f' :| ""))) (fromList [Tokens ('\t' :| ""),Tokens ('\n' :| ""),Tokens ('\r' :| ""),Tokens (' ' :| ""),Tokens ('-' :| ""),Tokens ('S' :| ""),Tokens ('{' :| ""),Tokens ('}' :| "")]) :| [], bundlePosState = PosState {pstateInput = "  { foo = (False && Natural/even (1 + 2 * 3)) || True == False != True }\n\8743 { bar = [ \"ABC\" ++ \"DEF\" ] # [ \"GHI\" ] } \11005 { baz = True }\n", pstateOffset = 0, pstateSourcePos = SourcePos {sourceName = "STDIN", sourceLine = Pos 1, sourceColumn = Pos 1}, pstateTabWidth = Pos 8, pstateLinePrefix = ""}})
--
dhall-lang/tests/parser/success/parenthesizeUsingA.dhall
Left (ParseErrorBundle {bundleErrors = TrivialError 5 (Just (Tokens (':' :| ""))) (fromList [Tokens ('\t' :| ""),Tokens ('\n' :| ""),Tokens ('\r' :| ""),Tokens (' ' :| ""),Tokens ('!' :| ""),Tokens ('#' :| ""),Tokens ('&' :| ""),Tokens ('*' :| ""),Tokens ('+' :| ""),Tokens ('-' :| ""),Tokens ('.' :| ""),Tokens ('/' :| ""),Tokens ('=' :| ""),Tokens ('?' :| ""),Tokens ('@' :| ""),Tokens ('_' :| ""),Tokens ('{' :| ""),Tokens ('|' :| ""),Tokens ('\8743' :| ""),Tokens ('\10835' :| ""),Tokens ('\11005' :| ""),EndOfInput]) :| [], bundlePosState = PosState {pstateInput = "https://raw.githubusercontent.com/dhall-lang/Prelude/c79c2bc3c46f129cc5b6d594ce298a381bcae92c/List/replicate using (./a.dhall sha256:16173e984d35ee3ffd8b6b79167df89480e67d1cd03ea5d0fc93689e4d928e61) sha256:b0e3ec1797b32c80c0bcb7e8254b08c7e9e35e75e6b410c7ac21477ab90167ad\n", pstateOffset = 0, pstateSourcePos = SourcePos {sourceName = "STDIN", sourceLine = Pos 1, sourceColumn = Pos 1}, pstateTabWidth = Pos 8, pstateLinePrefix = ""}})
--
dhall-lang/tests/parser/success/pathsA.dhall
Left (ParseErrorBundle {bundleErrors = TrivialError 16 (Just (Tokens ('\n' :| ""))) (fromList [Tokens ('\t' :| ""),Tokens ('\n' :| ""),Tokens ('\r' :| ""),Tokens (' ' :| ""),Tokens ('!' :| ""),Tokens ('-' :| ""),Tokens ('/' :| ""),Tokens ('=' :| ""),Tokens ('S' :| ""),Tokens ('{' :| ""),Tokens ('|' :| ""),Tokens ('~' :| "")]) :| [], bundlePosState = PosState {pstateInput = "[ /absolute/path\n, ./relative/path\n, ~/home/anchored/path\n, /ipfs/QmQ8w5PLcsNz56dMvRtq54vbuPe9cNnCCUXAQp6xLc6Ccx/Prelude\n]\n", pstateOffset = 0, pstateSourcePos = SourcePos {sourceName = "STDIN", sourceLine = Pos 1, sourceColumn = Pos 1}, pstateTabWidth = Pos 8, pstateLinePrefix = ""}})
--
dhall-lang/tests/parser/success/pathTerminationA.dhall
Left (ParseErrorBundle {bundleErrors = TrivialError 84 (Just (Tokens (')' :| ""))) (fromList [Tokens ('\t' :| ""),Tokens ('\n' :| ""),Tokens ('\r' :| ""),Tokens (' ' :| ""),Tokens ('!' :| ""),Tokens ('-' :| ""),Tokens ('/' :| ""),Tokens ('=' :| ""),Tokens ('S' :| ""),Tokens ('{' :| ""),Tokens ('|' :| ""),Tokens ('~' :| "")]) :| [], bundlePosState = PosState {pstateInput = "-- Verify that certain punctuation marks terminate paths correctly\n  \955(x : ./example)\n\8594 [./example, {bar = <baz = ./example>, qux = ./example}, ./example]\n", pstateOffset = 0, pstateSourcePos = SourcePos {sourceName = "STDIN", sourceLine = Pos 1, sourceColumn = Pos 1}, pstateTabWidth = Pos 8, pstateLinePrefix = ""}})
--
dhall-lang/tests/parser/success/quotedPathsA.dhall
Left (ParseErrorBundle {bundleErrors = TrivialError 2 (Just (Tokens ('e' :| ""))) (fromList [Tokens ('\t' :| ""),Tokens ('\n' :| ""),Tokens ('\r' :| ""),Tokens (' ' :| ""),Tokens ('-' :| ""),Tokens ('S' :| ""),Tokens ('{' :| ""),Tokens ('}' :| "")]) :| [], bundlePosState = PosState {pstateInput = "{ example0 = /\"foo\"/bar/\"baz qux\"\n, example1 = https://example.com/foo/\"bar?baz\"?qux\n}\n", pstateOffset = 0, pstateSourcePos = SourcePos {sourceName = "STDIN", sourceLine = Pos 1, sourceColumn = Pos 1}, pstateTabWidth = Pos 8, pstateLinePrefix = ""}})
--
dhall-lang/tests/parser/success/unicodePathsA.dhall
Left (ParseErrorBundle {bundleErrors = TrivialError 20 (Just (Tokens ('\n' :| ""))) (fromList [Tokens ('\t' :| ""),Tokens ('\n' :| ""),Tokens ('\r' :| ""),Tokens (' ' :| ""),Tokens ('-' :| ""),Tokens ('/' :| ""),Tokens ('S' :| ""),Tokens ('{' :| "")]) :| [], bundlePosState = PosState {pstateInput = "./families/\"\31162.dhall\"\n", pstateOffset = 0, pstateSourcePos = SourcePos {sourceName = "STDIN", sourceLine = Pos 1, sourceColumn = Pos 1}, pstateTabWidth = Pos 8, pstateLinePrefix = ""}})
--
dhall-lang/tests/parser/success/urlsA.dhall
Left (ParseErrorBundle {bundleErrors = TrivialError 6 (Just (Tokens (':' :| ""))) (fromList [Tokens ('\t' :| ""),Tokens ('\n' :| ""),Tokens ('\r' :| ""),Tokens (' ' :| ""),Tokens ('!' :| ""),Tokens ('#' :| ""),Tokens ('&' :| ""),Tokens ('*' :| ""),Tokens ('+' :| ""),Tokens (',' :| ""),Tokens ('-' :| ""),Tokens ('.' :| ""),Tokens ('/' :| ""),Tokens ('=' :| ""),Tokens ('?' :| ""),Tokens ('@' :| ""),Tokens ('S' :| ""),Tokens (']' :| ""),Tokens ('_' :| ""),Tokens ('{' :| ""),Tokens ('|' :| ""),Tokens ('\8743' :| ""),Tokens ('\10835' :| ""),Tokens ('\11005' :| "")]) :| [], bundlePosState = PosState {pstateInput = "[ http://example.com/someFile.dhall\n, https://john:doe@example.com:8080/foo/bar?qux=0#xyzzy\n, http://prelude.dhall-lang.org/package.dhall\n, https://ipfs.io/ipfs/QmQ8w5PLcsNz56dMvRtq54vbuPe9cNnCCUXAQp6xLc6Ccx/Prelude\n, https://raw.githubusercontent.com/dhall-lang/dhall-haskell/18e4e9a18dc53271146df3ccf5b4177c3552236b/examples/True\n, https://127.0.0.1/index.dhall\n, https://[::]/index.dhall\n, https://[2001:0db8:85a3:0000:0000:8a2e:0370:7334]/tutorial.dhall\n\n  -- Yes, this is legal\n, https://-._~%2C!$&'()*+,;=:@-._~%2C!$&'()*+,;=:/foo?/-._~%2C!$&'()*+,;=:@/?#/-._~%2C!$&'()*+,;=:@/?\n]\n", pstateOffset = 0, pstateSourcePos = SourcePos {sourceName = "STDIN", sourceLine = Pos 1, sourceColumn = Pos 1}, pstateTabWidth = Pos 8, pstateLinePrefix = ""}})
--
dhall-lang/tests/parser/success/whitespaceBuffetA.dhall
Left (ParseErrorBundle {bundleErrors = TrivialError 62 (Just (Tokens ('2' :| ""))) (fromList [Tokens ('\t' :| ""),Tokens ('\n' :| ""),Tokens ('\r' :| ""),Tokens (' ' :| ""),Tokens ('!' :| ""),Tokens ('#' :| ""),Tokens ('&' :| ""),Tokens ('*' :| ""),Tokens ('+' :| ""),Tokens ('-' :| ""),Tokens ('.' :| ""),Tokens ('/' :| ""),Tokens (':' :| ""),Tokens ('=' :| ""),Tokens ('?' :| ""),Tokens ('{' :| ""),Tokens ('|' :| ""),Tokens ('\8743' :| ""),Tokens ('\10835' :| ""),Tokens ('\11005' :| ""),EndOfInput]) :| [], bundlePosState = PosState {pstateInput = "    -- This\n\r\n     Natural/even {-\n\r\n{- file\n-}\r\n  has\n\r\n -}  2 -- mixed {- line endings -}\n", pstateOffset = 0, pstateSourcePos = SourcePos {sourceName = "STDIN", sourceLine = Pos 1, sourceColumn = Pos 1}, pstateTabWidth = Pos 8, pstateLinePrefix = ""}})
Gabriella439 commented 5 years ago

@singpolyma: I think you will need to take one of those parse failures and minimize it to the smallest example code that still fails to parse. If you do that we can probably more easily narrow down the root problem.

singpolyma commented 5 years ago

Ok... at least one of the minimal examples is List Natural that code alone fails to parse but is accepted by dhall-haskell. Looking into the parser, it seems that:

_list = try (_list_raw >> _whitespace) >> pure ()

and:

_application_expression =
  try
    ((count' 0 1 _some) >>
     _import_expression >>
     (many
         (try
             (_whitespace_chunk >> _import_expression)))) >>
  pure ()

This seemed a likely culprit to me, since the whitespace at the end of _list if too greedy could consume the space that _application_expression requires. However (List) (Natural) also fails to parse, so that is perhaps a wrong guess.

Parse error:

Left (ParseErrorBundle {bundleErrors = TrivialError 5 (Just (Tokens ('N' :| ""))) (fromList [Tokens ('\t' :| ""),Tokens ('\n' :| ""),Tokens ('\r' :| ""),Tokens (' ' :| ""),Tokens ('!' :| ""),Tokens ('#' :| ""),Tokens ('&' :| ""),Tokens ('*' :| ""),Tokens ('+' :| ""),Tokens ('-' :| ""),Tokens ('.' :| ""),Tokens ('/' :| ""),Tokens (':' :| ""),Tokens ('=' :| ""),Tokens ('?' :| ""),Tokens ('{' :| ""),Tokens ('|' :| ""),Tokens ('\8743' :| ""),Tokens ('\10835' :| ""),Tokens ('\11005' :| ""),EndOfInput]) :| [], bundlePosState = PosState {pstateInput = "List Natural\n", pstateOffset = 0, pstateSourcePos = SourcePos {sourceName = "STDIN", sourceLine = Pos 1, sourceColumn = Pos 1}, pstateTabWidth = Pos 8, pstateLinePrefix = ""}})

Which AFAICT just says "parse failed at the first character, which was N"

philandstuff commented 5 years ago

I suspect that your hypothesis is right. In List Natural, the List expression consumes all the whitespace. In (List) (Natural), the (List) expression consumes all the whitespace. The ABNF grammar is designed such that any import-expression will consume all trailing whitespace, which in some parsers (including Go’s pigeon, a PEG parser) will cause a failure because it doesn’t backtrack on the trailing whitespace.

For what it’s worth, I had exactly this problem with my hand-built grammar based on the ABNF. To “fix” it I’ve been managing whitespace differently - in philandstuff/dhall-golang, expressions don’t consume trailing whitespace, and instead rules which consume expressions need to explicitly say where whitespace is. This means that my application-expression rule can handle all the whitespace itself rather than have the first import-expression consume all the whitespace so there’s none left for the application-expression rule.

Unfortunately this solution won’t help you here, without rewriting the ABNF 🙁

singpolyma commented 5 years ago

some parsers (including Go’s pigeon, a PEG parser) will cause a failure because it doesn’t backtrack on the trailing whitespace. ‎> To “fix” it I’ve been managing whitespace differently - in philandstuff/dhall-golang, expressions don’t consume trailing whitespace, and instead rules which consume expressions need to explicitly say where whitespace is.

Unfortunately this solution won’t help you here, without rewriting the ABNF

Given their prevalence, having a PEG-compatible ABNF (if possible) seems like a useful goal, no? If this whitespace issue is all that stands in the way, I may attempt that. I'll check the whitespace handling in your repo.

philandstuff commented 5 years ago

I think that there are advantages to structuring the ABNF as it is: having the convention that every expression consumes trailing whitespace is easier to think about than having every rule manage its own whitespace. I wouldn't want to advocate changing this unless we really understood the consequences and had explored all possible alternatives.

I have idly wondered if a PEG-compatible ABNF grammar might still be possible if we flipped it: if every expression consumed leading whitespace (not trailing), then higher-level rules like application-expression could impose extra whitespace restrictions in a PEG-compatible way.

The other thing to consider here is that the required whitespace in application-expression may not be necessary in PEG grammars. It's required to resolve ambiguity in a CFG, but because all operators in PEG grammars have a defined preference order, there is no ambiguity, and so for example ./ab will never parse as an application-expression (of ./a applied to b) in a PEG grammar even if you remove the required whitespace from the rule. So if the ABNF were PEG-focused and didn't need to support CFGs, we could fix this by just dropping the required whitespace. Although this would have side effects such as making (\(x : Natural)->x)3 parse validly, whereas in the current grammar I believe this should fail because there is no whitespace in the application. (Aside: this expression parses in dhall-haskell, but I think the current ABNF should reject it?)

philandstuff commented 5 years ago

Followup: this discussion helped me see that I could simplify my grammar a lot (see philandstuff/dhall-golang@2a7b3d3 for details).

Gabriella439 commented 5 years ago

@philandstuff @singpolyma: Yeah, I totally forgot that the Haskell implementation had the exact same issue where it will fail to parse expressions if you insert mandatory non-empty whitespace in between functions and their arguments. The Haskell implementation permits (\(x : Natural)->x)3 for the exact same reason: to work around the parse failure just like you had to.

I think we should probably prefer to treat the ABNF as PEG to simplify implementation since multiple people have stumbled across this now, although that may make @f-f's job harder because he currently uses instaparse which treats the grammar as a CFG

philandstuff commented 5 years ago

I suppose another argument that we should prefer PEG is that the ABNF has this comment:

; Third, if there are multiple valid parses then prefer the first parse
; according to the ordering of alternatives. That is, the order of evaluation
; of the alternatives is left-to-right.

.. which comes for free in a PEG but is a bit of a faff in a CFG

singpolyma commented 5 years ago

Are there things other than the left-to-right requirement that would make a PEG-compatible grammar not work for a CFG-style parser? If not, and since the grammar already has the left-to-right requirement, I think this makes the most sense. Almost every language has a PEG toolkit (and most of the in-progress implementations seem to be using one) so if the ABNF can be PEG-compatible while not being gratuitously incompatible with other parsing styles it should help a lot.

Gabriella439 commented 5 years ago

@singpolyma: That was actually the original intention. Although I didn't know the terms PEG/CFG at the time I first authored the grammar my intention was for it to be a PEG, but still supporting a CFG as much as possible. If you browse through various earlier discussions about the grammar you will see some discussions about taking pains to remove ambiguities in the grammar when it was feasible to do so.

That required space between a function and its argument was one such change introduced to remove ambiguity, in this pull request:

I attempted to reflect that change to the standard in the Haskell implementation here:

... but as you can see from that pull request's description, I ran into the exact same issue that both of you did:

Note that this doesn't include the whitespace-chunk rule added by the above change. The reason why is that the whitespaceChunk ends up failing due to the whitespace having already been consumed by a previous whitespace rule. I could make the whitespace rule backtrack at the cost of possibly performance degradation and worse error messages, but I chose not to because the Haskell implementation currently doesn't suffer from the ambiguous parses that the whitespace-chunk rule was originally intended to address.

So I made a choice to deviate from the standard for convenience, not realizing that others would probably run into the same problem. But based on this conversation it sounds like we should probably remove the required non-empty whitespace in the middle of function application since there is no way we can satisfy both PEGs and CFGs in this particular instance and we should err on the side of PEGs.

I will put up a pull request to remove that required space later tonight and then we can continue the discussion there.

Gabriella439 commented 5 years ago

Fix is up here: https://github.com/dhall-lang/dhall-lang/pull/422

singpolyma commented 5 years ago

Now that I found this first issue, it's easier to wrap my head around what could be going wrong in other places. A basically identical issue is everywhere that the grammar says

directory file

Since:

directory = *path-component
file = path-component

the directory parse always consumes every path component and then the file parse fails. One possible solution is to use:

directory = +path-component

and not use file at all, though I understand there are reasons it might be nice to have them separate in the grammar. With the whitespace change and this change I'm down to only 5 failing tests with my Ruby PEG parser, which is great progress :)

singpolyma commented 5 years ago

The last simple expression I cannot parse is the form:

let x = 1 in x
singpolyma commented 5 years ago

Changing:

(let label (colon expression)? equal expression)+ in expression

to

(let label (colon expression)? equal /1 /)+ in expression

"fixes" the parser for the given expression, but is obviously not a fix and only a debugging hack to find which part of the recurrence is problematic.

Gabriella439 commented 5 years ago

@singpolyma: Yeah, that is the exact same thing the Haskell implementation does. It parses a path as1*path-component and then splits that after-the-fact into the directory and file parts. See:

https://github.com/dhall-lang/dhall-haskell/blob/4b7bdd458cf15927521e70a9dbcd13ca7ae2dc83/dhall/src/Dhall/Parser/Token.hs#L357-L364

I'd probably keep the grammar as is for that but maybe add an implementation note explaining this strick. The reason why is that the standard semantics need to refer to the directory and file separately in judgments so they have to be explicitly labeled in the grammar

singpolyma commented 5 years ago

My issue with let seems to be that simple_label can match in -- adding a negative-lookahead (which is not something the ABNF can express, too PEG-specific) fixes most let related failures. Two failing tests left for me.

singpolyma commented 5 years ago

Last failure was similar: simple_label can match let which was greedily gobbling that up in a multilet. Adding that to my negative lookahead fixes the test case, though I'm not sure it's the right thing to do in general. Maybe I need a list of every keyword in that lookahead? Unsure.

singpolyma commented 5 years ago

For completeness, my current diff of manual parser edits vs what is autogenerated from the grammar in order to pass all tests:

--- lib/dhall2.citrus   2019-03-13 22:20:21.381640864 -0500
+++ lib/dhall.citrus    2019-03-13 22:18:45.384100374 -0500
@@ -39,7 +39,7 @@
    (digit) | (/[a-f]/i) | (digit) | (/[a-f]/i)
 end
 rule simple_label
-   ((alpha) | (/_/i)) (((alpha) | (digit) | (/[\u{2d}\/_]/i))*)
+   !/(in|let)\s+/ (((alpha) | (/_/i)) (((alpha) | (digit) | (/[\u{2d}\/_]/i))*))
 end
 rule quoted_label_char
    /[\u{20}-@\u{5b}-_a-~]/i
@@ -276,10 +276,9 @@
    (/\//i) (((path_character)+) | ((/"/i) ((quoted_path_character)+) (/"/i)))
 end
 rule directory
-   (path_component)*
+   (path_component)+
 end
 rule file
-   path_component
 end
 rule local_raw
    ((/\u{2e}/i) (/\u{2e}/i) (directory) (file)) | ((/\u{2e}/i) (directory) (file)) | ((/~/i) (directory) (file)) | ((directory) (file))
Gabriella439 commented 5 years ago

@singpolyma: See this comment for simple-label:

https://github.com/dhall-lang/dhall-lang/blob/9e0bc0f83097d6221e78571e74b9d67a166fa636/standard/dhall.abnf#L144-L156

You need to modify the simple-label rule to not match those reserved keywords

Gabriella439 commented 5 years ago

Is it alright if I leave it up to @singpolyma/@philandstuff and @f-f to work out a way to reconcile how to fix the grammar between PEGs and CFGs? I would like to spend some time focusing on standardizing several features in the queue

f-f commented 5 years ago

@Gabriel439 of course, no worries! 🙂

I think #423 could be a good start. To give some context, in my case instaparse has some PEG extensions (see here for details), so I wouldn't mind even adding things like negative lookaheads to the grammar here (which would help with the last @singpolyma's problem some comments ago) - since it looks like all of us are patching the grammar to be more PEG-like anyways

There are some downsides to "diverging" from ABNF though:

TL;DR: I'm ok with any approach as long as I can reasonably patch the grammar to work with Instaparse (as an aside: would it be useful to share my patches here?)

philandstuff commented 5 years ago

@f-f yes, please share your patches!

As far as I can see, the main goal for us should be ensuring that the grammars of dhall-haskell and dhall-clojure agree with each other. In order to do so it’d be good to chronicle all the ways that each implementation differs from the official ABNF and understand why in each case.

Whether the ABNF prefers a PEG or CFG is less important than whether the current working implementations and the ABNF fit each other.

singpolyma commented 5 years ago

There are some downsides to "diverging" from ABNF though

I definitely think keeping to standard ABNF is mostly the way to go -- it can express everything we need except the reserved words.

That said, the grammar already requires PEG-like backtracking and left-to-right features to parse, so if we can make at least a PEG grammar machine-generatable from the ABNF‎ (except for reserved words) would be a big win.

Either version of the whitespace change‎ gets us closer to this, and the only other thing is the directory/file issue. I think that instead of keeping directory/file and a comment about removing file to make it work -- instead we make the grammar say +dir and not say file and make the comment say "the last path-component matched by this rule is referred to as "file" in the semantics). The grammar is machine-readable, the comments are for humans so the tie-in to semantics that humans read seems like the better candidate for a comment to me.

That would leave us with only the reserved words issue. I guess the "normal" way a CFG handles this is that it doesn't, and the lexer handles it. The comment could be expanded with an example negative-lookahead regex that PEGs could share to do this, not sure the best move there.

philandstuff commented 5 years ago

I raised #430 as (hopefully) a simple fix to these problems. Those of you with ABNF parsers, I'd appreciate you testing the changes out, as I do not have one, and my own dhall implementation (philandstuff/dhall-golang) only implements a subset of the grammar so far.

It's a relatively noninvasive change to the ABNF, especially compared with #423, and it allows the grammar to work with PEGs without breaking CFG compatibility.

f-f commented 5 years ago

I just tried #423 in my parser (diff here), and while it looks invasive at first I'd say it requires less changes than #430 (plus the concerns for much backtracking with leading whitespace). So I think it would be a good solution.

@singpolyma

I think that instead of keeping directory/file and a comment about removing file to make it work -- instead we make the grammar say +dir and not say file and make the comment say "the last path-component matched by this rule is referred to as "file" in the semantics). The grammar is machine-readable, the comments are for humans so the tie-in to semantics that humans read seems like the better candidate for a comment to me.

I'd agree with having +dir and specifying the reference in a comment

That would leave us with only the reserved words issue. I guess the "normal" way a CFG handles this is that it doesn't, and the lexer handles it. The comment could be expanded with an example negative-lookahead regex that PEGs could share to do this, not sure the best move there.

I solve this in my parser by adding some lookaheads to some rules, I guess we could add the following (or a similar) implementation to a comment in the grammar Here's the diff with the current grammar:

--- original.abnf   2019-03-16 17:56:56.000000000 +0200
+++ patched.abnf    2019-03-16 17:57:02.000000000 +0200
@@ -139,7 +139,7 @@
 ; ASCII digit
 DIGIT = %x30-39  ; 0-9

-HEXDIG = DIGIT / "A" / "B" / "C" / "D" / "E" / "F"
+HEXDIG = "a" /  "b" / "c" / "d" / "e" / "f" / DIGIT / "A" / "B" / "C" / "D" / "E" / "F"

 ; A simple label cannot be one of the following reserved keywords:
 ;
@@ -154,7 +154,7 @@
 ; * missing
 ; * Infinity
 ; * Some
-simple-label = (ALPHA / "_") *(ALPHA / DIGIT / "-" / "/" / "_")
+simple-label = !builtin ((ALPHA / "_") *(ALPHA / DIGIT / "-" / "/" / "_"))

 quoted-label-char =
       %x20-5F
@@ -653,7 +653,7 @@

     ; "x"
     ; "x@2"
-    / identifier-raw
+    / keyword / identifier-raw

     ; "( e )"
     / open-parens expression close-parens-raw
@@ -687,3 +687,85 @@
 ; whitespace prefix for the top-level of the program
 complete-expression = whitespace expression

+
+builtin = "Bool" nonempty-whitespace
+        / "Optional" nonempty-whitespace
+        / "None" nonempty-whitespace
+        / "Natural" nonempty-whitespace
+        / "Integer" nonempty-whitespace
+        / "Double" nonempty-whitespace
+        / "Text" nonempty-whitespace
+        / "True" nonempty-whitespace
+        / "False" nonempty-whitespace
+        / "NaN" nonempty-whitespace
+        / "Infinity" nonempty-whitespace
+        / "Type" nonempty-whitespace
+        / "Kind" nonempty-whitespace
+        / "Sort" nonempty-whitespace
+        / "List" nonempty-whitespace
+        / "Natural/fold" nonempty-whitespace
+        / "Natural/build" nonempty-whitespace
+        / "Natural/isZero" nonempty-whitespace
+        / "Natural/even" nonempty-whitespace
+        / "Natural/odd" nonempty-whitespace
+        / "Natural/toInteger" nonempty-whitespace
+        / "Natural/show" nonempty-whitespace
+        / "Integer/toDouble" nonempty-whitespace
+        / "Integer/show" nonempty-whitespace
+        / "Double/show" nonempty-whitespace
+        / "List/build" nonempty-whitespace
+        / "List/fold" nonempty-whitespace
+        / "List/length" nonempty-whitespace
+        / "List/head" nonempty-whitespace
+        / "List/last" nonempty-whitespace
+        / "List/indexed" nonempty-whitespace
+        / "List/reverse" nonempty-whitespace
+        / "Optional/fold" nonempty-whitespace
+        / "Optional/build" nonempty-whitespace
+        / "Text/show" nonempty-whitespace
+        / "if" nonempty-whitespace
+        / "then" nonempty-whitespace
+        / "else" nonempty-whitespace
+        / "let" nonempty-whitespace
+        / "in" nonempty-whitespace
+        / "as" nonempty-whitespace
+        / "using" nonempty-whitespace
+        / "merge" nonempty-whitespace
+        / "missing" nonempty-whitespace
+        / "Some" nonempty-whitespace
+
+keyword = "Bool" whitespace
+        / "Optional" whitespace
+        / "None" whitespace
+        / "Natural" whitespace
+        / "Integer" whitespace
+        / "Double" whitespace
+        / "Text" whitespace
+        / "True" whitespace
+        / "False" whitespace
+        / "NaN" whitespace
+        / "Infinity" whitespace
+        / "Type" whitespace
+        / "Kind" whitespace
+        / "Sort" whitespace
+        / "List" whitespace
+        / "Natural/fold" whitespace
+        / "Natural/build" whitespace
+        / "Natural/isZero" whitespace
+        / "Natural/even" whitespace
+        / "Natural/odd" whitespace
+        / "Natural/toInteger" whitespace
+        / "Natural/show" whitespace
+        / "Integer/toDouble" whitespace
+        / "Integer/show" whitespace
+        / "Double/show" whitespace
+        / "List/build" whitespace
+        / "List/fold" whitespace
+        / "List/length" whitespace
+        / "List/head" whitespace
+        / "List/last" whitespace
+        / "List/indexed" whitespace
+        / "List/reverse" whitespace
+        / "Optional/fold" whitespace
+        / "Optional/build" whitespace
+        / "Text/show" whitespace
\ No newline at end of file

Note: instaparse's abnf interpretation is case sensitive, so this is the reason why:

philandstuff commented 5 years ago

Looks like a consensus for #423 and against #430 is forming, particularly for performance reasons.

+builtin = "Bool" nonempty-whitespace
+        / "Optional" nonempty-whitespace

I take it that the nonempty-whitespace is here so that your negative lookahead doesn't cause Boolean to fail to parse as a simple-label?

However, in an expression like \(x : Bool) -> x, the Bool part could still match the simple-label rule because it doesn't match "Bool" nonempty-whitespace. I think this could be fixed with something like:

+builtin = "Bool" !(ALPHA / DIGIT / "-" / "/" / "_")
+        / "Optional" !(ALPHA / DIGIT / "-" / "/" / "_")

..where obviously it makes sense to extract (ALPHA / DIGIT / "-" / "/" / "_") into a rule of its own.

Nadrieril commented 5 years ago

That's why I advocated for moving reserved names outside the grammar (and possibly allowing them outside of bound variables). If we do that then

simple-label = simple-label-raw whitespace
simple-label-raw = keyword-raw +(ALPHA / DIGIT / "-" / "/" / "_") / !keyword-raw ((ALPHA / "_") *(ALPHA / DIGIT / "-" / "/" / "_"))
keyword-raw = "let" / "in" / "if" / "then" / "else" / "Infinity" / "NaN"

seems to be enough; that's what I have in my parser (combined with #423).

Gabriella439 commented 5 years ago

Just a few comments:

Nadrieril commented 5 years ago

I'd be happy to modify #423 like you suggest, but I currently have lots of changes to the grammar in my fork that would take some time to clean up

Nadrieril commented 5 years ago

I implemented @Gabriel439's suggestion, and included the various other things that have been mentioned in this thread. See here: https://github.com/dhall-lang/dhall-lang/pull/442

Nadrieril commented 5 years ago

https://github.com/dhall-lang/dhall-lang/pull/442 has been merged ! I believe we now have proper support for PEGs