dashbitco / nimble_parsec

A simple and fast library for text-based parser combinators
805 stars 51 forks source link

Nested parsec() call seems to consume error #129

Closed dkulchenko closed 1 year ago

dkulchenko commented 1 year ago
defmodule ParserError do
  import NimbleParsec

  def fail(combinator, error) do
    post_traverse(
      combinator,
      {ParserError, :throw_error, [error]}
    )
  end

  def throw_error(_, _, _, _, _, error) do
    IO.puts("throwing error #{error}")
    {:error, error}
  end
end

defmodule Parser do
  import NimbleParsec

  defparsec(:contents, choice([
    string("123</abc>"),
    string("123")
    |> ParserError.fail("missing closing </abc> tag")
  ]))

  defparsec(:parse, choice([
    string("<abc>")
    |> parsec(:contents),
    string("<abc>")
    |> ParserError.fail("unknown syntax error inside <abc> tag")
  ]))
end

IO.inspect(Parser.parse("<abc>123"))

In this test case, I'm trying to use choice() as a way to do fallback error handling (basically, either match the happy path, or match only the beginning of the syntax and throw an error).

Running the script has the following output:

throwing error missing closing </abc> tag
throwing error unknown syntax error inside <abc> tag
{:error, "unknown syntax error inside <abc> tag", "123", %{}, {1, 0}, 5}

So the first error gets "swallowed". Is there anything that can be done to expose the "child" error?

(This is a super simplified scenario and in the actual project, there's a huge web of choice() calls, but this is the simplest synthetic test case that triggers the issue.)

Using function composition instead of parsec() works correctly:


defmodule Parser.Helper do
  import NimbleParsec

  def contents(combinator) do
    combinator
    |> choice([
      string("123</abc>"),
      string("123")
      |> ParserError.fail("missing closing </abc> tag")
    ])
  end
end

defmodule Parser do
  import NimbleParsec
  import Parser.Helper

  defparsec(:parse, choice([
    string("<abc>")
    |> contents(),
    string("<abc>")
    |> ParserError.fail("unknown syntax error inside <abc> tag")
  ]))
end

IO.inspect(Parser.parse("<abc>123"))

returns:

throwing error missing closing </abc> tag
{:error, "missing closing </abc> tag", "", %{}, {1, 0}, 8}

I'd love to avoid parsec() and just use direct function calls but unfortunately it's a complex self-referential parser and unwinding the circular dependencies would be incredibly difficult.

dkulchenko commented 1 year ago

Working around things with this:

defmodule ParserError do
  import NimbleParsec

  def fail(combinator, error) do
    post_traverse(combinator, {__MODULE__, :add_error, [error]})
  end

  def add_error(_rest, acc, context, line, offset, error) do
    annotated_error = %{message: error, line: line, offset: offset}
    {acc, Map.update(context, :errors, [annotated_error], &[annotated_error | &1])}
  end

  def throw_errors(combinator) do
    post_traverse(combinator, {__MODULE__, :throw_errors, []})
  end

  def throw_errors(_rest, _acc, %{errors: errors}, _line, _offset) when errors != [] do
    {:error, errors}
  end

  def throw_errors(_rest, acc, context, _line, _offset) do
    {acc, context}
  end
end

defmodule Parser do
  import ParserError
  import NimbleParsec

  defparsec(
    :contents,
    choice([
      string("123</abc>"),
      string("123") |> fail("missing closing </abc> tag")
    ])
  )

  defparsec(
    :parse,
    choice([
      string("<abc>") |> parsec(:contents),
      string("<abc>") |> fail("unknown syntax error inside <abc> tag")
    ])
    |> throw_errors()
    |> eos()
  )
end

IO.inspect(Parser.parse("<abc>123"))

which works great and gives the bonus of accumulating errors and providing context for each. Leaving this open in case this is a bug/unintended behavior.

josevalim commented 1 year ago

I am looking at the code and choices does not actually guarantee the order they are proceed or which one will be marked as failed in case all of them fail. I will document this behaviour but it may also be possible to write this with a different set of combinators.

dkulchenko commented 1 year ago

I am looking at the code and choices does not actually guarantee the order they are proceed or which one will be marked as failed in case all of them fail.

Are you sure? Playing with some scenarios with debug: true on, it looks like choice() always generates function clauses in an execution order exactly matching that in which they're provided, even for complex trees (which is inherently desirable for determinism/optimization, so I'm perfectly happy if this is unintentional!).

Across ~50 combinators in my parser that are relying on strict choice() ordering, every test that relies on this behavior succeeds consistently.

re: error handling though, makes sense - perfectly happy using context + a throw at the end for this purpose.