I tried memoization like this:
```scala
import fastparse._
import scala.collection.mutable

// Caches keyed by the current input position (p.index).
val cache_minus = mutable.Map[Int, P[Int]]()
val cache_plus = mutable.Map[Int, P[Int]]()

def x_minus[$](implicit p: P[$]): P[Int] =
  cache_minus.getOrElseUpdate(p.index, P(x_times ~ "-" ~ expr).map { case (x, y) => x - y })

def x_plus[$](implicit p: P[$]): P[Int] =
  cache_plus.getOrElseUpdate(p.index, P(x_times ~ ("+" ~ expr).rep).map { case (i, is) => i + is.sum })

// Other grammar rules remain unchanged.
```
This fails whenever the input contains parentheses:
parse("123*(1+1)", program(_))
//
java.lang.ClassCastException: class scala.runtime.BoxedUnit cannot be cast to class java.lang.Integer (scala.runtime.BoxedUnit is in unnamed module of loader 'app'; java.lang.Integer is in module java.base of loader 'bootstrap')
at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:99)
But I was unable to find the code that does this unboxing to `Int`; probably it is generated by a macro. In any case, I don't know how to fix this problem.
I'm struggling to fix exponential slowness in fastparse 3.0.2 (Scala 2.13) used for an expression grammar. There is too much backtracking going on, and I have been adding cuts, but the exponential slowness remains. I tried to add memoization to the parsers, but that failed with strange error messages about unboxing `Unit` into `Integer`. A small working example of a grammar that shows the exponential slowness:
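The example block itself is missing here; the following is a minimal sketch reconstructed from the rule names used in this issue (`x_minus` and `x_plus` are taken verbatim from the memoization attempt above; `number` and `factor` are assumed helper names), not the exact original:

```scala
import fastparse._, NoWhitespace._

def number[$: P]: P[Int] = P(CharsWhileIn("0-9").!).map(_.toInt)
def parens[$: P]: P[Int] = P("(" ~ expr ~ ")")
def factor[$: P]: P[Int] = P(number | parens)
def x_times[$: P]: P[Int] = P(factor ~ ("*" ~ factor).rep).map { case (i, is) => i * is.product }
def x_minus[$: P]: P[Int] = P(x_times ~ "-" ~ expr).map { case (x, y) => x - y }
def x_plus[$: P]: P[Int]  = P(x_times ~ ("+" ~ expr).rep).map { case (i, is) => i + is.sum }
def expr[$: P]: P[Int]    = P(x_minus | x_plus) // two alternatives, each re-parses x_times
def program[$: P]: P[Int] = P(expr ~ End)

// Deeply nested parentheses trigger the backtracking:
parse("123*" + "(" * 30 + "1" + ")" * 30, program(_))
```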
The code works, but parsing takes about 10 seconds. I found that parsing this expression with $n$ parentheses takes about $2^{n-20}$ seconds. The reason for the exponential slowness is the backtracking. The parser tries to parse `expr` after `(`, and there are two possibilities (`minus` and `plus`). Each possibility must be fully explored before returning to parse the next `(`. To explore means to parse `expr` recursively again. So there is a 2x increase of parsing work for each parenthesis. To visualize the slowness, consider this function:
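(The original snippet is not shown; presumably it is the classic doubly recursive Fibonacci, along these lines.)

```scala
// Naive doubly recursive Fibonacci: each call spawns two more calls,
// the same doubling the parser suffers at every `(` when it explores
// both alternatives of expr.
def fibonacci(n: Int): Int =
  if (n <= 1) n else fibonacci(n - 1) + fibonacci(n - 2)
```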
It takes $2^n$ function calls to compute `fibonacci(n)` with this program. I tried adding cuts to the grammar at various places, but the exponential slowness remains.
So, I have two questions:
1) Can we somehow mitigate this problem by adding cuts, or adding more grammar rules, or in some other way?
2) Memoization could help here: if a parser rule has already failed at a certain location, there is no need to try the same rule at that location again; and if a rule has succeeded at a certain location and the parse is unique, there is no need to parse there again. This could eliminate the exponential slowness if implemented correctly. Is there a way to implement that?