Engelberg / instaparse

Eclipse Public License 1.0
2.74k stars 149 forks source link

Feature request: optional callback #196

Open skelter opened 4 years ago

skelter commented 4 years ago

It would be useful if the parser and/or Instaparse routines took an optional callback function to allow graceful halting of a parse.

While developing a grammar and parser and even when that parser is in production, I've encountered a situation in which a particular input inspires the parser to run uninterruptibly until exhausting heap. No doubt there are improvements needed in the grammar however debugging and dealing with the run-away parse is problematic and that it is possible causes some angst when we want to parse data in a production server with Instaparse. We've tried wrapping the parser in a future and interrupting it, and this approach is fraught with the usual interrupted thread issues.

It would be useful if the parser and/or Instaparse routines took an optional callback function to allow graceful halting of a parse. This function, if set, could be called periodically during the parse to confirm if parsing should carry on. This would help us improve resilience to the occasional run-away parse process.

Engelberg commented 4 years ago

I think you're right that a callback would be useful here, but I'd have to create an API for setting the callback, think about what kind of data should be passed to the callback so it can make meaningful decisions, and analyze the performance impact of doing something every time through the loop, etc., so it would be a bit of a development process.

I'd like to help you find a quick and easy solution for your current issue, though.

My understanding is that interrupting a thread or, relatedly, canceling a future, basically just sets a flag on the thread letting it know that someone wants to interrupt it, but it only actually does anything if the underlying code periodically checks for interruption. This happens automatically when a thread sleeps, or one can explicitly check for interruption.

So, I think one thing you could quickly explore, to see if it works for you, is to make a clone of the instaparse repository, and insert into the gll.cljc step function something like:

(when (Thread/interrupted)
  (throw (ex-info "Thread interrupted" {:interrupt true}))

That's also the spot where you could experiment with inserting a callback, if you prefer that to canceling from the outside.

If doing this every step degrades performance, another option would be for the run function to track how many steps have been called, and only do the sleep/callback every 100 steps or so.

Finally, another way to handle the whole thing would be for instaparse itself to have various timeout criteria that can be set, and have all those be checked in run or step. But that would be more complex to code.

But the above suggestion is something easy to try that should get your future canceling strategy working.

If you try this, please let me know how it works out, so I can consider this for general inclusion.