amnaredo / test

0 stars 0 forks source link

ujson.AsyncParser leaks memory #239

Open amnaredo opened 2 years ago

amnaredo commented 2 years ago

Something's up with AsyncParser's memory usage. The following OOMs and dies within 10s:

import ujson.AsyncParser
import upickle.core.NoOpVisitor

object AsyncParserApp {

  def main(args: Array[String]): Unit = {
    val parser = AsyncParser[Unit]()
    parser.absorb("""{"foo":[""", NoOpVisitor)
    val chunk = """1, "omg", true, """ * 1000
    while (true) {
      parser.absorb(chunk, NoOpVisitor)
    }
  }
}

Stack trace:

Exception in thread "main" java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at ujson.AsyncParser.resizeIfNecessary(AsyncParser.scala:111)
    at ujson.AsyncParser.absorb(AsyncParser.scala:86)
    at ujson.AsyncParser.absorb(AsyncParser.scala:96)
    at AsyncParserApp$.main(AsyncParserApp.scala:10)

Looks like it's retaining all the input it has ever seen into the data: Array[Byte] field. Same behavior for all 3 values of AsyncParser.Mode.

Tested against 0.7.1. ID: 252 Original Author: htmldoug

amnaredo commented 2 years ago

I don't actually know how AsyncParser works; i Just copied it from Jawn. Unless someone wants to clean it up and take ownership I'll probably just delete it Original Author: lihaoyi

amnaredo commented 2 years ago

Thanks for the response and also for all your work in general.

A non-blocking parser + upickle's Visitors + akka-streams Source/Flow/Sink is a killer combo for O(1) memory non-blocking json transformation. I'd like for the capability to continue to exist, but I don't trust the correctness or performance of AsyncParser enough to use in prod.

Instead, I've been wrapping jackson's NonBlockingJsonParser which performs faster under benchmarks and seems solid (https://gist.github.com/htmldoug/7b84b2febe67056c00199cea069003e3). I took another look at AsyncParser only to avoid porting the jackson implementation through the breaking Visitor interface changes in 0.7.1. Good interface changes, but they kept me up until 3 am last night trying to understand/replicate the usage.

Would you be interested in replacing the jawn AsyncParser with the jackson parser/visitor integration as a ujson.jackson.jvm module? It'd be significantly simpler to maintain, although couldn't support scala.js. As a consumer, I'd love for it to exist in the same repo to keep up with future breaking API changes, particularly until the Visitor interfaces are stable as 1.x.y. Original Author: htmldoug

amnaredo commented 2 years ago

I'd like to keep most of uPickle's code cross-platform where possible, so for now just go ahead and use your jackson integration, and we can replace the AsyncParser with a better one when someone decides they're willing to put in the porting effort Original Author: lihaoyi

amnaredo commented 2 years ago

I still owe you a PR for this. I think AsyncParser is fixable. Original Author: htmldoug

amnaredo commented 2 years ago

Resolved by removing AsyncParser in https://github.com/lihaoyi/upickle/commit/06eaccd40bf45b3d65eddbab87dd8b7ceb354dfe. Sorry I never got around to the PR. Removal from the core makes sense to minimize.

For my needs, flowing data through OutputStream -> jackson 2.10 -> Visitor has been treating me well. It also provides JsonPointer debugging info, obviating https://github.com/lihaoyi/upickle/pull/286. Original Author: htmldoug

amnaredo commented 2 years ago

Sounds good!

On Tue, 31 Dec 2019 at 3:20 AM, Doug Roper notifications@github.com wrote:

Resolved by removing AsyncParser in 06eaccd https://github.com/lihaoyi/upickle/commit/06eaccd40bf45b3d65eddbab87dd8b7ceb354dfe. Sorry I never got around to the PR. Removal from the core makes sense to minimize.

For my needs, flowing data through OutputStream -> jackson 2.10 -> Visitor has been treating me well. It also provides debug JsonPointer info, obviating #286 https://github.com/lihaoyi/upickle/pull/286, too.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/lihaoyi/upickle/issues/252?email_source=notifications&email_token=AAHEB7ECP7AM4XQPJEDTHXLQ3JCYJA5CNFSM4GG2GPFKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEH3ABGQ#issuecomment-569770138, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHEB7BK72GAPRVY32FYQULQ3JCYJANCNFSM4GG2GPFA .

Original Author: lihaoyi