arakelian / java-jq

Lightweight Java wrapper around JQ, a flexible JSON processor available for multiple platforms
MIT License

what is the performance to be expected? #16

Closed: alfonz19 closed this issue 3 years ago

alfonz19 commented 3 years ago

Hi, I tried to use this for batch processing, but unless I'm missing something, the performance is considerably worse than running jq from the CLI.

For example, this:

    echo 1 | /usr/bin/time -v jq '{"results": [{"itemValue": (.|tostring)}]}'

reports:

    User time (seconds): 0.04
    System time (seconds): 0.00

In Java, however, using this code:

    Instant startreq = Instant.now();
    JqRequest request = ImmutableJqRequest.builder()
            .lib(LIBRARY)
            .pretty(true)
            .input(input)
            .filter(jqFilter)
            .build();
    Instant endreq = Instant.now();
    System.out.println("req: "+Duration.between(startreq, endreq).toNanos() / 1000 + "micro");

    Instant startExec = Instant.now();
    JqResponse response = request.execute();
    Instant endExec = Instant.now();

    System.out.println("exec: "+Duration.between(startExec, endExec).toMillis() + "ms");

The times just to construct the request, sorted from worst to best, are (in microseconds):

8188
7685
4807
4451
4184
4059
3708
3515
121
86
84
61
59
57
54

The minimum is 17 microseconds, which is OK, but the execution itself takes, from worst to best, in MILLISECONDS:

590 585 579 557 556 555 555 555 549 546 546 537

That is more than 10 times slower than the CLI run above.

Am I doing something wrong with this library or are these the numbers to be expected?
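One way to make numbers like these easier to compare is to run a few untimed warm-up iterations before sampling, so that class loading and JIT compilation do not dominate the first measurements. The sketch below is only an illustration of that idea: `TimingHarness`, `measureNanos`, and `median` are hypothetical names, not part of java-jq, and the workload in `main` is a placeholder that would be swapped for the request build or the `request.execute()` call being measured.

```java
import java.util.Arrays;

// Hypothetical helper, not part of java-jq: times a task after an untimed warm-up phase.
public class TimingHarness {

    /** Runs the task `warmup` times untimed, then returns `runs` samples in nanoseconds, sorted ascending. */
    public static long[] measureNanos(Runnable task, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) {
            task.run(); // results discarded; lets class loading and JIT settle
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples;
    }

    /** Median of an already sorted, non-empty array. */
    public static long median(long[] sorted) {
        if (sorted.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2;
    }

    public static void main(String[] args) {
        // Placeholder workload (assumption); replace with the call under test.
        Runnable task = () -> String.valueOf(Math.sqrt(12345.678));

        long[] samples = measureNanos(task, 20, 100);
        System.out.println("median: " + median(samples) / 1000 + " micro");
        System.out.println("worst:  " + samples[samples.length - 1] / 1000 + " micro");
    }
}
```

If the median after warm-up is still in the hundreds of milliseconds, the cost is in the call itself and not in JVM start-up effects.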

arakelian commented 3 years ago

@alfonz19 I appreciate you taking the time to write this up. To be honest, I didn't do a whole lot of benchmarking when I wrote this wrapper. I eventually gave up on jq-wrapping for my own use case (transforming billions of JSON documents) when I saw this ticket: https://github.com/stedolan/jq/issues/120. Even if I put in the work to optimize startup times, etc., it'll never perform better than a single-threaded use case -- you're basically forced to launch a new process each time, and that is a killer at the scale I was operating at.

arakelian commented 3 years ago

@alfonz19 I will happily take PRs to improve performance, but it's not something I plan to work on, for the reasons above. Thanks.