FasterXML / jackson-dataformats-binary

Uber-project for standard Jackson binary format backends: avro, cbor, ion, protobuf, smile
Apache License 2.0
316 stars 136 forks source link

`Smile` decoding issue with `NonBlockingByteArrayParser`, concurrency #384

Closed simondaudin closed 1 year ago

simondaudin commented 1 year ago

Hi there

I noticed an error when using SMILE NonBlockingByteArrayParser in an asynchronous environment (in an HTTP environment). I managed to reproduce the error in a local test and to propose a "simple" test that produce the error.

Jackson version: 2.15.2 (and before)

Here is the failing test

    @Test
    public void test_KO() throws IOException, ExecutionException, InterruptedException {
        Map<String, Map<String, String>> tags = new HashMap<>();
        for (int i = 0; i < 10; i++) {
            Map<String, String> value = new HashMap<>();
            for (int j = 0; j < 10; j++) {
                value.put("key_" + j, "val" + j);
            }
            tags.put("elt_" + i, value);
        }

        JsonFactory jsonFactory = new SmileFactory();
        ObjectMapper objectMapper = new ObjectMapper();
        ObjectWriter objectWriter = objectMapper.writer().with(jsonFactory);
        jsonFactory.setCodec(objectMapper);
        byte[] json = objectWriter.writeValueAsBytes(tags);
        TypeReference<Map<String, Map<String, String>>> typeReference = new TypeReference<Map<String, Map<String, String>>>() {
        };

        ExecutorService executorService = Executors.newFixedThreadPool(10);
        List<CompletableFuture<?>> futures = new ArrayList<>();

        int count = 100000;
        for (int i = 0; i < count; i++) {
            JsonParser parser = jsonFactory.createNonBlockingByteArrayParser();
            ByteArrayFeeder inputFeeder = (ByteArrayFeeder) parser.getNonBlockingInputFeeder();
            futures.add(CompletableFuture.supplyAsync(() -> {
                try {
                    inputFeeder.feedInput(json, 0, json.length);
                    TokenBuffer tokenBuffer = new TokenBuffer(parser);
                    while (true) {
                        JsonToken token = parser.nextToken();
                        if (token == JsonToken.NOT_AVAILABLE || token == null) {
                            break;
                        }

                        tokenBuffer.copyCurrentEvent(parser);
                    }
                    return tokenBuffer.asParser(jsonFactory.getCodec()).readValueAs(typeReference);
                } catch (IOException e) {
                    throw new RuntimeException(e);
                } finally {
                    try {
                        inputFeeder.endOfInput();
                        parser.close();
                    } catch (IOException e) {
                        throw new RuntimeException(e);
                    }
                }
            }, executorService));
        }

        CompletableFuture.allOf(futures.toArray(new CompletableFuture[futures.size()])).get();
    }

Here is the error it produces

java.util.concurrent.ExecutionException: java.lang.RuntimeException: com.fasterxml.jackson.databind.JsonMappingException: (was java.lang.NullPointerException) (through reference chain: java.util.LinkedHashMap["elt_2"])

    at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
    at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
    at com.liveaction.reactiff.codec.jackson.JacksonCodecTest.test_KO(JacksonCodecTest.java:74)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
    at com.intellij.rt.junit.IdeaTestRunner$Repeater$1.execute(IdeaTestRunner.java:38)
    at com.intellij.rt.execution.junit.TestsRepeater.repeat(TestsRepeater.java:11)
    at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:35)
    at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:232)
    at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:55)
Caused by: java.lang.RuntimeException: com.fasterxml.jackson.databind.JsonMappingException: (was java.lang.NullPointerException) (through reference chain: java.util.LinkedHashMap["elt_2"])
    at com.liveaction.reactiff.codec.jackson.JacksonCodecTest.lambda$test_KO$0(JacksonCodecTest.java:62)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: com.fasterxml.jackson.databind.JsonMappingException: (was java.lang.NullPointerException) (through reference chain: java.util.LinkedHashMap["elt_2"])
    at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:402)
    at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:361)
    at com.fasterxml.jackson.databind.deser.std.ContainerDeserializerBase.wrapAndThrow(ContainerDeserializerBase.java:190)
    at com.fasterxml.jackson.databind.deser.std.MapDeserializer._readAndBindStringKeyMap(MapDeserializer.java:638)
    at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:449)
    at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:32)
    at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323)
    at com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:4801)
    at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2999)
    at com.fasterxml.jackson.core.JsonParser.readValueAs(JsonParser.java:2396)
    at com.liveaction.reactiff.codec.jackson.JacksonCodecTest.lambda$test_KO$0(JacksonCodecTest.java:60)
    ... 4 more
Caused by: java.lang.NullPointerException
    at com.fasterxml.jackson.databind.util.TokenBuffer$Parser.nextFieldName(TokenBuffer.java:1677)
    at com.fasterxml.jackson.databind.deser.std.MapDeserializer._readAndBindStringKeyMap(MapDeserializer.java:608)
    at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:449)
    at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:32)
    at com.fasterxml.jackson.databind.deser.std.MapDeserializer._readAndBindStringKeyMap(MapDeserializer.java:623)
    ... 11 more

I noticed 3 things:

I didn't take the time to track deep in jackson, but it seemed that there was some trouble with the seenNames being empty when it should not have been. I don't know if maybe it should not be used this way or if it is a legit error.

Thanks

cowtowncoder commented 1 year ago

Thank you for reporting this, @simondaudin -- and especially for providing the reproduction!

Without looking too deep, based on failure and test code, it does seem like a legit bug somewhere and not incorrect usage. But I hope to add more information when I have time to look into this problem.

simondaudin commented 1 year ago

Hello, thanks for the response.

I actually got curious and went digging a little bit. It seems due to SmileBufferRecycler being created in the main thread and so, created only once in SmileParserBase ThreadLocal _smileRecyclerRef and then used inside all the parallel thread. Second problem, in SmileBufferRecycler there is a race condition between the get and the set to null.

I tried two solutions that both worked:

In the first solution, we only have one instance of SmileBufferRecycler shared by all the parallel threads. I had no further insights about what it was supposed to be in this context, so I was not sure about making a PR.

cowtowncoder commented 1 year ago

This may be due to earlier assumptions that parser instances were bound 1-to-1 on threads, and as such buffer recyclers would work without explicit synchronization as long as ThreadLocal was used. This turned out not to be the case with async processing, and changes have been made in jackson-core for recycling of symbol tables, for example.

But I suspect not for SmileBufferRecycler; it predates async Smile parser as well.

So this gives a reasonable idea that adding unfortunate-but-now-necessary locking is probably needed. src/main/java/com/fasterxml/jackson/core/util/BufferRecycler.java from jackson-core may provide some clues, I forget exact changes that have been made over various versions.

simondaudin commented 1 year ago

Thanks for the information ! It seems, indeed, to be a problematic similar to #479