FasterXML / jackson-dataformats-binary

Uber-project for standard Jackson binary format backends: avro, cbor, ion, protobuf, smile
Apache License 2.0
316 stars 136 forks source link

SmileParser doesn't honor USE_THREAD_LOCAL_FOR_BUFFER_RECYCLING #50

Closed brharrington closed 3 years ago

brharrington commented 7 years ago

I'm seeing some odd issues with the wrong strings getting used for some field names. The probability is small, but on a large cluster with many messages it happens frequently enough. I suspect it is some issue with the ThreadLocal buffers because the field names I'm seeing seem to be coming from different messages that are only really connected by the use of jackson and the workers happen to get processed using the same thread pool.

I wanted to disable the use of thread locals to test that theory, but it doesn't look like that is possible for the smile parser. Is it expected that all factories that support the thread local recycling would honor the current USE_THREAD_LOCAL_FOR_BUFFER_RECYCLING flag?

cowtowncoder commented 7 years ago

@brharrington Yes I think it should apply to all parsers and generators. It is quite possible some others beyond smile codec are not honoring it; flag is newer addition than most of backends. I suspect you could also try JsonFactory.Feature.CANONICALIZE_FIELD_NAMES for isolating the issue.

Problem itself sounds worrying of course. Does this occur with a recent version? (like 2.8.7 or 2.7.9)

brharrington commented 7 years ago

I was running 2.8.4, but will try newer versions. That said, I think the root issue was a bug on our side where the parser could be created in one thread but then used in another. Only the creation would be in the separate thread the use after that was all in a single, but potentially different thread. I'm fixing that and will see if it clears up.

When debugging I had assumed the flag would apply to all backends, so was incorrectly ruling out some possibilities.

cowtowncoder commented 7 years ago

@brharrington That's new enough version so unlikely to be different. I just asked since I get many reports against old versions and it's easy check to make.

Thank you for reporting the lack of support: I'll get that fixed when I get a chance!

Bug itself sounds odd: while it is true that parser/generators are not designed thread-safe (wrt sharing), buffer sharing itself should not result in such problems. So there could still be some sort of flag. But I understand that won't be priority on your side if you can avoid it.

cowtowncoder commented 7 years ago

Ok. Yes, this setting is not used. Unfortunately it is less simple to fix this, because whereas JSON backend deals with this at JsonFactory, with smile access is directly by parser/generator classes, which in turn have no acecss to JsonFactory.Feature settings (as they typically would be of no use, presumably affecting factory, not parser/generator).

Will see what to do here.

brharrington commented 7 years ago

FWIW, the change I mentioned above does seem to have cleared up the issue. I played around with a it a bit and I can usually (~50% of the time) reproduce the symptoms with this test case:

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.dataformat.smile.SmileFactory;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.JUnit4;

import java.io.ByteArrayOutputStream;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

@RunWith(JUnit4.class)
public class JacksonThreadLocal {

  static final Set<String> FIELDS_1 = new LinkedHashSet<>();
  static final Set<String> FIELDS_2 = new LinkedHashSet<>();
  static {
    for (int i = 0; i < 256; ++i) {
      if (Character.isLetterOrDigit(i)) {
        FIELDS_1.add("" + (char) i);
        FIELDS_2.add("" + i);
        FIELDS_1.add("" + (char) i + (char) i);
        FIELDS_2.add("" + i + ":" + i);
      }
      // Push FIELDS_1 over 256
      FIELDS_1.add("ABC");
      FIELDS_1.add("DEF");
      FIELDS_1.add("GHI");
    }
    System.out.println(FIELDS_1.size());
  }

  @Test
  public void fieldNameCorruption() throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(16);
    JsonFactory factory = new SmileFactory();
    byte[] obj1 = encode(FIELDS_1, factory);
    byte[] obj2 = encode(FIELDS_2, factory);
    for (int i = 0; i < 1000000; ++i) {
      final JsonParser parser = factory.createParser((i % 2 == 0) ? obj1 : obj2);
      final Set<String> expected = (i % 2 == 0) ? FIELDS_1 : FIELDS_2;
      pool.submit(() -> parseAndCheck(expected, parser));
    }
  }

  private byte[] encode(Set<String> fields, JsonFactory factory) throws Exception {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    JsonGenerator gen = factory.createGenerator(baos);
    gen.writeStartObject();
    for (String f : fields) {
      gen.writeNumberField(f, 1);
    }
    gen.writeEndObject();
    gen.close();
    baos.close();
    return baos.toByteArray();
  }

  private void parseAndCheck(Set<String> expected, JsonParser parser) {
    try {
      try {
        Set<String> fieldNames = new LinkedHashSet<>();
        JsonToken t = parser.nextToken();
        while (t != null) {
          if (t == JsonToken.FIELD_NAME) {
            fieldNames.add(parser.getText());
          }
          t = parser.nextToken();
        }
        if (!expected.equals(fieldNames)) {
          fieldNames.removeAll(expected);
          System.out.printf("Unexpected: %s%n", fieldNames);
        }
      } finally {
        parser.close();
      }
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}

I think this was also probably the cause of the other corruption issue I had asked about in the mailing list a while back. From my perspective the root problem can be considered a bug on my side. However, if you're saying it shouldn't be the case, then hopefully the test case above can help you reproduce.

cowtowncoder commented 7 years ago

@brharrington Thank you. I hope to have a look at this in couple of days (taking couple of days off now); test case very much welcome and anything that can repro it often enough, even if manually, is better than not able to reproduce.

cowtowncoder commented 3 years ago

Ok. Running the test now against 2.12.0-rc2 I cannot reproduce the issue. There have been a few fixes to concurrency handling of symbol/buffer reuse, and that may be part of the reason. But importantly wrt this issue, USE_THREAD_LOCAL_FOR_BUFFER_RECYCLING should be honored by Smile backend now, as far as I can see -- it is passed via this in JsonFactory:

    public BufferRecycler _getBufferRecycler()

via

    protected IOContext _createContext(Object srcRef, boolean resourceManaged) {
        return new IOContext(_getBufferRecycler(), srcRef, resourceManaged);
    }

and is not overridden by SmileFactory as far as I can see.

Given above I think the issue has been resolved -- feel free to re-open or re-file if there are up-to-date problems with, say, 2.11.3 (or soon to be released 2.12.0).