FasterXML / jackson-dataformats-binary

Uber-project for standard Jackson binary format backends: avro, cbor, ion, protobuf, smile
Apache License 2.0

Unable to set a compression input/output decorator to a `SmileFactory` #153

Closed: guidomedina closed this issue 5 years ago

guidomedina commented 5 years ago

I have a special need for the riak-java-client, which only allows me to use an ObjectMapper to serialize/deserialize key-values. I would like to decorate a SmileFactory with compressors like LZ4, Snappy or GZip, but at the moment this is not possible. When I try a mapper like the following:

// Imports added for completeness; ReaderInputStream / WriterOutputStream come from Apache Commons IO.
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

import com.fasterxml.jackson.annotation.JsonInclude;
import com.fasterxml.jackson.core.io.IOContext;
import com.fasterxml.jackson.core.io.InputDecorator;
import com.fasterxml.jackson.core.io.OutputDecorator;
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializationFeature;
import com.fasterxml.jackson.dataformat.smile.SmileFactory;
import org.apache.commons.io.input.ReaderInputStream;
import org.apache.commons.io.output.WriterOutputStream;

import static com.fasterxml.jackson.dataformat.smile.SmileGenerator.Feature.ENCODE_BINARY_AS_7BIT;

public static final Charset UTF8 = Charset.forName("UTF-8");

// Smile-backed mapper whose input and output are wrapped in GZIP streams via decorators
public static final ObjectMapper GZIP_JSON_MAPPER = new ObjectMapper(new SmileFactory()
  .disable(ENCODE_BINARY_AS_7BIT)
  .setInputDecorator(new InputDecorator()
  {
    @Override
    public InputStream decorate(IOContext context, InputStream inputStream) throws IOException
    {
      return new GZIPInputStream(inputStream);
    }

    @Override
    public InputStream decorate(IOContext context, byte[] bytes, int offset, int length) throws IOException
    {
      return new GZIPInputStream(new ByteArrayInputStream(bytes, offset, length));
    }

    @Override
    public Reader decorate(IOContext context, Reader reader) throws IOException
    {
      return new InputStreamReader(new GZIPInputStream(new ReaderInputStream(reader)), UTF8);
    }
  })
  .setOutputDecorator(new OutputDecorator()
  {
    @Override
    public OutputStream decorate(IOContext context, OutputStream outputStream) throws IOException
    {
      return new GZIPOutputStream(outputStream);
    }

    @Override
    public Writer decorate(IOContext context, Writer writer) throws IOException
    {
      return new OutputStreamWriter(new GZIPOutputStream(new WriterOutputStream(writer, UTF8)));
    }
  }))
  .disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES)
  .disable(SerializationFeature.FAIL_ON_EMPTY_BEANS)
  .setSerializationInclusion(JsonInclude.Include.NON_NULL)
  .disable(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS);

This is the exception I get:

Exception in thread "main" java.util.zip.ZipException: Not in GZIP format
    at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:165)
    at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:79)
    at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:91)
    at ...JsonUtils$4.decorate(JsonUtils.java:162)
    at com.fasterxml.jackson.core.JsonFactory._decorate(JsonFactory.java:1459)
    at com.fasterxml.jackson.dataformat.smile.SmileFactory.createParser(SmileFactory.java:330)
    at com.fasterxml.jackson.dataformat.smile.SmileFactory.createParser(SmileFactory.java:320)
    at com.fasterxml.jackson.dataformat.smile.SmileFactory.createParser(SmileFactory.java:29)
    at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3091)

I used GZip as an example; in reality I'm using both LZ4 and GZip, and both throw exceptions when I try them with a SmileFactory, while everything works perfectly with a JsonFactory. The reason I prefer a SmileFactory over a JsonFactory is that it is noticeably faster, so it basically helps compensate for the price I pay for compression.

cowtowncoder commented 5 years ago

Thank you for reporting this. I hope to look into this soon -- decoration may not be properly tested for all codec factories, but obviously should work.

cowtowncoder commented 5 years ago

Ah-ha. Looks like there's "double decoration" for particular parser factory method(s). I'll try to create a good regression test here.

cowtowncoder commented 5 years ago

@guidomedina I found the problem, and it affects the createParser() methods that take byte[]; I can fix it for 2.9.8. In the meantime you may want to explicitly construct a ByteArrayInputStream to work around the problem (that factory method does not "double-decorate" as far as I can see).
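
For illustration, a minimal sketch of that workaround (the helper name is made up), for the case where you control the read call yourself:

import java.io.ByteArrayInputStream;
import java.io.IOException;

import com.fasterxml.jackson.databind.ObjectMapper;

public final class SmileReadWorkaround {
    // Hypothetical helper: wrap the byte[] yourself so the affected
    // createParser(byte[], ...) factory method is never invoked.
    public static <T> T read(ObjectMapper mapper, byte[] payload, Class<T> type) throws IOException {
        return mapper.readValue(new ByteArrayInputStream(payload), type);
    }
}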

guidomedina commented 5 years ago

Hi @cowtowncoder, thanks for the fix. I'll wait for 2.9.8, because I have also added converters from one or more mappers to another on the fly for our data, so the migration happens seamlessly.

We basically use a mapper selected by content type to read, and then write with a target mapper and content type. For example, here is a list of a few content types we are using:

With this fix we will now be able to add:

I know this has nothing to do with the issue; I'm just trying to give you another scenario of how your awesome APIs are put to use. Keep up the great work you do ;-)
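
As a rough illustration of that conversion flow (all names hypothetical; the actual content types are omitted above):

import java.io.IOException;
import java.util.Map;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public final class ContentTypeConverter {
    // Hypothetical registry: content type -> mapper that can read/write it
    private final Map<String, ObjectMapper> mappersByContentType;

    public ContentTypeConverter(Map<String, ObjectMapper> mappersByContentType) {
        this.mappersByContentType = mappersByContentType;
    }

    // Read the stored value with the mapper for its current content type,
    // then re-serialize it with the mapper for the target content type.
    public byte[] convert(byte[] payload, String fromContentType, String toContentType) throws IOException {
        ObjectMapper source = mappersByContentType.get(fromContentType);
        ObjectMapper target = mappersByContentType.get(toContentType);
        JsonNode tree = source.readTree(payload);
        return target.writeValueAsBytes(tree);
    }
}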

guidomedina commented 5 years ago

@cowtowncoder do you have any ETA for releasing 2.9.8? I have to do a big data migration and I was hoping I could do it next weekend; I'm postponing it in order to get the SmileFactory advantage. Again, many thanks for your support resolving this one.

Or maybe in the meantime I could use the workaround, but I didn't understand from your previous comment how to do it exactly.

cowtowncoder commented 5 years ago

Hi there! I am quite close to having 2.9.8, but I am waiting for some other work (related to Java 9+ forward compatibility), so next week may be a stretch from my perspective. Most likely mid-December, before the Christmas break.

Exciting to hear about the usage: I am glad you found this feature useful -- it's one of those things that is a bit under-utilized, which is also why there was this bug (not enough usage to weed it out). The funny part is that the 3.0 branch (master) already had a fix, due to sharing more code across stream factories.

As to the workaround, I now realize that if you do not control which createParser / createGenerator call the 3rd-party code uses, you can't apply it. You could possibly sub-class SmileFactory and override the method, but that may be tricky and is not very maintainable.
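
For what it's worth, a rough sketch of that sub-classing idea (class name made up; this assumes the 2.9.x SmileFactory method signatures):

import java.io.ByteArrayInputStream;
import java.io.IOException;

import com.fasterxml.jackson.dataformat.smile.SmileFactory;
import com.fasterxml.jackson.dataformat.smile.SmileParser;

public class SingleDecorationSmileFactory extends SmileFactory {
    private static final long serialVersionUID = 1L;

    @Override
    public SmileParser createParser(byte[] data) throws IOException {
        return createParser(data, 0, data.length);
    }

    @Override
    public SmileParser createParser(byte[] data, int offset, int len) throws IOException {
        // Route through the InputStream overload, which applies the input decorator only once
        return createParser(new ByteArrayInputStream(data, offset, len));
    }
}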

One other thing: depending on how you release/deploy, you could perhaps do a local build of jackson-dataformat-smile. But that also may not be an option.

guidomedina commented 5 years ago

This is the result with my newly built 2.9.7.5 from the 2.9 branch. See how close LZ4 Smile is to plain Smile time-wise, while still being faster than plain JSON; that's what I was talking about:

Format            Size      Time
JSON              5509 kb   57.203 ms
Smile             2120 kb   42.102 ms
Compressed Smile  1862 kb   39.552 ms
LZ4 JSON           179 kb   64.011 ms
Gzip JSON          431 kb   110.644 ms
LZ4 Smile          384 kb   46.552 ms
Gzip Smile         439 kb   84.553 ms

howardem commented 4 years ago

Hi Guido,

I hope you are doing well through this COVID-19 time we are living in. We have custom Spring Cloud Stream message converters that handle content negotiation based on the headers of incoming RabbitMQ messages. LZ4 is one of the compressors we want to use by decorating the ObjectMapper (same thing you did), but we couldn't get anywhere close to the numbers you posted. We tried two different implementations of the algorithm:

We are using Jackson v2.11.0.

If you don't mind, could you share which LZ4 implementation you used?

Best regards,

guidomedina commented 4 years ago

I'm using LZ4 double compression with a 32 KB block size. Also, because Smile is binary and faster than standard JSON, whenever I use binary compression I go straight to Smile. I'm using:

Here is the code for the LZ4 JSON mapper and the LZ4 Smile mapper:

import static com.fasterxml.jackson.dataformat.smile.SmileGenerator.Feature.ENCODE_BINARY_AS_7BIT;
import static java.nio.charset.StandardCharsets.UTF_8;
// there are more imports but I'm sure you will be able to figure them out

...
...

  public static final ParameterNamesModule PARAMETER_NAMES_MODULE = new ParameterNamesModule();
  public static final JavaTimeModule JAVA_TIME_MODULE = new JavaTimeModule();
  public static final Jdk8Module JDK_8_MODULE = new Jdk8Module();

  // Just to configure some modules
  public static ObjectMapper configureDefaultObjectMapper(ObjectMapper objectMapper)
  {
    return objectMapper
      .disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES)
      .disable(SerializationFeature.FAIL_ON_EMPTY_BEANS)
      .setSerializationInclusion(JsonInclude.Include.NON_NULL)
      .disable(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS)
      .disable(DeserializationFeature.ADJUST_DATES_TO_CONTEXT_TIME_ZONE)
      .registerModule(PARAMETER_NAMES_MODULE)
      .registerModule(JAVA_TIME_MODULE)
      .registerModule(JDK_8_MODULE);
  }

  public static final int LZ4_BLOCK_SIZE = 32 * 1024;

  // configureDefaultObjectMapperForRiak: presumably a Riak-specific variant of configureDefaultObjectMapper, not shown here
  public static final ObjectMapper LZ4_JSON_MAPPER = configureDefaultObjectMapperForRiak(JsonMapper.builder(new JsonFactoryBuilder()
    .inputDecorator(new InputDecorator()
    {
      @Override
      public InputStream decorate(IOContext context, InputStream inputStream)
      {
        return new LZ4BlockInputStream(new LZ4BlockInputStream(inputStream));
      }

      @Override
      public InputStream decorate(IOContext context, byte[] bytes, int offset, int length)
      {
        return new LZ4BlockInputStream(new LZ4BlockInputStream(new ByteArrayInputStream(bytes, offset, length)));
      }

      @Override
      public Reader decorate(IOContext context, Reader reader)
      {
        return new InputStreamReader(new LZ4BlockInputStream(new LZ4BlockInputStream(new ReaderInputStream(reader, UTF_8))), UTF_8);
      }
    })
    .outputDecorator(new OutputDecorator()
    {
      @Override
      public OutputStream decorate(IOContext context, OutputStream outputStream)
      {
        return new LZ4BlockOutputStream(new LZ4BlockOutputStream(outputStream,
          LZ4_BLOCK_SIZE, LZ4Factory.fastestInstance().fastCompressor()),
          LZ4_BLOCK_SIZE, LZ4Factory.fastestInstance().fastCompressor());
      }

      @Override
      public Writer decorate(IOContext context, Writer writer)
      {
        return new OutputStreamWriter(new LZ4BlockOutputStream(new LZ4BlockOutputStream(new WriterOutputStream(writer, UTF_8),
          LZ4_BLOCK_SIZE, LZ4Factory.fastestInstance().fastCompressor()),
          LZ4_BLOCK_SIZE, LZ4Factory.fastestInstance().fastCompressor())
        );
      }
    }).build()).build());

  public static final ObjectMapper LZ4_SMILE_MAPPER = configureDefaultObjectMapper(JsonMapper.builder(SmileFactory.builder()
    .disable(ENCODE_BINARY_AS_7BIT)
    .inputDecorator(new InputDecorator()
    {
      @Override
      public InputStream decorate(IOContext context, InputStream inputStream)
      {
        return new LZ4BlockInputStream(new LZ4BlockInputStream(inputStream));
      }

      @Override
      public InputStream decorate(IOContext context, byte[] bytes, int offset, int length)
      {
        return new LZ4BlockInputStream(new LZ4BlockInputStream(new ByteArrayInputStream(bytes, offset, length)));
      }

      @Override
      public Reader decorate(IOContext context, Reader reader)
      {
        return new InputStreamReader(new LZ4BlockInputStream(new LZ4BlockInputStream(new ReaderInputStream(reader, UTF_8))), UTF_8);
      }
    })
    .outputDecorator(new OutputDecorator()
    {
      @Override
      public OutputStream decorate(IOContext context, OutputStream outputStream)
      {
        return new LZ4BlockOutputStream(new LZ4BlockOutputStream(outputStream,
          LZ4_BLOCK_SIZE, LZ4Factory.fastestInstance().fastCompressor()),
          LZ4_BLOCK_SIZE, LZ4Factory.fastestInstance().fastCompressor());
      }

      @Override
      public Writer decorate(IOContext context, Writer writer)
      {
        return new OutputStreamWriter(new LZ4BlockOutputStream(new LZ4BlockOutputStream(new WriterOutputStream(writer, UTF_8),
          LZ4_BLOCK_SIZE, LZ4Factory.fastestInstance().fastCompressor()),
          LZ4_BLOCK_SIZE, LZ4Factory.fastestInstance().fastCompressor())
        );
      }
    }).build()).build());
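
As a usage sketch (not from the original comment), a round trip with one of the mappers above could look like this, assuming a simple POJO named Payload with a default constructor and public fields:

  // Hypothetical round trip: write LZ4-compressed Smile bytes and read them back.
  public static void roundTripExample() throws IOException {
    Payload original = new Payload();          // assumed POJO, not shown here
    original.id = "key-1";
    original.count = 42;

    byte[] blob = LZ4_SMILE_MAPPER.writeValueAsBytes(original);
    Payload restored = LZ4_SMILE_MAPPER.readValue(blob, Payload.class);
    if (!"key-1".equals(restored.id)) {
      throw new IllegalStateException("round trip failed");
    }
  }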

guidomedina commented 4 years ago

The results will vary. I tried with a JSON document containing many repetitions (an array of objects), which Smile already compresses very well before the LZ4 or GZip compression even kicks in. Also, for GZip I just used the standard JDK implementation; I tend to avoid the Apache Commons compressors where possible, as I have had bad experiences with them.

howardem commented 4 years ago

Thanks so much for posting the code and for your detailed explanation. I'm gonna try your configurations and see how it goes!!