Instagram / ig-json-parser

Fast JSON parser for java projects
https://instagram-engineering.com/fast-auto-generated-streaming-json-parsing-for-android-ab8e7be21033
MIT License
1.32k stars 125 forks source link

Add support for serializing/deserializing interfaces #50

Closed jingibus closed 6 years ago

jingibus commented 6 years ago

This PR adds support for serializing objects from an interface reference. Using this, you can serialize out e.g. a list of heterogenous object types.

Using the Public API

Say you have a field that refers to an instance of MyInterface.

@JsonType
public class MySerializableObject {
  @JsonField(fieldName = "my_interface")
  MyInterface mMyInterface;
}

First step is to add a method to the interface to return the type name:

@JsonType
public interface MyInterface {
  @JsonTypeName
  String getTypeName();
}

Then in your implementation of that interface, you yield the type name:

@JsonType
public class MyImplementation implements MyInterface {
  public static final String TYPE_NAME = "MyImplementation";
  @Override
  public String getTypeName() {
    return TYPE_NAME;
  }
}

Then before you serialize, you register a serialization handler with the generated code:

MyInterface__JsonHelper.registerHandler(
  MyImplementation.TYPE_NAME,
  new JsonSerializationHandler<MyInterface>() {
    @Override
    public void serializeToJson(JsonGenerator generator, MyInterface object)
                      throws IOException {
      MyImplementation__JsonHelper
          .serializeToJson(generator, (MyImplementation)object, true);
    }
    public MyInterface parseFromJson(JsonParser parser) throws IOException {
      return MyImplementation__JsonHelper.parseFromJson(parser);
    }
  });

With that, MySerializableObject can successfully serialize and deserialize:

String json = MySerializableObject__JsonHelper.serializeToJson(object);
MySerializableObject deserialized = MySerializableObject__JsonHelper.parseFromJson(json);

Under the Hood

Not seen above is the JSON representation. For this to work, the type has to be represented in the serialized JSON. Not only that, but some ordering must be guaranteed if we want to go with the existing single-pass parse style. This is because we must read in the type first before we dispatch to that specific type's parser.

The only JSON aggregate that guarantees ordering is the list. So interfaces are represented as lists, where the first entry is the type information, and the second entry is the object data:

[
  "InterfaceImplementationUUT",
  {
    "stringField":"testValue"
  }
]
ttung commented 6 years ago

cc: @kangzhang @tkieft

jingibus commented 6 years ago

Almost forgot. Here's a link to example generated source: https://gist.github.com/jingibus/b877e28c376cc64b5b77ac7328e8214c

kangzhang commented 6 years ago

hey @jingibus, thanks for putting up the PR!

The logic looks solid but the API diverge a bit from our current design: we are trying to have more work done in compile time and avoid centralized bottleneck that may not scale well when the number of types increases(e.g. the registerHandler logic).

May I know more about the problem that you are trying to solve so that we can explore the best solution for it? E.g. Is it mostly for heterogenous types or you have some other constraint?

If it's just for heterogenous types, wondering if you have tried using custom serializer & deserializer(which the library already support) and how does it work for you usecase? E.g.


interface Bar {
}

@JsonType
class Foo {
  @JsonField(valueExtractFormatter="BarDeserializer", serializeCodeFormatter="BarSerializer")
  Bar bar;
}
jingibus commented 6 years ago

Thanks for taking the time to consider these changes! Here's a tl;dr version of the problem:

I work on a module that needs to be able to serialize objects that are passed in from client modules. These objects need to store the client module's data, so our module can't refer to their serializers statically.

I'll go into more detail on the problem in a second, but first, to address your concerns:

  1. I do not believe that this can be achieved with static references. If client code depends on our code to store its objects, and we then have a static reference to the client serialization code, a circular reference is created. So I believe that some runtime dispatch is required.
  2. For performance, two points: a. This is not going to perform as well as the existing statically dispatched parse graph. That's just how it is. There are a lot of tools we use that don't perform as well as other tools. I think it's enough overhead to wire up that people will at least not blindly use it, though. b. I am skeptical of the claim that a HashMap with a String key will not scale. Maybe I have more to learn about the kinds of scale folks are putting this library through. If this is a serious concern, though, we can talk about addressing it.
  3. Custom serializers and deserializers do work. That was how I implemented my prototype. This approach is better than that one in a couple of ways: a. This is more elegant. With a custom serializer/deserializer, you have to have a different bit of code for a field reference vs. a Map reference vs a List reference, etc. Here, you define the code for the interface, and everywhere you use the interface it Just Works. b. Other people who are in the same situation I'm in won't have to reinvent this wheel.

Okay, now the longer explanation. I work on a module in an Android app, we can call the module Arthur, that has a serialized object store at its heart. A client makes a request, the request goes into Arthur's store, and at some later date the request is serviced. This is currently based on ig-json-parser.

Each client that depends on Arthur needs to stash their own client-specific data in the store. The way we handle this right now is that each client adds a reference to the data they need inside the store's main model object.

This is causing two problems:

  1. The main model object is getting more and more cluttered
  2. Client logic is started to accumulate in Arthur

To cut Arthur away from the clients, clients have to be able to build their own objects and save them in our store. Right now, that's not possible in a graceful way.

kangzhang commented 6 years ago

Cool, yeah, if you need to avoid the dependency from the model Interface to the model implementation, some runtime registration is necessary.

The current solution ties a bit too close to your current approach(the idea to use runtime binding, the idea to separate model interface and model implementation, the idea to do lookahead parsing). I hope the framework could be flexible on these decisions and allow people to make their own choices. Different approaches could be provided as patterns(e.g. we can provide documentation and examples following your approach) instead of features.

It looks like custom serializer and deserializer can get you almost there, except that we have to duplicate the annotations(and properties) between different reference point of ModelInterface.

To fix the problem, I'm wondering if we could just support interface as JsonType and allow(/require) people to provide custom serializer and deserializer for it. E.g.


@JsonType(
  valueExtractFormatter = "BarHelper.read(jsonParser)",
  serializeCodeFormatter = "BarHelper.write(jsonParser)"
)
interface Bar{
}

This way the serializer and deserializer values don't have to be repeated in different @JsonField annotations.

The runtime binding logic will be moved to BarHelper. If people can afford hard referencing to the implementation class, they can avoid the runtime binding as well.

Similarly, the lookahead parsing will also be optional/configurable.

The other unspoken benefit is performance and cleanness. The Bar__JsonHelper we generate will be virtually a pointer to the actual implementation. Sophisticate post processing tools(proguard/dexguard/redex) can likely remove these classes and functions which means almost zero overhead introduced.

Thoughts?

jingibus commented 6 years ago

You have a good point about configurability. I think that proposal should be easy enough to add to the PR. Would it make sense to respect that attribute everywhere, not just in interfaces? At a glance, I don't see why not.

Now: should there be an out-of-the-box implementation, or not? If the library shouldn't have one, then the quality of this implementation is beside the point. Here's why I think it would be good to have one:

This is a non-trivial integration. This scenario can't be solved by any programmer without at the very least serializing out the type name and reading it in from JSON. They don't have to do it this way, of course, but they must decide on an object format, they must figure out how to use Jackson's JsonGenerator to save it out, and they must figure out how to use JsonParser and the JsonHelper API style to read it back in. They must also implement some kind of type dispatch.

So this is an integration that requires reaching much further under the hood and building more moving pieces than you need for, say, enums. The API will be useless without an implementation, and the example code will have a lot of boilerplate. The implementation can always be omitted if valueExtractFormatter etc are provided on @JsonType, of course.

Re: this implementation, one minor point: I think this is just a terminological issue, but for the sake of clarity: there is no lookahead in this implementation. Dispatch is implemented the exact same way the current parser handles field parsing: by reading in a string from the JSON stream, and then using that string to choose the parsing code to switch over to. The difference is that this format is implementation-specific, while the field format is a JSON standard.

kangzhang commented 6 years ago

Ah, sorry that I overlooked the sample JSON payload and thought you were using lookahead parsing. You are right, it's still streaming!

I might be biased since I know how the code generation works internally, but it does feel like some thing that can be reused thru patterns and helper classes easily. I might be missing something, but I'm not sure if code generation logic is necessary to make all this possible. E.g. following code could almost achieve the same thing?

// in priviate module, which we don't want people to reference to
@JsonType
class BarA implements Bar{
}

@JsonType
class BarB implements Bar {
}

@JsonType
class BarWrapper{
  @JsonField("fooBar")
  BarA mBarA;

  @JsonField("barFoo")
  BarB mBarB;

  Bar getBar() {
    if (mBarA) {
      return mBarA;
    }
    return mBarB;
  }
}

// in public module
@JsonType(
  valueExtractFormatter = "BarWrapper__Helper.read(jsonParser).getBar()",
  serializeCodeFormatter = "BarWrapper__Helper.write(jsonParser)"
)
interface Bar{
}

The other feedback is that I still think the proposed approach is too tied to your specific use case. Maybe a few examples(in my imagine) could illustrate things better:

1) when there is an 1 to 1 mapping between interface and model, and people just want to hide the model implementation.

// package visibility, external classes can't access
@JsonType
class BarImpl {}
@JsonType(
  valueExtractFormatter = "BarImpl__Helper.read(jsonParser)",
  serializeCodeFormatter = "BarImpl__Helper.write(jsonParser)"
)
interface Bar{
}

2) when people wants to use lookahead parsing to handle heterogenous types.

In these cases, the payload structure will be different from your proposal here.

I think supporting interface is going to be useful and will unblock all these use cases. Instead of supporting all of them in ig-json-parser, I'm more leaning towards letting developers to decide for now. If we do see one or many use cases to appear a lot, we can revisit and build support in ig-json-parser? Happy to discuss more.

jingibus commented 6 years ago

Interesting. From a data standpoint, for my scenario, this does work:

package com.uber.android;

@JsonType
class CarRequest implements Request {
}

package com.foursquare.android;

@JsonType
class RestaurantRequest implements Request {
}

package com.requestqueue;

@JsonType
class RequestWrapper {
  @JsonField("fooBar")
  CarRequest mBarA;

  @JsonField("barFoo")
  RestaurantRequest mBarB;

  Bar getBar() {
    if (mBarA) {
      return mBarA;
    }
    return mBarB;
  }
}

package com.requestqueue;

@JsonType(
  valueExtractFormatter = "RequestWrapper__Helper.read(jsonParser).getBar()",
  serializeCodeFormatter = "RequestWrapper__Helper.write(jsonParser)"
)
interface Request {
}

If this were a public library, then obviously this would break. But it's not, so it can probably work.

There's one more thing the dynamic implementation enables that this does not, though, which is freedom for callbacks. CarRequest cannot have any code that calls back into the requestqueue system here. This would be very nice to have for my scenario, as it opens up the API design a bit.

I'm sold that your proposed API is useful enough on its own. One last counterproposal:

My implementation can be modified to avoid code generation and hook into your proposed mechanism. The result would be two files: a DynamicDispatch<T> class that would contain the code that's currently generated, and a TypeNameProvider interface with at getTypeName that would take the place of the JsonTypeName attribute. And then it could be wired up like so:

public class RequestHelper {
  public static final DynamicDispatch<Request> DISPATCH = new DynamicDispatch<>();
  ...
}

@JsonType(
  valueExtractFormatter = "RequestHelper.DISPATCH.read(jsonParser)",
  serializeCodeFormatter = "RequestHelper.DISPATCH.write(jsonParser)"
)
interface Request {
}

I'll go ahead and make the change to strip out the existing generated code in favor of the valueExtractFormatter/serializeCodeFormatter API.

jingibus commented 6 years ago

The new approach is sufficiently different that I'm going to close this PR. I'll have a new one up soon.