aws / aws-sdk-java-v2

The official AWS SDK for Java - Version 2
Apache License 2.0

Simplify Modeled Message Marshalling #82

Open oakad opened 7 years ago

oakad commented 7 years ago

In the v1 API, most request/response model objects were Serializable, so it was easy to persist them in all kinds of client-side caches. In particular, s3.model.ObjectMetadata could be persisted as-is in a client-side S3 object cache (and such caches are invaluable when dealing with big buckets over high-latency links).

In v2, unfortunately, s3.model.HeadObjectResponse is not Serializable, and thus cannot be persisted as-is. Instead, manual copying of fields or reflection/security tricks are required.
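For illustration, the "manual copying of fields" workaround could look roughly like the sketch below. It is not taken from the SDK: `CachedObjectMetadata` is a hypothetical class, the chosen fields are only an example, and the accessor names mentioned in the comment are assumptions about the v2 response type. The code uses only the JDK, so it runs without the SDK on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical cache entry that holds only the fields the cache needs. */
final class CachedObjectMetadata implements Serializable {
    private static final long serialVersionUID = 1L;

    final String eTag;
    final long contentLength;
    final String contentType;
    final Instant lastModified;
    final HashMap<String, String> userMetadata; // HashMap is Serializable

    // With the SDK available, the arguments would be copied by hand from the
    // response, e.g. response.eTag(), response.contentLength(), and so on
    // (accessor names assumed here, not verified against a specific release).
    CachedObjectMetadata(String eTag, long contentLength, String contentType,
                         Instant lastModified, Map<String, String> userMetadata) {
        this.eTag = eTag;
        this.contentLength = contentLength;
        this.contentType = contentType;
        this.lastModified = lastModified;
        this.userMetadata = new HashMap<>(userMetadata); // defensive copy
    }

    /** Serializes this entry with plain Java serialization. */
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(this);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Restores an entry previously produced by toBytes(). */
    static CachedObjectMetadata fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (CachedObjectMetadata) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The obvious cost of this approach is that every model type of interest needs its own hand-written snapshot class, which is the burden the issue is about.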

millems commented 7 years ago

Is there a reason you need Java serialization? If we provided other means to serialize the messages, would you be able to use them instead of Serializable?

oakad commented 7 years ago

  1. It's not just me :-)
  2. The way I personally use these things, any more or less conventional way to write those objects to streams or convert them to byte arrays will do. But isn't "Serializable" the easiest way to achieve this behavior, especially considering the object is essentially a POJO and there are 3rd party libs (like FST) which suck substantially less than the built-in JRE serializer?

millems commented 7 years ago

Sorry, I didn't mean to imply you were the only one with this use-case. We're just interested in your specific use of Serializable. :)

In 1.11 we have supported three means of serialization: traditional Serializable-style serialization, bean-spec serializers like Jackson, and internal serializers we generate that are specific to the messages we're sending. We heavily optimize our generated serializers, because they're on the critical path of everyone's application: they're what we use to communicate with the AWS services. We've not traditionally given the same attention to our other methods of object serialization.

We believe it could be possible for our generated serializers to be faster than the bean-style or Serializable-style serialization because they don't have to rely on reflection. Unfortunately we're not sure if that's true in the general case because we don't have a lot of flexibility in our serialized form (usually JSON). We still have some testing to do.

If our serializers do end up being faster in most cases than the Serializable-style or bean-style serialization, we could expose those serializers publicly and let everyone benefit from any internal serialization optimizations we make.

We've not yet decided which methods of serialization we should support. We currently support bean-style serialization in V2 using the method we describe in our developer preview announcement post, in the section on immutability. You can use that as a workaround for now.

We're definitely not against supporting Serializable at this time. We just have a lot of thinking to do, and not including it in the V2 preview allowed us to focus on other features we were more sure about. It is also giving us a chance to see how our customers are using it today. Your example of response caching is a great one.

oakad commented 7 years ago

I suppose the whole thing amounts to the idea that a user should always manipulate and store builder objects, whereas the actual model POJOs are only for internal SDK use and occasional response property inspection. That sounds reasonable enough, albeit not quite obvious at first glance (after all, the "builder" pattern conventionally implies one-way object construction, so converting a response object back to its builder is rather unusual).
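To make the round trip concrete, here is a minimal sketch of the pattern being discussed: an immutable model that can be converted back to a mutable, bean-style builder and rebuilt from it. `HeadResult` and its `Builder` are hypothetical stand-ins written for this example, not SDK classes, and the two properties are arbitrary. A bean-style serializer such as Jackson would operate on the builder (which has a no-arg constructor and get/set pairs), never on the immutable model.

```java
import java.util.Objects;

/** Hypothetical immutable model standing in for an SDK response type. */
final class HeadResult {
    private final String eTag;
    private final Long contentLength;

    private HeadResult(Builder builder) {
        this.eTag = builder.getETag();
        this.contentLength = builder.getContentLength();
    }

    String eTag() { return eTag; }
    Long contentLength() { return contentLength; }

    /** Copies the current state into a fresh mutable builder. */
    Builder toBuilder() {
        Builder builder = new Builder();
        builder.setETag(eTag);
        builder.setContentLength(contentLength);
        return builder;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof HeadResult)) {
            return false;
        }
        HeadResult that = (HeadResult) other;
        return Objects.equals(eTag, that.eTag)
                && Objects.equals(contentLength, that.contentLength);
    }

    @Override
    public int hashCode() {
        return Objects.hash(eTag, contentLength);
    }

    /** Mutable bean: the shape a bean-style serializer can read and populate. */
    static final class Builder {
        private String eTag;
        private Long contentLength;

        public Builder() { }

        public String getETag() { return eTag; }
        public void setETag(String eTag) { this.eTag = eTag; }

        public Long getContentLength() { return contentLength; }
        public void setContentLength(Long contentLength) { this.contentLength = contentLength; }

        /** Freezes the builder state into an immutable model. */
        HeadResult build() { return new HeadResult(this); }
    }
}
```

In this shape, storing a response means calling `toBuilder()`, handing the builder to the serializer, and calling `build()` after deserializing, which is the part that is easy to miss when reading the model class alone.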

millems commented 7 years ago

I agree, it's definitely an unintuitive way to serialize things. Do you think it would be easier to discover if we add a serialize method (and a static deserialize somewhere) to the model objects? We'd also make sure to include this in the developer guide, in case people were to try to google it or look it up there.

oakad commented 7 years ago

It definitely will be helpful. To share a personal story: I was inspecting the model class via "go to source" IDE option and noticed the serializableBuilderClass method; I understood it was intended for object construction beans-style, but it gave me no hint regarding the actual serialization approach. I also noticed the "ToCopyableBuilder" interface, but again, it was looking more like something internal to the SDK, rather than a pathway to successful serialization. :-)

steveloughran commented 1 year ago

Why was this cut? It's not just about what is most efficient, it's about making the move from v1 to v2 easier. The harder migration is, the longer projects will stay with the older version, and the longer you have to maintain it. Even if you have a more efficient marshalling mechanism internally, there's nothing to stop you adding readObject() and writeObject() methods to use this.

Can I particularly highlight how much Spark relies on object serialization as a way of passing structures between processes. We can't do that with v2 SDK objects, can we?
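The readObject()/writeObject() suggestion above can be sketched as follows. This is only an illustration of the standard Java serialization hooks, assuming a hypothetical `MarshalledModel` class whose `marshal`/`unmarshal` helpers stand in for an optimized internal marshaller; nothing here reflects how the SDK's generated serializers actually work.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model whose Java serialization delegates to a custom marshaller. */
final class MarshalledModel implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: default serialization skips this field; the hooks below handle it.
    private transient Map<String, String> fields;

    MarshalledModel(Map<String, String> fields) {
        this.fields = new LinkedHashMap<>(fields);
    }

    String field(String name) { return fields.get(name); }

    // Stand-in for an optimized marshaller: entry count, then key/value pairs.
    private static void marshal(Map<String, String> fields, DataOutput out) throws IOException {
        out.writeInt(fields.size());
        for (Map.Entry<String, String> entry : fields.entrySet()) {
            out.writeUTF(entry.getKey());
            out.writeUTF(entry.getValue());
        }
    }

    private static Map<String, String> unmarshal(DataInput in) throws IOException {
        int size = in.readInt();
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < size; i++) {
            String key = in.readUTF();
            fields.put(key, in.readUTF());
        }
        return fields;
    }

    // Serialization hook: writes the custom wire form instead of the field graph.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        marshal(fields, out);
    }

    // Serialization hook: rebuilds the transient state from the custom wire form.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        fields = unmarshal(in);
    }

    /** Convenience helper for the example: serialize and deserialize in memory. */
    static MarshalledModel roundTrip(MarshalledModel model) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(model);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return (MarshalledModel) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

One design consequence worth noting: because readObject() has to assign the state after construction, the field cannot be final in this form; a serialization proxy (writeReplace()/readResolve()) is the usual alternative for fully immutable classes.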