Update bulk requester to support optimized request chaining

esbtools / event-handler

Notification in, document out.

GNU General Public License v3.0

3 stars 6 forks source link

Update bulk requester to support optimized request chaining #22

Closed alechenninger closed 8 years ago

alechenninger commented 8 years ago

TODO:

[ ] Take another look at testing strategy for this

kahowell commented 8 years ago

Talked about this some IRL, basically, the one thing I really want to see is the execution of the requests to be more explicit, rather than a side-effect of calling get on the Future.

alechenninger commented 8 years ago

I think we can rewrite the processing of document events, notifications, and messages to make the batching more explicit (I assume you mean batching since execution of any requests should be clear from calling 'process' or the other async APIs which imply (and are explicitly documented) to start background processing.) I think there could be some benefit to that.

However, please consider that the current pattern has its merits. From the outside, returning just a Future representing some computation is quite flexible. This completely decouples the caller from how these actions happen. For example, it's allowed us to implement rather complete request batching and no components are impacted by this behavior. This is the definition of high cohesion, low coupling. All of the batching logic is completely deletable code: you could transparently replace it with something that did simple serial requests, or not use any of it at all and do operations synchronously and return an already completed Future. None are violations of the API. Of course an API returning a Future does imply it's doing some background work, so you may treat it a certain way because of that. I would consider that a downside of this approach, but nonetheless synchronous usage still remains a 'correct' program and so this flexibility and isolation of concerns should be noted.

As far as making batching more explicit, it would likely make batching easier to implement since batching would be a core concern of the API, and implementations could decide to do each operation in its own thread or do real 'batching' or whatever. I think that has its own merits. I originally, before even going down the Future route, was considering a domain model that applied now might look something like this psuedo-code API:

DocumentEvent.lookupDocument -> Operation<?>
Notification.toDocumentEvents -> Operation<List<DocumentEvent>>
Message.process -> Operation<Void>

Where Operation was a type that, individually, encapsulated some synchronous computation presumably involving a remote call. Then, you could potentially batch these together, explicitly, with either APIs on Operation itself like boolean canBatchWith(Operation<T>) and BatchOperation<List<Batched<T, U>> batchWith(Operation<T>) or perhaps some other injected type like a "Batcher." Notice maintaining type safety however gets a little crazy, because you need the type of the operation result (T) and the type of the source of the operation (U) so you could keep track of the Message or DocumentEvent or Notification that the Operation originated from. This is because an Operation's result is potentially only meaningful in context of its source (ie I need to know which Message failed, which DocumentEvent's status to update, etc.).

It's worth noting that even in implementing that, code that returned an Operation that was able to be batched would likely still require something like callbacks in order to deal with the fact that it was decoupled from when requests are actually made, and therefore would have to be able to provide, in advance, what to do with responses it does not yet have. This is where the async model fits quite naturally (initially, much to my delight!)

I'm sure there is a way to sort out some of the mess of that model and I'd be willing to explore that if you think it's worth it. However, I originally decided to go with some kind of Result type which could be computed lazily (or not, this is transparent to the caller and the caller does not really care) because this kept the association with a result's source clear, and still allowed for any amount of flexibility in the backend WRT how requests actually happened as is this is not a concern of the caller. From there we said, "Hey that's a lot like Future, let's just use that," and here we are.

So, either way is fine with me. Let me know what you think.

kahowell commented 8 years ago

From the outside, returning just a Future representing some computation is quite flexible.

I agree, and I don't necessarily think that the higher level interfaces (especially those created prior to this PR) are poor. On the contrary, I think they're pretty good.

This is the definition of high cohesion, low coupling. All of the batching logic is completely deletable code.

Sure, and I do agree that this is a good quality of the current design. What I dislike is that the batching behavior, etc. is a little surprising. I'd expect to have explicit demarcation of batches, or configurable batch sizes, rather than having the batch controlled by a side of effect of a call to LazyPromise.get().

I don't think we necessarily have to completely rewrite. One change, I'd like to consider though is instead of expressing the contract for inbound as each Message has a process method, consider expressing as an interface for processing a Collection of Message (void process(Collection<Message> messages)or similar). This would allow batching behavior to be defined in that method, which could be cleaner I would think. Additionally, we could create abstract classes with batching behavior built-in. It is difficult to express batching in our current set of interfaces (the high-level ones), and my intuition is we can tweak those to help get a cleaner solution. If you'd like I can sketch out what I think would be ideal, though this will obviously take some time.

Regarding the Operation idea, etc., I think they are valid solutions to explore, but my intuition is we don't have to go that far from this PR to arrive at something cleaner.

alechenninger commented 8 years ago

Let's talk about an API like process(Collection). The main feature of the current design I want to keep however is that from an implementation point of view, it matches the domain naturally and has very good separation of concerns: you write a MessageFactory which parses your incoming documents into Messages, and those contain the processing code specific to that document. That is very natural, modular, and testable. You write code to process your message, and don't have to worry when your requests are actually done. I think we could keep that aspect with a process(Collection) API, still, but I'm worried we wouldn't end up changing as much as you would like.

I'm still not clear on what is surprising about the batching behavior. It is clear that the requester handles the lifecycle of requests, and you use a BulkLightblueRequester whose entire purpose is around batching requests. Requests being batched when Futures are resolved is the defining feature, so I, while admittedly biased as its implementor, still am having trouble as seeing why that is so surprising. In any case, if we can settle on something that makes everybody happy, I'm all for it.

kahowell commented 8 years ago

I'm still not clear on what is surprising about the batching behavior. It is clear that the requester handles the lifecycle of requests

It's honestly just LazyPromise.get having the side effect of triggering queued requests that bothers me. I'd rather have BulkLightblueRequester expose doQueuedRequestsAndCompleteFutures, and perhaps a method to check if the request has queued requests (boolean hasQueuedRequests?).

If we have a process(Collection) API, then this can still hidden from the caller and the implementor. Following the above, the process(Collection) could have two implementations:

Non-batched (mostly*). Call message.process, followed by doQueuedRequestsAndCompleteFutures, until hasQueuedRequests is false, for each message.
Batched. Call message.process for each message. Call doQueuedRequestsAndCompleteFutures until hasQueuedRequests is false.

*Technically, may batch a few if multiple requests are made during message.process...

Another thing that might be useful is to add a parameter to doQueuedRequestsAndCompleteFutures to control the batch size. I don't think it's really necessary yet... but I can see some odd cases where you fan out and end up with very, very large bulk requests (if we even care)?

alechenninger commented 8 years ago

It's honestly just LazyPromise.get having the side effect of triggering queued requests that bothers me.

Why?

I ask not to be stubborn--truthfully!--but because any alternative is also going to have its tradeoffs, and because I think the complaints around this PR have confused the "promise" term and future callback chaining with the laziness of the future implementation (case in point, the conversation first started around PromiseOfPromise and took a while to leave there). I want to be clear about where the real problems are, because the promise and callback chaining I believe to be a misdirected issue, more about naming types and APIs more clearly, since the core concepts are extremely common and invented long before this PR of course in Guava, the JDK itself as of 8, and to some extent many other languages. I want to talk about the laziness bit specifically.

So about the laziness, I'm not clear on how making a more complicated to use Requester would be a better tradeoff. In other words, you need to know nothing about the current implementation in order to use it correctly. With your suggestion, choosing a bulk requester means more than just "new your requester and use pass it to your message implementations." The contract is more complicated (crucially, it is implementation specific, making the interface misleading). Additionally, we're requiring the use of some kind of Requester because that's the only way process would work as far as I can tell. On the other hand, laziness is always a little odd to those not fluent in functional programming especially, but its contract is simple to use, and purely outlined by interfaces alone: you cannot compile an incorrect usage. (And although the futures produced will never be truly parallel they are concurrent.)

In the meantime, I'll start playing around with a few things to see what shakes out.

Another thing that might be useful is to add a parameter to doQueuedRequestsAndCompleteFutures to control the batch size.

This would be easy to add to its constructor which I think would make more sense anyway as if there is a max batch size (which I agree there should be some max), it's unlikely to change per batch, and this keeps an implementation specific concern out of the current top level interface for Requester.

Thank you!

alechenninger commented 8 years ago

Additionally, updating the requester in that way will also require reworking how DocumentEvent and Notification are currently processed as well (which relied on the existing laziness of the bulk requester futures).

alechenninger commented 8 years ago

Some alternative APIs (psuedo code):

Scheme 1: Return a thing representing a potentially batchable operation (already discussed a little bit)

DocumentEvent.lookupDocument() -> Operation<DocumentEvent, ?>
Notification.toDocumentEvents() -> Operation<Notification, List<DocumentEvent>>
Message.process() -> Operation<Message, Void>

// then either use batching impl decided by impl of operation
Operation<T, U>.batchWith(Operation<T, U>) -> BatchedOperation<T, U>

BatchedOperation.perform() -> List<Result<T, U>>

// ...or have a separate 'batcher' or something of the sort that does the ops
// This Batcher has to have knowledge of the operation types it is performing,
// so there is some room for error in using a Batcher which doesn't work with
// your Operation impls (which come from your DocEvent/Notification/Message impls)
Batcher<T, U>.addToBatch(Operation<T, U>)
Batcher.performOperations() -> List<Result<T, U>>

// Where Result is a type with an API like
getResult() -> T
getSource() -> U
isFailed() -> boolean
getException() -> Exception

Scheme 2: Accept a thing to record batchable operations

DocumentEvent.lookupDocument(Batcher<DocumentEvent, ?>) -> Void
Notification.toDocumentEvents(Batcher<Notification, List<DocumentEvent>) -> Void
Message.process(Batcher<Message, Void>) -> Void

Batcher<T, U>.performOperations() -> List<Result<T, U>>

Scheme 3: Accept a thing to record batchable operations, return a future result I don't like this idea because the Results here would have to throw an exception if their operations were not performed yet, but otherwise it's like scheme 2 except performOperations() returns nothing and each source's method returns a Result<T>

alechenninger commented 8 years ago

Here is my ask: changing the laziness will require refactoring how all event types are processed, and it's still not clear to me it is a net win. In the interest of time, how about we continue this discussion with the team and address with refactoring later on? The consuming API will still use callbacks, and so shouldn't be terribly affected by some kind of rework like suggested above.

alechenninger commented 8 years ago

For the record, the laziness was implemented and described in #1 , which you reviewed

kahowell commented 8 years ago

I ask not to be stubborn--truthfully!--but because any alternative is also going to have its tradeoffs, and because I think the complaints around this PR have confused the "promise" term and future callback chaining with the laziness of the future implementation (case in point, the conversation first started around PromiseOfPromise and took a while to leave there)

I have multiple issues with this (all of which I've discussed previously), but I think the most pressing is the surprising (to me at least) get() side-effect. Also, I feel most of my issues with this would go away if the overall design was tweaked a little.

I started with the PromiseOfPromise construct because this was the first sign to me that this solution is overly complicated. I don't feel misdirected is a fair assessment, but I will admit I perhaps haven't communicated all of this with as much clarity as it deserves.

On the other hand, laziness is always a little odd to those not fluent in functional programming especially.

This statement comes across as dismissive towards those less familiar with functional programming, and I think the use/understanding of the solution shouldn't require fluency in functional programming; I think that's a unnecessarily high ask. I think to implement an event handler, and even to understand this library, one should only need understand basic Java. I think once you start introducing concepts like promise, it quickly goes outside the realm of basic Java. On the other hand, I think some level of extra complexity is okay, especially given that batching is a hard problem to solve. So, I've focused my review comments on reducing the outside-the-norm language/design (ex. PromiseOfPromise, Promise extending Future). If time permitted, though, I'd rather us try to avoid these concepts altogether though (for the sake of keeping it simple).

changing the laziness will require refactoring how all event types are processed, and it's still not clear to me it is a net win.

Fair enough, though I think we should strive for a simpler design/solution in the long run; I don't think refactoring how all event types are processed is necessarily a bad thing.

the laziness was implemented and described in #1 , which you reviewed

Fair point, I should definitely have raised concerns with it then. Honestly, I did not spend as much time as I should have in that review, in that context it did not seem as bad as it does in this context, and I now hope that we'll consider it a reversible decision, because I'd like to reduce some of the complexity of the event handler if possible.

As to your possible API designs, I still feel that we can design something with less complexity. I can discuss in more detail later but I'd like to pursue the process(Collection) interface you mentioned. I believe it's certainly feasible to encapsulate the batching details in an implementation of this interface, and that seems a better approach IMO. I think it's not unreasonable to change some of the interfaces involved to support batching, and then if batching isn't needed or possible in some cases, implement as a batch of size 1.

In the interest of time, how about we continue this discussion with the team and address with refactoring later on?

Sounds good.

alechenninger commented 8 years ago

I started with the PromiseOfPromise construct because this was the first sign to me that this solution is overly complicated.

IMHO I think this biased you for the rest of the review, but this is just my opinion. In hindsight, maybe it would have been better to react a little slower, and try to read through everything before coming to a conclusion, since it is easily demonstrated that having an async callback which returns another future is a very, very common thing. I probably just named things poorly that dealt with that situation.

As for complexity, to me, it is maybe a little inappropriate make a judgement about complexity simply by seeing a construct such as that, which is ultimately so common. To me, complexity is a much more involved measurement about coupling between components and contracts which are tricky to uphold correctly. Complexity is about the Law of Demeter and SOLID principles. This solution arguably holds up okay with all of those things, which is why it can be frustrating to me to hear "this is too complex" when it didn't seem like much time was really spent with the solution. And since it's feedback I've heard before, it scares me that it may start to become somewhat of a self fulfilling prophecy (the reviewer's) due to confirmation bias. I'm not saying there isn't any complexity in the solution, of course.

This statement comes across as dismissive towards those less familiar with functional programming, and I think the use/understanding of the solution shouldn't require fluency in functional programming

Eek, text is a poor communicator... or I am very misunderstood. Heh. I agree completely. I am not being dismissive, that was exactly my point. I meant that statement as an honest critique of all lazy constructs, which are uncommon outside of functional programming, which is a programming style less common to Java developers.

Fair enough, though I think we should strive for a simpler design/solution in the long run; I don't think refactoring how all event types are processed is necessarily a bad thing.

Agreed. As I mentioned before I went back and forth with something like the Operation API, but there are other complexities with that approach. If something simpler is out there, I'm all for hunting it down!

alechenninger commented 8 years ago

Working on some clarifying improvements to this in light of all the feedback, want to get your thoughts please:

Rename Promise -> FutureWithCallbacks ... still extend Future. This is mainly to keep implementation simple. If you don't like extending Future, I would maybe keep the Promise name, not extend Future, but add an asFuture method to get back into the Future domain. Another alternative may be to only allow one callback layer to be added per promise, and have that method return a Future (this is how it used to work before this PR)... not sure off the top of my head if that would work entirely now however.
Rename then and friends to transform a la Guava (perhaps transformSync, transformAsync)
Rename PromiseOfPromise -> NestedFutureWithCallbacks or something of the sort
Change process api to return Future<Void> to be a little more intention revealing and to give compile time feedback in the form of errors when consumer erroneously returns a Future<Future<?>>, and also add a callback method to make conforming to this API simpler (something like, transformAsyncIgnoringReturn)
Check for null before unwrapping nested futures (aka promises)

Anything else you would like to see aside from larger refactoring?

kahowell commented 8 years ago

Complexity is about the Law of Demeter and SOLID principles

Perhaps "complexity" is not the most appropriate term for what I'm trying to express then... What I'm really trying to express is the surface area of the abstractions along with the language used to express those abstractions; I'd like to keep these as simple as feasible. I'm talking about "it" from a less formal, more human point-of-view. What I was discussing was more along the lines of "how easy is this to understand"?

Rename Promise -> FutureWithCallbacks ... still extend Future.

Sounds good.

Rename then and friends to transform a la Guava (perhaps transformSync, transformAsync).

Sounds good.

Rename PromiseOfPromise -> NestedFutureWithCallbacks or something of the sort

I still would like to get rid of this if possible. Can you please make your proposed changes and leave a TODO(khowell) investigate getting rid of this in order to simplify?

transformAsyncIgnoringReturn

Not a fan of this... Would rather just trust implementer to get it right without help.

For future reference, I'm going to try to summarize my feedback (so we don't have to read the whole conversation if we revisit at some point if we want) (checking off what's been addressed):

[X] language - especially around promises and futures
[ ] redesign to remove need for PromiseOfPromise/NestedFutureWithCallbacks
[ ] overall design of batching, would like revisit to make batching more explicit (instead of side-effect of the get of a Future**

\ important:

changing the laziness will require refactoring how all event types are processed

I didn't/don't expect that all my feedback is always acted upon; I think it's perfectly reasonable to not redesign overall batching since the scope of such a change is large, and I'm okay to disagree on the PromiseOfPromise/NestedFutureWithCallbacks thing.

alechenninger commented 8 years ago

Sounds good, TY!

alechenninger commented 8 years ago

Addressed the language bit, going to take another look and see if there is another way to elegantly deal with nested futures.

alechenninger commented 8 years ago

There's nothing coming to me that isn't objectively worse for removing the NestedFuture* bit so I added a TODO, since I think if we can come up with something more explicit for batching the need for all the Future stuff goes away so maybe let's just do that instead.

As far as the 'ignoring return' bit, it's not a huge deal but I think it improves readability:

    return requester.request(someStuff).transformAsync(responses -> {
       // do some stuff with responses...
       // make another request...

       return requester.request(insertOrUpdateStuff)
                .transformSync(responses -> null);
    });

    return requester.request(someStuff).transformAsyncIgnoringReturn(responses -> {
       // do some stuff with responses...
       // make another request...

       return requester.request(insertOrUpdateStuff);
    });

FWIW I liked how then APIs read better than transform and I actually think people may be more confused by the more technical terminology, but I leave that call up to you.

Other than that I'd like to add a few more tests and ship it pending your feedback.

Thanks!

kahowell commented 8 years ago

Yeah, even though .transformAsyncIgnoringReturn is a bit wordier, I kinda like that it doesn't have to have another closure. I think we've made good progress, and agree with leaving the batching as-is for now, with an intent to revisit in the future.

As to then, I think in this case, since we're differentiating between ignoring the return vs. not, it is better to use different terminology. If there were a clean way to avoid that, I'd be okay with sticking to then (though I still prefer not to use promise terminology in Java).

Merge at will! :-)

alechenninger commented 8 years ago

Thank you!