Particular / NServiceBus

Build, version, and monitor better microservices with the most powerful service platform for .NET
https://particular.net/nservicebus/

Batched dispatch can result in excessive RAM usage #4077

Closed ramonsmits closed 8 years ago

ramonsmits commented 8 years ago

In v6 we now have batched dispatch. If you have a handler that processes a large record set and then sends a message for each row, all these messages will be stored in RAM first. This can cause excessive RAM usage, especially when additional data is also being stored.
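A minimal sketch of the kind of handler being described (the message types and data access are hypothetical, for illustration only):

```csharp
using System.Threading.Tasks;
using NServiceBus;

// Hypothetical message types, for illustration only.
public class ImportRecords : ICommand { }
public class ProcessRow : ICommand { public int RowId { get; set; } }

public class ImportRecordsHandler : IHandleMessages<ImportRecords>
{
    public async Task Handle(ImportRecords message, IMessageHandlerContext context)
    {
        // Imagine this loads a large record set, e.g. tens of thousands of rows.
        foreach (var rowId in LoadRowIds())
        {
            // With batched dispatch (the v6 default) each of these sends is only
            // buffered in memory; nothing reaches the transport until the handler
            // and the rest of the receive pipeline have completed.
            await context.Send(new ProcessRow { RowId = rowId });
        }
    }

    static int[] LoadRowIds() => new int[0]; // placeholder for the real data access
}
```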

Using the send option for immediate dispatch isn't an option, as that can result in more-than-once delivery in case of a failure and would also require using the outbox.
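For reference, this is roughly what requesting immediate dispatch looks like per send (a sketch, reusing the hypothetical handler above):

```csharp
// Inside a handler like the one sketched above:
var options = new SendOptions();

// Ask NServiceBus to hand this message to the transport right away instead of
// adding it to the batch that is dispatched after the handler completes.
options.RequireImmediateDispatch();

await context.Send(new ProcessRow { RowId = rowId }, options);

// The catch described above: if the handler throws afterwards and the incoming
// message is retried, this send has already happened, so ProcessRow can be
// delivered more than once; hence the remark that this would also require the
// outbox, e.g. endpointConfiguration.EnableOutbox();
```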

Having the 'V5' behavior, where messages were pushed to the transport during processing, is a better solution for such operations.

Besides that, batched dispatch is likely to increase latency, as messages are only pushed to the transport after the operation completes.

Supporting the previous behavior as well, and/or having a flush option that pushes the currently aggregated set of messages to the transport, would help resolve this and keep both RAM allocation and latency to a minimum.

andreasohlund commented 8 years ago

So we need an "immediately dispatch to transport but do enlist in the transaction" option when sending?

This can be added in a minor version, so I don't see this making it into v6.

danielmarbach commented 8 years ago

I'm not sure the above statements are entirely correct, given that ASB supports batching of requests at the SDK level.

I also don't fully understand the following argument:

Having the 'V5' behavior, where messages were pushed to the transport during processing, is a better solution for such operations.

And how does it increase latency?

bording commented 8 years ago

So we need an "immediately dispatch to transport but do enlist in the transaction" option when sending?

Wouldn't MSMQ and SQL be the only ones that could benefit from something like this anyway?

I'm not sure that using more memory is actually a problem. Wouldn't this just be something you provision your hardware to accommodate?

ramonsmits commented 8 years ago

@danielmarbach Assume we are processing a message that is sending 1,000,000 messages. In V6, delivery of those 1,000,000 messages only starts after all handlers complete, whereas without batched dispatch we would already be done by the time the handlers complete.

Message delivery to the transport would already be happening during message processing instead of only starting after all handlers have completed. That is the increased latency.

@bording That is correct, although ASB also supports SendsAtomicWithReceive, but I don't know whether it would benefit from this. Maybe the @Particular/azure-service-bus-maintainers know.

yvesgoeleven commented 8 years ago

In ASB, V6 buffers up in memory as well, as IEnlistmentNotifications. In V7 the ASB SDK does something similar behind the scenes when in SendsAtomicWithReceive. Furthermore, the ASB SDK is limited to 100 messages per batch when wrapped in a transaction scope anyway. This problem has potentially always been there; I've never seen it in practice.

timbussmann commented 8 years ago

Assume we are processing a message that is sending 1,000,000 messages

I'm having a hard time coming up with a realistic scenario that would land anywhere near that number of messages.

I'm with Brandon here: is this really something we need to address, beyond maybe some transport-specific limitations as mentioned by Yves?

danielmarbach commented 8 years ago

I think we can close this unless @ramonsmits has more compelling arguments not to.

ramonsmits commented 8 years ago

I find it strange that, because it doesn't fit 'all transports', the decision is to always enforce batched sends. Yes, this only benefits SQL and MSMQ, but why don't we allow optimizations for those?

I think these are compelling reasons to be able to disable batched sends, especially in the context of SendsAtomicWithReceive.

I'm having a hard time coming up with a realistic scenario that would land anywhere near that number of messages.

Maybe 1,000,000 in one go is exaggerated, but what about daily imports in large batches of 10k or 25k items? Never built such an integration before?

timbussmann commented 8 years ago

Batching also has the advantage that it provides some very limited "transactional sends". E.g. when your handler throws an exception mid-processing, no messages go out instead of half of them; I'd argue that this is usually preferable.
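A small illustration of that point (hypothetical message types, not actual NServiceBus sample code):

```csharp
using System;
using System.Threading.Tasks;
using NServiceBus;

// Hypothetical message types, for illustration only.
public class DoWork : ICommand { }
public class StepCompleted : IEvent { public int Step { get; set; } }

public class DoWorkHandler : IHandleMessages<DoWork>
{
    public async Task Handle(DoWork message, IMessageHandlerContext context)
    {
        await context.Publish(new StepCompleted { Step = 1 });
        await context.Publish(new StepCompleted { Step = 2 });

        // Something goes wrong halfway through the business logic.
        throw new InvalidOperationException("boom");

        // With batched dispatch, the two events above never reach the transport,
        // so subscribers don't see events from a failed attempt that will be retried.
        // With immediate dispatch they would already be on the wire: the "ghost
        // messages" mentioned below for transports like RabbitMQ and ASQ.
    }
}
```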

How about raising a feature request to allow disabling batching? I don't see the RAM usage as a strong enough reason to make these tradeoffs.

andreasohlund commented 8 years ago

Batching also has the advantage that it provides some very limited "transactional sends". E.g. when your handler throws an exception mid-processing, no messages go out instead of half of them; I'd argue that this is usually preferable.

👍 The batching removes the risk of ghost messages on transports like Rabbit and ASQ, so I think it's a good default.

I'm not against adding an option to turn it off.

Scooletz commented 8 years ago

☝️ What @andreasohlund said. It shifts ASQ a bit closer to all-or-nothing transactional behavior, making ghost messages much less likely to occur.

SzymonPobiega commented 8 years ago

I think the issue here is not the amount of memory allocated but rather the way we allocate it. If we had a reusable buffer pool that is used for all serialization operations (including the headers), we could easily store thousands of messages there without putting any pressure on the GC.

Also, when sending large numbers of messages, they tend to be very small in size (e.g. containing only the ID of a work item to complete). In such cases it is likely that the headers consume much more space than the actual body. That means we should also address the header management problem.
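To make the buffer pool idea concrete, a rough sketch of the shape it could take (generic .NET code, not NServiceBus internals; the ArrayPool-based approach and the names are assumptions):

```csharp
using System;
using System.Buffers;
using System.Text;

// Sketch: rent serialization buffers from a shared pool instead of allocating a
// fresh byte[] per outgoing message, so thousands of buffered messages don't
// translate into equivalent GC pressure.
public static class PooledSerializer
{
    public static ArraySegment<byte> Serialize(string payload)
    {
        var maxBytes = Encoding.UTF8.GetByteCount(payload);
        var buffer = ArrayPool<byte>.Shared.Rent(maxBytes);
        var written = Encoding.UTF8.GetBytes(payload, 0, payload.Length, buffer, 0);

        // The segment is held until the batch is dispatched to the transport...
        return new ArraySegment<byte>(buffer, 0, written);
    }

    public static void Release(ArraySegment<byte> body)
    {
        // ...and the underlying array goes back to the pool afterwards.
        ArrayPool<byte>.Shared.Return(body.Array);
    }
}
```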

Scooletz commented 8 years ago

I think @ramonsmits meant overall RAM usage, not GC-related problems, @SzymonPobiega. I do agree, though, that in some cases turning OutgoingMessage.Body into a buffered thing could be beneficial, whether it's a chunk-based stream or an ArraySegment<byte> that is later returned for future use (I know, consider the padding etc.).

ramonsmits commented 8 years ago

Created a new issue #4174, closing this one.

yvesgoeleven commented 8 years ago

Reopened this one. I just ran a performance test that used ReceiveOnly mode + batched receive + batched send (1000 msgs per receive, as the ASB batch size has no limit in this mode). It goes really fast, but boy does it consume memory: 1 endpoint easily eats 950 MB of RAM (it's not leaking, it cleans up, it just stays high).

yvesgoeleven commented 8 years ago

Also: it looks like Node<TransportOperation> (and its members) is a really big object. Between my snapshots (7 seconds apart), the total size increased by 2.5 MB (+63,000 instances).

danielmarbach commented 8 years ago

I don't think it makes sense to reopen this one. We should take a closer look together at the overall memory consumption of the transport (excluding the SDK) and then, for a future version, start tackling the reuse of body buffers in the transports, etc.

yvesgoeleven commented 8 years ago

@danielmarbach :+1: