Maximum inlined data size is 1024 bytes by default. Good? Bad?

petersilva commented 5 years ago

This is a reference to #3. Separate from implementation concerns, inlining large data will have a severe effect on broker performance, so this issue will try to document a consensus value.

petersilva commented 5 years ago

I really don't want this to be big... I'll say 1000 bytes.

josusky commented 5 years ago

I think that 1000 bytes must be enough for everybody. On the other hand, if an institution publishes messages that are too big, then no one will subscribe to them. In the end, such an institution will harm itself, because clients will poll the directory tree instead (and generate unnecessary load on its server). So the size will self-regulate organically :-)

petersilva commented 5 years ago

This is now implemented in the wmo_mesh example: the --inline option enables inlining, and --inline_max allows experimenting with the maximum inline message size.
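To make the inlining decision concrete, here is a minimal sketch of what a publisher might do when inlining is switched on: embed the file's bytes in the message only when the file fits under the limit, and otherwise send just a reference to download. The function `build_message`, its `inline`/`inline_max` parameters (loosely mirroring the `--inline` and `--inline_max` options) and the message field names are hypothetical illustrations, not the actual wmo_mesh code or message schema.

```python
import base64
import os
import tempfile

# Hypothetical default, mirroring the 1024-byte default this issue discusses.
DEFAULT_INLINE_MAX = 1024


def build_message(path, base_url, inline=False, inline_max=DEFAULT_INLINE_MAX):
    """Build a notification for `path`, embedding the file's bytes when
    inlining is enabled and the file is no larger than `inline_max` bytes.

    The field names used here are illustrative assumptions, not the
    confirmed wmo_mesh message format.
    """
    size = os.path.getsize(path)
    msg = {"baseUrl": base_url, "relPath": os.path.basename(path), "size": size}
    if inline and size <= inline_max:
        with open(path, "rb") as f:
            data = f.read()
        # base64 makes binary data safe to carry in a JSON payload,
        # at the cost of roughly 4/3 size growth.
        msg["content"] = {"encoding": "base64",
                          "value": base64.b64encode(data).decode("ascii")}
    return msg


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        small = os.path.join(d, "small.txt")
        with open(small, "wb") as f:
            f.write(b"x" * 100)
        # A subscriber would fetch baseUrl + relPath unless "content" is present.
        print(build_message(small, "https://example.com/data", inline=True))
```

Raising `inline_max` (as done below for the hpfx.collab feed) simply moves more files into the embedded branch; every embedded byte then flows through the broker for every subscriber.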

petersilva commented 5 years ago

The self-regulation idea is a good one. On one hand, including the data in the payload saves time for small bulletins. On the other hand:

- if one is subscribing to two sources for all products, then one will only use the products from whichever source delivers them first, and all inlined data that does not arrive first is wasted (it would not have been downloaded if it were not inlined).
- server-side filtering with MQTT (or AMQP) is fairly coarse, so in general one must request more messages than one genuinely intends to download. These extra messages are filtered out by client-side reject clauses, so many messages, inlined data included, are downloaded only to be rejected on the client side.
- inlining worsens performance on a LAN. Where the round-trip time is negligible, the optimization is negligible too, and it is likely drowned out by the reduced message processing rate. In the LAN case using AMQP, one wants to spread the requests out over many instances, which is done more quickly without inlining. In Sarracenia, SFTP sessions are maintained, so while there is still a round trip for each get request, one does not pay for connection establishment on each transfer.
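The first two points can be quantified with a small sketch that counts how many inlined bytes a subscriber receives but never uses: bytes in messages that a client-side reject pattern discards, plus bytes in duplicate announcements of a product that already arrived from another source. The `Notice` type, the `wasted_inline_bytes` function and the sample paths are hypothetical; the reject patterns are plain regular expressions in the spirit of client-side reject clauses, not Sarracenia's actual filtering code.

```python
import re
from dataclasses import dataclass


@dataclass
class Notice:
    """A received notification: which source sent it, the product path,
    and how many inlined bytes it carried (0 if nothing was inlined)."""
    source: str
    path: str
    inlined_bytes: int


def wasted_inline_bytes(notices, reject_patterns):
    """Count inlined bytes that arrive at the client but are never used.

    Bytes are wasted when a notice matches a client-side reject pattern,
    or when the same product already arrived from another source.
    """
    rejects = [re.compile(p) for p in reject_patterns]
    seen = set()
    wasted = 0
    for n in notices:
        if any(r.search(n.path) for r in rejects):
            wasted += n.inlined_bytes  # filtered out only after it was received
        elif n.path in seen:
            wasted += n.inlined_bytes  # duplicate from the slower source
        else:
            seen.add(n.path)
    return wasted


if __name__ == "__main__":
    notices = [
        Notice("A", "/obs/SA/a.txt", 800),
        Notice("B", "/obs/SA/a.txt", 800),
        Notice("A", "/radar/b.h5", 900),
        Notice("A", "/obs/SA/c.txt", 700),
        Notice("B", "/obs/SA/c.txt", 700),
    ]
    print(wasted_inline_bytes(notices, [r"\.h5$"]))  # → 2400
```

In this made-up sample, more inlined bytes are wasted (2400) than used (1500); without inlining, only the small notices themselves would have been discarded.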

petersilva commented 5 years ago

On the current feed from hpfx.collab, I raised the maximum to 2048 bytes to get more files inlined, which provides more frequent demonstrations.

petersilva commented 4 years ago

@davidpodeur brought up an interesting case:

- relatively high-speed transfer, but very long latency (satellite link);
- regardless of how many instances run in parallel, the performance is much worse than creating periodic buckets as tar files and sending those.

One would need fifty or more parallel transfers to catch up with tar files. In this instance, a much higher limit for the size of embedded data makes sense, or an extended message type that refers to a tar bucket.
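Why latency dominates on such a link can be shown with a back-of-the-envelope model: each per-file fetch costs one round trip plus its transmission time, so a single stream keeps the link busy only a small fraction of the time, while a tar bucket pays one round trip for the whole batch. The functions below and the numbers in the example (20 kB files, 1 MB/s, 600 ms round trip) are illustrative assumptions, not measurements from @davidpodeur's case; the model also ignores TCP windowing, loss and protocol overhead.

```python
def fetch_ms(file_bytes, link_bytes_per_s, rtt_ms):
    """One request/response fetch: one round trip plus transmission time.
    Connection setup is ignored (e.g. sessions kept open)."""
    return rtt_ms + file_bytes * 1000 / link_bytes_per_s


def streams_to_fill_link(file_bytes, link_bytes_per_s, rtt_ms):
    """Parallel fetches needed to keep the link busy: each stream transmits
    for only file_bytes/bandwidth out of every (rtt + that) milliseconds."""
    in_flight = link_bytes_per_s * rtt_ms  # bytes/s * ms
    per_file = file_bytes * 1000           # bytes * (ms/s), same units
    return 1 + -(-in_flight // per_file)   # 1 + ceil(rtt * bandwidth / size)


def tar_bucket_ms(n_files, file_bytes, link_bytes_per_s, rtt_ms):
    """One transfer carrying n_files in a single bucket: a single round trip."""
    return rtt_ms + n_files * file_bytes * 1000 / link_bytes_per_s


if __name__ == "__main__":
    n, size, bw, rtt = 1000, 20_000, 1_000_000, 600
    print("one stream, file by file:", n * fetch_ms(size, bw, rtt), "ms")  # 620000.0
    print("one tar bucket:", tar_bucket_ms(n, size, bw, rtt), "ms")        # 20600.0
    print("streams needed to match:", streams_to_fill_link(size, bw, rtt))  # 31
```

Under these assumptions a single file-by-file stream is about 30 times slower than the bucket, and the number of streams needed grows with the round-trip time and shrinks with file size. Inlining removes the per-file round trip altogether, which is why a much higher inline limit makes sense on such links.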

josusky commented 4 years ago

Well, we definitely cannot set one hard limit that fits all use cases. It needs to remain configurable. We can just recommend something like: "Keep it in kilobytes unless you are sure that your use case will benefit from a higher limit. Avoid going to megabytes unless your data is distributed only to a restricted group of systems that can cope with it."
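If the limit stays configurable, the recommendation above could be enforced as a simple sanity check when a configuration is loaded. The function `inline_max_advice`, its `restricted_audience` flag and the choice of 1 MiB as the "megabytes" boundary are all hypothetical, shown only as a sketch of how the guidance might translate into code.

```python
KIB = 1024
MIB = 1024 * KIB  # assumed boundary between "kilobytes" and "megabytes"


def inline_max_advice(inline_max, restricted_audience=False):
    """Classify a configured inline limit (in bytes) against the guidance:
    kilobyte-range limits are fine for general use; megabyte-range limits
    only make sense when a restricted group of systems receives the data."""
    if inline_max < 0:
        raise ValueError("inline_max must be non-negative")
    if inline_max < MIB:
        return "ok"
    if restricted_audience:
        return "ok: restricted audience"
    return "discouraged: megabyte-range inlining for a general audience"


if __name__ == "__main__":
    for limit in (1024, 2048, 4 * MIB):
        print(limit, inline_max_advice(limit))
```

Returning advice rather than rejecting the value keeps the limit configurable, as requested, while still flagging settings that are likely to hurt brokers and subscribers.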

MetPX / wmo_mesh, issue #4