Open eckter opened 3 months ago
A few remarks: how do the various clients handle this? (`curl` & co., RabbitMQ UI, others?)

If I understand how this works: compressing the responses is part of the HTTP protocol, and it can and should be optional. `curl` has a `--compressed` flag that automatically decompresses the content. By default it doesn't include `Accept-Encoding: gzip` in the header, so it shouldn't get a compressed response.
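To make the mechanics concrete, here's a minimal sketch of that negotiation in Python (the endpoint URL is made up): the client opts in with `Accept-Encoding`, and only decompresses if the server actually answered with `Content-Encoding: gzip`.

```python
import gzip
import urllib.request

# Hypothetical endpoint, just to illustrate the negotiation.
req = urllib.request.Request(
    "http://localhost:8090/health",
    headers={"Accept-Encoding": "gzip"},  # opt in, like curl --compressed
)
with urllib.request.urlopen(req) as resp:
    body = resp.read()
    # The server is free to ignore the header; only decompress if it didn't.
    if resp.headers.get("Content-Encoding") == "gzip":
        body = gzip.decompress(body)
```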
What I had in mind with this issue is just toggling some library flags that should make it work transparently and out of the box; this is not about adding a whole new homemade layer on top of all requests. The initial scope was just HTTP requests.
Compressing the content of RabbitMQ and Redis messages may or may not be an "out of the box" thing as well, but that seems less likely. I'm not sure how we should handle it if it's not: we could consider doing it manually, but I'm not sure it would be beneficial.
RabbitMQ transports arbitrary byte payloads, so no out-of-the-box solution exists, but adding a byte-compression step should be simple (and AMQP 0.9.1 does have a content-encoding message property if we want to use it). There may be performance gains (JSON compresses very well), but if we go that route we could (also?) use a binary protocol, self-describing or not, e.g. MessagePack/BSON instead of protobuf.
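For illustration, a sketch of what that compression step could look like with a Python AMQP client (pika here; the queue name and payload are made up), using the standard `content_encoding` message property to label the body:

```python
import gzip
import json

import pika  # any AMQP 0.9.1 client exposes these message properties

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

payload = json.dumps({"infra": 1, "train_schedules": []}).encode()
channel.basic_publish(
    exchange="",
    routing_key="core-requests",  # made-up queue name
    body=gzip.compress(payload),
    properties=pika.BasicProperties(
        content_type="application/json",
        content_encoding="gzip",  # standard AMQP 0.9.1 message property
    ),
)
```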
The main problem today is the size of messages sent to core, which requires a fairly high RAM limit for RabbitMQ instances (2 GB).
I haven't observed any major problems on the Redis side. Maybe the performance can be improved :shrug:
Trying to activate the parameter @khoyo is talking about seems like a good start. However, planning to use a binary protocol like protobuf or BSON doesn't seem relevant to me until we've validated that the performance problems are tied to payload size.
Notes for the 09/16 workshop:
For HTTP requests, we should check whether libraries can handle compression natively. If they do, great (we should still measure performance). If they don't, we can just drop it. There's likely not much to be gained there (edit: except for the infra loading process).
(side note: large payloads in RabbitMQ may be an issue in themselves)
(question: does osrdyne read the infra id from the payload? (apparently not))
For RabbitMQ, adding compression would fix the issue, but native support is limited. There's a `content-encoding` attribute, but it's apparently not used by RabbitMQ itself; it's up to us to handle it.
The issue is about debuggability: we don't want unreadable payloads (e.g. in the RabbitMQ management interface).
We could add a parameter (e.g. an env variable) to enable compression when sending something to the queue. When reading, we'd rely on the `content-encoding` property.
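A minimal sketch of that scheme (the env variable name is made up): the sending side is gated by configuration, while the reading side only trusts the declared encoding, so compressed and uncompressed producers can coexist during a rollout.

```python
import gzip
import os

# Made-up env variable gating compression on the producer side.
COMPRESSION = os.environ.get("OSRD_MQ_COMPRESSION", "identity")

def encode(body: bytes) -> tuple[bytes, str]:
    """Return the (possibly compressed) body and its content encoding."""
    if COMPRESSION == "gzip":
        return gzip.compress(body), "gzip"
    return body, "identity"

def decode(body: bytes, content_encoding: str | None) -> bytes:
    # Readers never look at the env variable, only at the message property.
    if content_encoding == "gzip":
        return gzip.decompress(body)
    return body
```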
Redis seems to have the same issues and possible solutions as RabbitMQ, but apparently it works fine as it is with large payloads. We can ignore it for now, focus on RabbitMQ, and then maybe apply the solutions we've found to Redis.
So the main issue is probably that we're not supposed to put large payloads in there. Could we find other solutions?
The largest payload in RabbitMQ is the STDCM request input. We could remove the timetable data and have core fetch it when receiving the request, but that would make debugging more tedious (we couldn't just replay requests without extra context).
> However, planning to use a binary protocol like protobuf or BSON
MessagePack should be superior to BSON for our use case, and way easier to use than a non-self-describing format like protobuf.
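To illustrate the self-describing part: a MessagePack payload can be decoded without any schema (shown here with the `msgpack` Python package), which keeps the "just grab the payload and inspect it" debugging workflow.

```python
import msgpack

# No schema needed on either side, unlike protobuf.
packed = msgpack.packb({"train_id": 42, "stops": [120.5, 340.0]})
print(msgpack.unpackb(packed))  # {'train_id': 42, 'stops': [120.5, 340.0]}
```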
> The issue is about debuggability: we don't want unreadable payloads (e.g. in the RabbitMQ management interface).
Did we test that the management interface doesn't decompress the data when given the correct `content-encoding`? (If not, maybe we could add that in?)
> Did we test that the management interface doesn't decompress the data when given the correct `content-encoding`? (If not, maybe we could add that in?)
Apparently it doesn't: it displays a base64 string that has to be decoded and then decompressed by hand:
```sh
echo -n 'H4sIAJD952YC/6tWykvMTVWyUlDyys/IU9JRUEpMB3GNDWoBtW1YHhsAAAA=' | base64 --decode > test.gz ; gunzip test.gz
```
Description
Services communicate with each other using JSON data in plain text. These payloads are sometimes quite large (> 10 MB), especially for `editoast -> core` requests (STDCM: 50 MB, infra loading: 150 MB).

The HTTP protocol lets us compress data with various compression methods (gzip, ...), and libraries can generally be configured to enable this quite easily.
We should try to use compression and see what happens.
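To get a rough order of magnitude of the potential gains, here's a quick Python check on a made-up JSON payload (real requests would compress differently):

```python
import gzip
import json

# Made-up payload standing in for a large editoast -> core request.
payload = json.dumps({"waypoints": [{"track": i % 100, "offset": i * 0.5}
                                    for i in range(100_000)]}).encode()
compressed = gzip.compress(payload)
print(f"{len(payload)} -> {len(compressed)} bytes "
      f"({len(compressed) / len(payload):.1%})")
```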