It is observed that base64-encoding all payloads means that ASCII data will not be readable in the stream. Perhaps have two fields:
"content-encoding" : "utf-8", "content" : "the actual data goes here"
Jan was saying that having the content readable is of marginal value, and it simplifies things if we just always base64-encode the body... I like having some of it readable... but what do others think?
Implemented in both scripts.
content : { "encoding": "utf-8", "value": "encoded file content" }
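A minimal sketch of how a payload could be inlined and recovered with this structure (this is not the actual mesh_pub.py code; the helper names are made up):

```python
import base64
import json

def inline_content(data: bytes) -> dict:
    """Build the nested content field: utf-8 if the bytes decode, else base64."""
    try:
        return {"encoding": "utf-8", "value": data.decode("utf-8")}
    except UnicodeDecodeError:
        return {"encoding": "base64", "value": base64.b64encode(data).decode("ascii")}

def extract_content(content: dict) -> bytes:
    """Recover the original bytes on the subscriber side."""
    if content["encoding"] == "utf-8":
        return content["value"].encode("utf-8")
    return base64.b64decode(content["value"])

# round trip: binary data (invalid utf-8) falls back to base64 and survives intact
payload = b"\xff\xd8\xff\xe0 not valid utf-8"
msg = {"content": inline_content(payload)}
print(json.dumps(msg))
assert extract_content(msg["content"]) == payload
```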
Triggered by use of the --inline option. For mesh_pub.py, the script will do the right thing.
For mesh_peer.py, the meaning of --inline is that it converts a stream that arrives without inlining into one with it.
There is also an --encoding option, with choices: text (aka utf-8), binary (aka base64), and guess (which uses a Python mimetype library to guess whether it is a text file, and if it isn't, chooses binary encoding).
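A rough illustration of what the guess choice could look like (the real mesh_pub.py logic may differ; guess_encoding is a made-up name):

```python
import mimetypes

def guess_encoding(filename: str, data: bytes) -> str:
    """Guess 'utf-8' for apparent text files, 'base64' for everything else."""
    mtype, _ = mimetypes.guess_type(filename)
    if mtype is not None and mtype.startswith("text/"):
        try:
            data.decode("utf-8")  # confirm the bytes really are text
            return "utf-8"
        except UnicodeDecodeError:
            pass
    return "base64"

print(guess_encoding("bulletin.txt", b"TTAAii CCCC YYGGgg ..."))  # utf-8
print(guess_encoding("chart.png", b"\x89PNG\r\n"))                # base64
```

The decode check is the important part: a text-looking name is no guarantee the bytes are valid UTF-8.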
So far on the test feed it blows up on SFUS41 KWBC... which we thought were text files, but it turns out they are encrypted lightning data?? I'm going to have our upstream rename the files to .bin... we'll see how many of those we hit...
I'm still not sure if the original proposal of base64 all the time is the right way to go. I prefer the guessing currently implemented, and like seeing the contents (of inlined text messages) when available... but ... what do others think?
In email, David Podeur raised the issue of compression. To me that is just another encoding. The current implementation just has "utf-8" and "base64"; we would need to add something like "gzip-base64"... or do we just switch to using that all the time? The thing is, I only see inlining being useful for small messages, and for small messages the size of the JSON envelope itself is likely so significant that compressing the payload doesn't change much in terms of overall bytes on the wire...
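If we did add it, "gzip-base64" would presumably just compose the two steps; a hypothetical sketch that also shows the small-message problem:

```python
import base64
import gzip

def encode_gzip_base64(data: bytes) -> str:
    return base64.b64encode(gzip.compress(data)).decode("ascii")

def decode_gzip_base64(value: str) -> bytes:
    return gzip.decompress(base64.b64decode(value))

# a tiny payload actually grows: gzip header/trailer plus base64 overhead dominate
tiny = b"SFUS41 KWBC 121200"
print(len(tiny), len(encode_gzip_base64(tiny)))  # 18 bytes in, ~50 characters out
```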
The other thing is that formats like png, jpg, or GRIB may include compression in them, and applying compression may make them bigger than when we started, and we pay for the privilege with a lot of CPU. It's not clear there is a generic way to do this that makes sense for the general case; people would need to study it. The basic inlining has a method that supports adding other encodings, so perhaps this issue should be closed as complete, and the addition of encodings, including compression, would be a separate issue?
Can we run a few tests on some sample messages to assess compression ratios and times?
I could swear someone has done that already... I thought Yves would know... Well, if people want to do that, what algorithms need to be assessed? Just one? xz? Who standardizes such things? Normally we want to ensure we are using a standard algorithm defined by IETF, ISO, or some such; I'm not aware of how that works for compression algorithms.
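For anyone who wants to try, a quick harness along these lines (stdlib algorithms only; the sample file path is a placeholder) would give ratios and times:

```python
import bz2
import gzip
import lzma
import time

def bench(name: str, compress, data: bytes) -> None:
    t0 = time.perf_counter()
    out = compress(data)
    dt = time.perf_counter() - t0
    print(f"{name}: {len(out)}/{len(data)} bytes "
          f"(ratio {len(out)/len(data):.2f}), {dt*1000:.1f} ms")

data = open("sample_bulletin.txt", "rb").read()  # placeholder sample message
for name, fn in [("gzip", gzip.compress), ("bz2", bz2.compress), ("xz", lzma.compress)]:
    bench(name, fn, data)
```

For what it's worth, gzip/DEFLATE at least is defined by IETF RFCs 1952/1951; I'm not aware of an equivalent standards-body definition for xz.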
opened separate issue #9 to continue.
OK, v03 with inlining is implemented on hpfx.collab and other people are already downloading it, so this is done. If people want a version that works: v2.19.03b5 is good.
There is a question about whether there are line feeds in the base64-encoded stream. At one point I thought there were, but inspecting samples from hpfx.collab, it looks like they are already gone; I just forgot it changed. So I guess it is good as-is.
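This matches Python's behaviour, for what it's worth: base64.b64encode emits a single line, while the older base64.encodebytes wraps its output at 76 characters. A quick check:

```python
import base64

data = bytes(range(100))
print(b"\n" in base64.b64encode(data))    # False: no line feeds
print(b"\n" in base64.encodebytes(data))  # True: wrapped every 76 characters
```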
The ET-CTS wants to include the file data for small files in the JSON message. The proposal is to base64 encode the data so that binary files will not be corrupted.
Message brokers are notoriously challenged by large messages; one has to ensure that this facility will not be used to send large files, as that would slow down the overall rate of message transfer in each broker. Will open a separate issue to deal with defining a maximum payload size.
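The publishing-side guard could be as simple as the sketch below; the 4 KiB threshold is an arbitrary placeholder, and picking the real limit is what the new issue should settle:

```python
import os

MAX_INLINE_BYTES = 4096  # placeholder; actual limit to be defined in the new issue

def should_inline(path: str) -> bool:
    """Only inline files small enough not to bog down the brokers."""
    return os.path.getsize(path) <= MAX_INLINE_BYTES
```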