Let's do step 1 (serialization) first and complete this task, also to get it into 2.3.
Step 2 - we can use the codec from the snapshots module to reduce this even more, but only after step 1 is completed. MsgPack would be useful here. But this is for later; ignore compression for now.
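For later reference, a minimal sketch of what the step 2 encoding could look like. This assumes the msgpack-lite package purely for illustration; the actual codec lives in the snapshots module and may differ:

```ts
// Illustration only: msgpack-lite stands in for whatever codec the
// snapshots module provides.
import * as msgpack from "msgpack-lite";

const block = { height: 1819552, numberOfTransactions: 0 }; // hypothetical payload

const encoded: Buffer = msgpack.encode(block); // compact binary representation
const decoded = msgpack.decode(encoded);       // back to a plain object
```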
Did you consider processing overhead on these options? Sending more data to reduce the amount of processing needed on both sides is the better choice considering the kind of VMs most people run on.
Actually, the JSON is already minimized when being sent (whitespace removed). So we have:
original: 9474 bytes
gzip: 3324 (65% reduction)
xz: 3080 (67% reduction)
serialized to base64: 4242 (55% reduction)
serialized to hex: 6340 (33% reduction)
serialized to hex, gzip: 2603 (73% reduction)
serialized to hex, xz: 2464 (74% reduction)
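These numbers can be roughly reproduced with Node's built-in zlib (sketch only; xz is omitted since Node has no built-in binding for it, and the payloads below are hypothetical stand-ins):

```ts
import { gzipSync } from "zlib";

// Hypothetical stand-ins: for the real measurements above, "json" was the
// minimized /peer/blocks reply (9474 bytes) and "serializedHex" its
// hex-serialized form (6340 bytes).
const json = '{"success":true,"blocks":[]}';
const serializedHex = "0000000000000000";

console.log("original:", Buffer.byteLength(json));
console.log("gzip:", gzipSync(json).length);
console.log("serialized to hex:", Buffer.byteLength(serializedHex));
console.log("serialized to base64:",
    Buffer.from(serializedHex, "hex").toString("base64").length);
console.log("serialized to hex, gzip:", gzipSync(serializedHex).length);
```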
@geopsllc no, see the last sentence of my previous comment
After studying this a little bit more I realized that the data is already being compressed. A snippet from tcpdump(1):

```
Request:
--------
GET /peer/blocks?lastBlockHeight=1819552 HTTP/1.1
user-agent: got/9.6.0 (https://github.com/sindresorhus/got)
version: 2.2.1
port: 4002
nethash: 2a44f340d76ffc3df204c5f38cd355b7496c9065a1ade2ef92071436bd72e867
content-type: application/json
hashid: 264bd69b
accept-encoding: gzip, deflate <--------------- [1]
Host: 35.221.179.218:4002
Connection: close

Response:
---------
HTTP/1.1 200 OK
X-RateLimit-UserLimit: 20
X-RateLimit-UserRemaining: 19
X-RateLimit-UserReset: 1552916765228
nethash: 2a44f340d76ffc3df204c5f38cd355b7496c9065a1ade2ef92071436bd72e867
milestonehash: 3b7ee0793e2ddf23
version: 2.1.0
port: 4002
os: linux
height: 1819559
hashid: 90f40149
content-type: application/json; charset=utf-8
cache-control: no-cache
vary: accept-encoding
content-encoding: gzip <--------------- [2]
Date: Mon, 18 Mar 2019 13:46:04 GMT
Connection: close
Transfer-Encoding: chunked

... binary data in gzip format ...
```
This is due to meaningful defaults in the libraries we use:
[1] https://github.com/sindresorhus/got#decompress
[2] https://hapijs.com/api#-serveroptionscompression
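In other words, neither side has to do anything. A sketch of the two defaults at play (option names per the linked docs):

```ts
import got from "got";
import * as Hapi from "hapi";

// [1] Client side: got sends "accept-encoding: gzip, deflate" and
// transparently decompresses the response body.
(async () => {
    await got("http://127.0.0.1:4002/peer/blocks?lastBlockHeight=1819552", {
        decompress: true, // already the default
    });
})();

// [2] Server side: hapi gzips responses whenever the client advertises
// support; it would have to be switched off explicitly.
const server = new Hapi.Server({
    compression: false, // the default is enabled
});
```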
This means that we are already here:
gzip: 3324 (65% reduction)
Given the above, I don't think it makes sense to pursue further reduction via serialization: the extra benefit would be small, while it would add CPU cost (for both serialization and deserialization) and complexity to the code.
@faustbrian @supaiku0 kill this issue?
We already have the transactions serialized in memory (tx.serialized.toString("hex")), so there would be no overhead in that regard (except for deserialization on the receiver's end, which is a bit slower).
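So building the reply is essentially a lookup. A minimal sketch, assuming transactions carry their serialized form as a Buffer as referenced above:

```ts
// Sketch only: assumes each transaction holds its serialized bytes in memory,
// as in tx.serialized.toString("hex") above.
interface Transaction {
    serialized: Buffer;
}

const toPayload = (transactions: Transaction[]): string[] =>
    transactions.map((tx) => tx.serialized.toString("hex")); // no re-serialization
```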
But yeah, I think the numbers speak for themselves and there is no real benefit at the moment in pursuing this further.
@vasild there is other room for optimisation that AIP11 opens up:
I am considering various ways to minimize the traffic, serialization being one of them. Let's take the following example reply of /peer/blocks:
That is 13504 bytes. If we just remove the whitespace it becomes 9474 (30% reduction).
If serialized, then it would look like this:
and would be 6440 (52% reduction).
In summary:
original: 13504
whitespace removed: 9474 (30% reduction)
serialized: 6440 (52% reduction)
serialized, whitespace removed: 6340 (53% reduction)
whitespace removed, gzip: 3324 (75% reduction)
whitespace removed, xz: 3080 (77% reduction)
serialized, whitespace removed, gzip: 2603 (81% reduction)
serialized, whitespace removed, xz: 2464 (82% reduction)
The problem with serialized (hex) output is that each byte is represented as 2 characters and occupies 2 bytes, which is 2x bloat - something to avoid if one is concerned about space. If we are to avoid that and introduce a custom binary format, which will not be valid JSON, for example
{ "foo": "... binary data here..." }
we would shrink the size to:
serialized with binary data, whitespace removed: 3196 (76% reduction)
serialized with binary data, whitespace removed, gzip: 2315 (83% reduction)
serialized with binary data, whitespace removed, xz: 2308 (83% reduction)
base64 would be somewhere in between and would keep it as valid JSON.
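The per-encoding bloat is easy to verify (quick sketch; the ratios are inherent to the encodings, not specific to our payloads):

```ts
const raw = Buffer.alloc(3000); // stands in for ~3 KB of serialized block data

console.log(raw.length);                    // 3000 bytes raw binary (1.00x)
console.log(raw.toString("base64").length); // 4000 bytes as base64 (~1.33x)
console.log(raw.toString("hex").length);    // 6000 bytes as hex   (2.00x)
```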
Now we need to find a balance between:
Notice that 1. is pay once, 2. and 3. are permanent costs.
The CPU load to serialize/deserialize and compress/decompress needs to be assessed too.
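A micro-benchmark along these lines could be a starting point for that assessment (sketch with a hypothetical payload; real measurements would need representative block data):

```ts
import { gzipSync, gunzipSync } from "zlib";

// Hypothetical payload roughly the size of a /peer/blocks reply.
const payload = JSON.stringify({
    blocks: new Array(400).fill({ height: 1819552 }),
});

console.time("gzip");
const compressed = gzipSync(payload);
console.timeEnd("gzip");

console.time("gunzip");
gunzipSync(compressed);
console.timeEnd("gunzip");
```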