cta-wave / common-media-client-data

A repository to collect discussion and feedback on the Common Media Client Data proposal.

post data size (Comment from the Streaming Video Alliance) #51

Closed NEOAdvancedTechnology closed 4 years ago

NEOAdvancedTechnology commented 4 years ago

Below is a comment on draft CTA-5004 Common Media Client Data from the Streaming Video Alliance:

The post data might get a bit heavy, as it is all textual and could have a lot of key/value pairs to upload. We appreciate that not all of them are mandatory and that there is a relevance rule here (i.e. don’t post br on a manifest request, only on a chunk request, etc.). It might be worth looking at a bit-range approach or shortening the key/value terms; remember that LGI might also add things like tokens to this header. For example, this is LONG for a post: br=3200,bs=3,cid=”ABCD-1234”,d=4004,did=”Android6.0-player-build-12.3”,dl=18000,mtp=48175,nor=”..%2F300kbps%2Fsegment35.m4v”,nrr=12323-48763,ot=v,pr=1.08,rtp=12000,sf=d,sid=”6e2fb550-c457-11e9-bb97-0800200c9a66”,st=v,v=1

NEOAdvancedTechnology commented 4 years ago

Compression test data

Input: "br=3200,bs=3,cid=”ABCD-1234”,d=4004,did=”Android6.0-player-build1 2.3”,dl=18000,mtp=48175,nor=”..%2F300kbps%2Fsegment35.m4v”,nrr=12323-48763,ot=v,pr=1.08,rtp=12000,sf=d,sid=”6e2fb550-c457-11e9-bb97-0800200c9a66”,st=v,v=1"

Deflate: Compression ratio: 126 % Original size: 236 bytes Result size: 188 bytes eJwtjjsOwjAQRK9CQ7e21t8kxRZ8xD1wbFBEfrJDJDouwuU4CTbQjWZn3o6LpCQiuEQK2s7T+/na7Q9HJqTSWYMnjajB/0+jj1PnLUc29+dHiMzdu96LjeTqm+5J1Jh5wzKTrkVlYJxiaXK+lSeFeHNzyiqF6xDGRRk+6LU0xxgp/5SK6bqyCqaFVpizx7GGmGlCFm66kIf0G2ODvDhjkLXaVEyI0DDnmophXpDDbXO2tqBTQa0kPv+0S9Y=

bzip2: Compression ratio: 103 % Original size: 236 bytes Result size: 230 bytes QlpoOTFBWSZTWRZq/3sAAEWd4UIHf+I9AD+v3yBAAAACEAAwALbIiYk0TTJsU2gYk00ekekEU2jQjRoGgAAABqm0htU9E9CbIjJtA0mTCS3U/6wx4RR3IVxIlCpU/KwiW6NzUvGSc4NE0YsQBOEUuSgIMgusyURBowlZPckhuxcb0Mh5tRXi1smNlXs3YrIawdYzL4MoRNtG+YJShobed4g62u58nejDAtapGKBDiFWG0yW/6cMprKF3ubHiFke56hiWautJst6Uqo4NCVNNyvgVnDoAdABPGzzA7F3JFOFCQFmr/ew=

gz: Compression ratio: 118 % Original size: 236 bytes Result size: 200 bytes H4sIAAAAAAAA/y2OOw7CMBBEr0JDt7bW3yTFFnzEPXBsUER+skMkOi7C5TgJNtCNZmfejoukJCK4RAraztP7+drtD0cmpNJZgyeNqMH/T6OPU+ctRzb350eIzN273ouN5Oqb7knUmHnDMpOuRWVgnGJpcr6VJ4V4c3PKKoXrEMZFGT7otTTHGCn/lIrpurIKpoVWmLPHsYaYaUIWbrqQh/QbY4O8OGOQtdpUTIjQMOeaimFekMNtc7a2oFNBrSQ+5mLoYOwAAAA=

As per: http://www.txtwizard.net/compression
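The comparison above can be reproduced locally. The following is a minimal sketch using only Python's standard library. It assumes straight ASCII quotes in the sample string, so its byte counts will not exactly match the figures reported by the online tool. It also prints the base64 size of each result, since the compressed bytes have to be text-encoded before they can travel in a header or query arg: the 188-byte Deflate result above, for instance, becomes 252 base64 characters, which is longer than the 236-byte input.

```python
import base64
import bz2
import gzip
import zlib

# Sample CMCD payload from this issue, with straight ASCII quotes (assumption).
payload = (
    'br=3200,bs=3,cid="ABCD-1234",d=4004,did="Android6.0-player-build-12.3",'
    'dl=18000,mtp=48175,nor="..%2F300kbps%2Fsegment35.m4v",nrr=12323-48763,'
    'ot=v,pr=1.08,rtp=12000,sf=d,sid="6e2fb550-c457-11e9-bb97-0800200c9a66",'
    'st=v,v=1'
).encode("utf-8")

# Compress the same bytes with each scheme at its highest level.
compressed = {
    "deflate": zlib.compress(payload, 9),  # zlib-wrapped Deflate stream
    "bzip2": bz2.compress(payload, 9),
    "gz": gzip.compress(payload, 9),       # same Deflate stream, larger wrapper
}


def base64_length(n: int) -> int:
    """Length of the padded base64 text for n input bytes."""
    return 4 * ((n + 2) // 3)


print(f"original: {len(payload)} bytes")
for name, blob in compressed.items():
    text = base64.b64encode(blob)
    print(f"{name}: {len(blob)} bytes binary, {len(text)} bytes as base64")
```

On a payload this short, the fixed wrapper overhead of each format is a large share of the output, which is one reason the gains are modest.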

wilaw commented 4 years ago

Upload and download are not symmetric, and this spec loads the uplink. Rough estimate of cost: 2s segments, demuxed, HLS. 4 requests every 2s, or 120 requests per minute. At 350B per verbose request, that is 42kB per minute, or ~5.6kbps.
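The arithmetic behind that estimate, written out as a small sketch. The variable names are illustrative; the inputs are the figures quoted above.

```python
# Inputs taken from the estimate above.
SEGMENT_SECONDS = 2          # segment duration
REQUESTS_PER_SEGMENT = 4     # requests issued every segment duration
BYTES_PER_REQUEST = 350      # verbose CMCD payload per request

requests_per_minute = REQUESTS_PER_SEGMENT * 60 // SEGMENT_SECONDS
bytes_per_minute = requests_per_minute * BYTES_PER_REQUEST
kilobits_per_second = bytes_per_minute * 8 / 60 / 1000

print(requests_per_minute, "requests per minute")
print(bytes_per_minute / 1000, "kB per minute")
print(kilobits_per_second, "kbps of uplink")
```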

Deflate/gzip does not benefit much: deflate takes the 236-byte sample to 188 bytes, a compression ratio of 126%, or about a 20% reduction.

The header implementation with H2/H3 will take advantage of HPACK/QPACK header compression, which would reduce bits over wire.

We could encode to binary format. Player developers prefer simplicity of sending data. Other analytics beaconing is considerably larger than what is being proposed in this specification. CDNs would prefer to avoid CPU cost on parsing data.

What would be optimal solutions?

  1. We could use a dictionary approach so key values would not need to be transmitted.
  2. Query args cannot take binary payloads, so would still need to be base64 encoded.

The only benefit of a compression scheme would be for uplink-bandwidth-constrained players.

Is the lack of a compression scheme a blocker to adoption for v1?

NEOAdvancedTechnology commented 4 years ago

There are a lot of performance benefits with HTTP/2 and one of the most exciting ones is definitely HPACK compression. Unlike HTTP/1.1, headers can now be compressed using an algorithm known as Huffman encoding which in turn reduces the amount of data being sent. In our tests we showed it resulted in a decrease of header sizes by an average of 30%.

From: https://www.keycdn.com/blog/http2-hpack-compression

pankaj-giter commented 4 years ago

A quick test... I had 274 bytes of CMCD in JSON to produce 206 bytes of CBOR. Doing any text encoding would result in losing the benefits of CBOR... so overall not much gain to go thru CBOR and text encoding.

CBOR tool: http://cbor.me/

CMCD in JSON:

{ "br": 3200, "bs": 3, "cid": "ABCD-1234", "d": 4004, "did": "Android6.0-player-build-12.3", "dl": 18000, "mtp": 48175, "nor": "..%2F300kbps%2Fsegment35.m4v", "nrr": "12323-48763", "ot": "v", "pr": 1.08, "rtp": 12000, "sf": "d", "sid": "6e2fb550-c457-11e9-bb97-0800200c9a66", "st": "v", "v": 1 }

alficles commented 4 years ago

If we really wanted to try to optimize, we'd probably use enum keys for the fields and enum values. We'd recommend encoding session ids in binary. That would make the input look more like this:

{
0: 3200,
1: 3,
2: "ABCD-1234",
3: 4004,
4: 18000,
5: 48175,
6: "..%2F300kbps%2Fsegment35.m4v",
7: "12323-48763",
8: 1,
9: 1.08,
10: 12000,
11: 1,
12: "0123456789abcdef",
13: 1,
14: 1
}

That encodes to 114 bytes of binary (CBOR), which must be inflated to 152 bytes by base64 for text transport, and that is still down from the input size of 185 bytes:

br=3200,bs=3,cid=”ABCD-1234”,d=4004,dl=18000,mtp=48175,nor=”..%2F300kbps%2Fsegment35.m4v”,nrr=12323-48763,ot=v,pr=1.08,rtp=12000,sf=d,sid=”6e2fb550-c457-11e9-bb97-0800200c9a66”,st=v,v=1

The difference would be even larger if the nor and/or cid fields were not included in this particular content.
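The byte counts in this thread can be checked with a few lines of code. Below is a minimal sketch of a CBOR encoder, hand-written for illustration rather than taken from any library. It assumes CBOR (RFC 7049) is the binary encoding in question, that floats are stored as 64-bit values, and that base64 is the text encoding. It covers only the types used here (integers, floats, text strings, maps).

```python
import base64
import struct


def _head(major: int, n: int) -> bytes:
    """Encode the CBOR initial byte and argument n for a major type."""
    if n < 24:
        return bytes([(major << 5) | n])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < (1 << (8 * size)):
            return bytes([(major << 5) | info]) + n.to_bytes(size, "big")
    raise ValueError("integer too large for this sketch")


def cbor_encode(obj) -> bytes:
    """Encode ints, floats, text strings and maps; nothing else."""
    if isinstance(obj, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(obj, int):
        return _head(0, obj) if obj >= 0 else _head(1, -1 - obj)
    if isinstance(obj, float):
        return b"\xfb" + struct.pack(">d", obj)  # always 64-bit (assumption)
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(obj, dict):
        items = b"".join(cbor_encode(k) + cbor_encode(v) for k, v in obj.items())
        return _head(5, len(obj)) + items
    raise TypeError(f"unsupported type: {type(obj).__name__}")


# String-keyed payload, as in the JSON example earlier in this thread.
string_keyed = {
    "br": 3200, "bs": 3, "cid": "ABCD-1234", "d": 4004,
    "did": "Android6.0-player-build-12.3", "dl": 18000, "mtp": 48175,
    "nor": "..%2F300kbps%2Fsegment35.m4v", "nrr": "12323-48763", "ot": "v",
    "pr": 1.08, "rtp": 12000, "sf": "d",
    "sid": "6e2fb550-c457-11e9-bb97-0800200c9a66", "st": "v", "v": 1,
}

# Enum-keyed payload, copied from the example above.
enum_keyed = {
    0: 3200, 1: 3, 2: "ABCD-1234", 3: 4004, 4: 18000, 5: 48175,
    6: "..%2F300kbps%2Fsegment35.m4v", 7: "12323-48763", 8: 1, 9: 1.08,
    10: 12000, 11: 1, 12: "0123456789abcdef", 13: 1, 14: 1,
}

binary = cbor_encode(enum_keyed)
print("string keys:", len(cbor_encode(string_keyed)), "bytes of CBOR")
print("enum keys:  ", len(binary), "bytes of CBOR")
print("enum keys:  ", len(base64.b64encode(binary)), "bytes after base64")
```

Under these assumptions the sketch gives 206 bytes for the string-keyed map and 114 bytes for the enum-keyed map, growing to 152 bytes after base64, which agrees with the figures quoted in this thread.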

wilaw commented 4 years ago

We discussed at length. Benefits are modest. Impact on support on CDN would be high. We identified Concise Binary Object Representation (CBOR) https://tools.ietf.org/html/rfc7049 as a good candidate if compression is to be used. Decision to deploy v1 without additional compression defined. If we receive push-back on adoption due to payload weight, we will consider adding in a compression scheme in v2.