CrunchyBagel / TracePrivately

A privacy-focused app using Apple's soon-to-be-released contact tracing framework.
MIT License
351 stars 27 forks source link

Why base64? #48

Closed dstotijn closed 4 years ago

dstotijn commented 4 years ago

I'm wondering why you chose base64 encoding for downloading keys, e.g. in your OpenAPI spec and in the client code.

When downloading a large set of diagnosis keys, using base64 is not size efficient. Because the length of a DiagnosisKey is 20 bytes (16 bytes for the key, 4 bytes for the ENIntervalNumber), and 20 is not a multiple of 3, the encoded output needs padding added so it's a multiple of 4 (ref). The length of a base64 string for one DiagnosisKey would be 28 bytes (example).

Example of how this affects the size of data transfer:

Encoding 28 MB as a Base64 string would takeceil(28.000.000 / 3) * 4 = 37.333.336 bytes = 37,3 MB to download. That's an increase of 33%. Given a broad user base, this can impact both CDNs, government proxies and individual users with small mobile data plans.

I would suggest changing the content type of the download endpoint to application/octet-stream; e.g. a binarystream. I'm working on a reference server for Apple/Google's framework, which uses this strategy: https://github.com/dstotijn/ct-diag-server

Note: For uploading, it's a non-issue, because the content length will always be small; at most 14 days worth of data per upload, e.g. 20 14 = 280 bytes. Using Base64 encoding would make it: `ceil(20 14 / 3) * 4 = 376 bytes` (slightly more than 33%). But of course this size increase would hardly make a difference.

HendX commented 4 years ago

Thanks. I completely agree with all of this. Initially I wanted to get something up and running quickly without having to deal with binary data in PHP.

I’ll try and add this option today. I’ll leave the existing one there and let the client choose using headers.

HendX commented 4 years ago

I've made a reference to your implementation here:

https://github.com/CrunchyBagel/TracePrivately/blob/master/KeyServer/README.md

I intend to add support for your API also, which I think helps both of us.

HendX commented 4 years ago

@dstotijn Do you have documentation on how to setup your server? I don't know Docker and not really sure where to start.

Or alternatively if you have an instance running I could use to test my implementation?

HendX commented 4 years ago

Initial binary implementation here:

https://github.com/CrunchyBagel/TracePrivately/tree/feat/msgpack

This format allows the associated data to easily be returned also.

dstotijn commented 4 years ago

@HendX Great! I'll have a look tonight (CEST).

About running ct-diag-server: It should be as simple as installing Docker, then running docker-compose up in a terminal from the root directory of the repository. But I'll fire up an instance of the server on a VPS tonight as well.

dstotijn commented 4 years ago

Or alternatively if you have an instance running I could use to test my implementation?

You can use this dummy server: https://ct-diag-server.v0x.nl If you install Go, you can run this example client code to list keys in the database:

go run github.com/dstotijn/ct-diag-server/examples/client -baseURL https://ct-diag-server.v0x.nl

To show usage for the example client (e.g. for options for uploading):

go run github.com/dstotijn/ct-diag-server/examples/client -h
HendX commented 4 years ago

Thanks @dstotijn - getting around to trying this an ensuring compatibility! The code has now been updating to support binary data so this will make it easier.

HendX commented 4 years ago

I've now merged PR #53 into head, meaning the app now supports an extensible format for dealing with large numbers of keys in binary format.

I'm going to close this issue now as it address the primary issue.