TCNCoalition / TCN

Specification and reference implementation of the TCN Protocol for decentralized, privacy-preserving contact tracing.

API Interoperability first draft #72

Open · elliemdaw opened this issue 4 years ago

elliemdaw commented 4 years ago

As the various apps get started on backend implementations, we need to have a common API defined to ensure interoperability with minimal code redesign. This initial draft reflects discussions that have been going on in Slack within the backend and server2server channels.

hdevalence commented 4 years ago

Thanks for getting this written down!

A few questions (which are maybe also questions about the linked CoEpi API?):

  1. If the report interval number is calculated as described, doesn't the interval number already absolutely identify the interval's position in time? What does the date parameter describe?

  2. If we expect intervals to be relatively long, is there a benefit to specifying them in ms rather than seconds?

  3. Does it make a difference to CDNs whether the parameters are passed as query parameters or are encoded in the URL? Like would there be any reason to prefer GET /tcnreport/21600/28932872 over GET /tcnreport/?intervalNumber=28932872&intervalLength=21600? The only thing I can think of is that the former structure might be slightly easier to implement using a pile of static files. Is this important?

  4. I think the current description assumes that the server validates infection reports on behalf of users, and strips the signatures. Is it worth also specifying a version where the server passes the signatures to users to let them verify? This uses an extra 64 bytes per report and also causes more work for the clients.

elliemdaw commented 4 years ago

Thanks for the comments @hdevalence!

I think 1 and 2 might be questions for the CoEpi API; I'm not sure what the exact plan was for intervals.

  3. I think that sending the params in the URL like GET /tcnreport/21600/28932872 is cleaner for sure! If there are no CoEpi objections, I am good with that version. (Unrelated, but I think their implementation might come to the TCN GitHub soon?)

  4. I also totally agree with this; I don't think that the API endpoints should enforce one way or another. What part of the description are you concerned about?

Thanks!

hdevalence commented 4 years ago

Re (4), the difference is just that in one case the report has an extra 64-byte signature attached to it, so the client has to parse the response differently (accounting for the extra 64 bytes) and process it differently (verifying the signatures). So either the server could signal in-band which kind of data it is, or else we could have two endpoints, like /tcnreports/ and /signedtcnreports/, and servers can choose to provide one or the other.
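
To make the two response shapes concrete, here is a minimal client-side sketch in Rust, assuming the signed variant is simply the report bytes with the 64-byte signature appended; the types and the /tcnreports/ vs /signedtcnreports/ split are illustrative, not part of any spec:

/// Length of the signature in bytes (as discussed above).
const SIGNATURE_LEN: usize = 64;

/// An entry from the hypothetical /tcnreports/ endpoint: just the serialized
/// report, with the signature already stripped by the server.
struct ReportBytes(Vec<u8>);

/// An entry from the hypothetical /signedtcnreports/ endpoint: the serialized
/// report followed by its 64-byte signature, so the client can verify it locally.
struct SignedReportBytes {
    report: Vec<u8>,
    signature: [u8; SIGNATURE_LEN],
}

impl SignedReportBytes {
    /// Split a raw entry into the report bytes and the trailing signature.
    fn parse(raw: &[u8]) -> Option<Self> {
        if raw.len() < SIGNATURE_LEN {
            return None;
        }
        let (report, sig) = raw.split_at(raw.len() - SIGNATURE_LEN);
        let mut signature = [0u8; SIGNATURE_LEN];
        signature.copy_from_slice(sig);
        Some(SignedReportBytes {
            report: report.to_vec(),
            signature,
        })
    }
}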

ramnanib2 commented 4 years ago

@hdevalence

  1. The date parameter is supplied to maintain backwards-compatibility in case we change the intervalLength starting on a new day. If we change the intervalLength, then without the date, all reports generated under the previous interval length would not be fetchable. We can also use dayNumber for this, where dayNumber = epoch_seconds / (24 * 60 * 60), if that's less confusing in terms of time zones. Currently, the UTC time zone is assumed. I also think we should have an API to retrieve interval lengths for date ranges.

  2. There's no real benefit to ms over seconds; we could go with either option.

  3. The reason query parameters were chosen was to support defaults: if no date is supplied, the current date is assumed, and if no interval is supplied, the current interval is the default. CloudFront caching can be based on query string parameters as well -> https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/QueryStringParameters.html

  4. I don't think the server should strip the signature. Both server and client should be able to verify the report. Server verification is optional, but client verification is imperative.

rettetdemdativ commented 4 years ago

  1. I strongly agree that the server should not strip the signature. It's important for clients to be able to verify that the report was not tampered with; there shouldn't be a need to trust the server on this. If a client wishes to download an additional 64 bytes to verify validity themselves, they should be able to do so through the API. I even think it should be the default behavior, and I propose that we either have a separate route /tcnsignedreport to get signed reports, or a URL query param called signed, which can also be set to false.

ramnanib2 commented 4 years ago

Actually, I have another recommendation for defining the batches that keeps the interval length tunable and does not look as awkward as the current scheme. Each batch can be defined by a combination of a dayNumber (seconds since epoch / (24 * 3600)) and an intervalNumber that identifies the interval within that day. So for example, if intervalLength is 6 hours, intervalNumber will be in the range [0, 3]. So, something like this:

dayNumber = seconds_since_epoch / (24 * 3600)
intervalNumber = (seconds_since_epoch % (24 * 3600)) / interval_length_in_seconds

This will allow us to keep the interval_length flexible starting on a new day and the client does not have to worry about time zones.
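
A quick Rust sketch of that arithmetic, assuming interval_length_seconds is whatever value the server currently publishes (the names here are illustrative):

const SECONDS_PER_DAY: u64 = 24 * 3600;

// Integer day index since the Unix epoch (UTC), independent of local time zone.
fn day_number(epoch_seconds: u64) -> u64 {
    epoch_seconds / SECONDS_PER_DAY
}

// Index of the batch within that day; with a 6-hour interval length
// (21600 seconds) this falls in the range [0, 3].
fn interval_number(epoch_seconds: u64, interval_length_seconds: u64) -> u64 {
    (epoch_seconds % SECONDS_PER_DAY) / interval_length_seconds
}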

How does that sound?

kiwo commented 4 years ago

Would it make sense to just lock the interval down and inform the client of the server-set interval?

If we let the client choose what it can download, it can choose from about 95 intervals for each day. It's still cacheable, but do we need such flexibility?

If the server can tell the client its "last/current" time and what interval it's using, do we even need to consider time zones?

I would prefer a simple approach which does not need to take time zones etc. into consideration. If one implementation is bugged or the local time is off, we might fall behind in getting updates.

rettetdemdativ commented 4 years ago

What's the reasoning behind these intervals exactly? Is it to make caching more flexible, or...? Why can't we just have a set interval and let the user select one of the "blocks" of reports? Maybe I'm missing something here, but wouldn't that work and simplify the implementation?

ramnanib2 commented 4 years ago

@kiwo Essentially we are locking the interval down and not letting the client choose an arbitrary interval length. However, in the future the server might have to change the interval length. Hence we're asking the client to send the intervalLength it used to calculate the intervalNumber, so the server can verify it. If there are too few reports in an interval, we lose the obfuscation provided by randomization. If there are too many, the download time for a single API call will increase (we could mitigate this by adding pagination in future versions). Right now, we're also using intervals for pagination. In my previous post I recommended using dayNumber, which does not depend on the time zone.

@calmandniceperson The interval length is essentially fixed, with an option for the server to change it in the future while maintaining backwards-compatibility. The client sends the intervalLength only for verification that it used the right one. The client is not choosing the intervalLength at random; it's not allowed to. The server defines it and may change it starting on a new day.

ivnsch commented 4 years ago

(Moving the Slack discussion here.) I still don't fully understand why we can't just send the interval number and length.

Unless I'm misunderstanding its meaning, the date is derivable from the interval number and length. Why can't the API calculate the date and check whether the interval length is valid using this date? It wouldn't be different than doing this calculation in the client.

Also, the point in time when the api changes the interval length could be defined using Unix time as well, so we don't have to worry about timezones anywhere.

@bhushanRamnani You also wrote:

if the client wants to query reports for a previous day, it needs to send the old intervalLength.

If the server changes the interval, can't it create new buckets for the old data using the new interval (without discarding the old ones, of course, for backwards compatibility)? Otherwise it sounds like the clients need to remember the valid interval lengths for specific time periods and this is complexity we'd prefer to avoid.

Note also that the server can send the current interval length to the client (e.g. as part of a configuration API call that we could do when the app comes to the foreground), minimizing the time periods for which we'd have to support backwards compatibility (from weeks/months to at most 1-2 days). We could even remove the need for backwards compatibility entirely, by having the server return the currently valid interval length if the client asks for an outdated/wrong one.

ramnanib2 commented 4 years ago

@i-schuetz

If the server changes the interval, can't it create new buckets for the old data using the new interval (without discarding the old ones, of course, for backwards compatibility)?

This would require resharding all previous report data, replicating it to a new datastore, and switching the datastore over in a way that does not impact availability. It would also mean all the previous reports that were cached in CDNs would need to be cached again.

Otherwise it sounds like the clients need to remember the valid interval lengths for specific time periods and this is complexity we'd prefer to avoid.

Yes, that's why I recommended adding an API endpoint to retrieve the interval length for date ranges.

I understand that sending the date along with the intervalNumber is counter-intuitive without the backwards-compatibility context; that's why I recommended going with the approach below for representing the interval, which is much more intuitive:

// Represents an integer corresponding to a particular date
dayNumber = seconds_since_epoch / (24 * 3600)

// Represents an integer corresponding to an interval starting on a particular dayNumber
intervalNumber = (seconds_since_epoch % (24 * 3600)) / interval_length_in_seconds

ivnsch commented 4 years ago

@bhushanRamnani

This would require resharding all previous report data, replicating to a new datastore and switching the datastore in a way that does not impact availability. It will also mean all the previous reports that were cached in CDN's would need to be cached again.

How often do we expect to have to change the interval? If it's seldom enough, it may be worth it?

The client would always fetch the current valid interval length before starting to download reports. If the interval length is changed while it's fetching reports (it gets an invalid interval response), it just starts again. There would be no need for backwards compatibility and the logic would be much simpler.
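
A rough sketch of that client loop; the fetch_interval_config and fetch_reports helpers are hypothetical stand-ins for the actual HTTP calls:

/// Hypothetical: returns the current intervalLength in seconds, as published by the server.
fn fetch_interval_config() -> u64 {
    21600
}

/// Returned when the server no longer accepts the intervalLength the client used.
struct IntervalChanged;

/// Hypothetical: downloads one batch of serialized reports for the given interval.
fn fetch_reports(_interval_length: u64, _interval_number: u64) -> Result<Vec<Vec<u8>>, IntervalChanged> {
    unimplemented!()
}

/// Fetch all batches covering [from_epoch_seconds, to_epoch_seconds], restarting
/// from a fresh config if the interval length changes mid-download.
fn download_reports(from_epoch_seconds: u64, to_epoch_seconds: u64) -> Vec<Vec<u8>> {
    'retry: loop {
        let interval_length = fetch_interval_config();
        let first = from_epoch_seconds / interval_length;
        let last = to_epoch_seconds / interval_length;
        let mut reports = Vec::new();
        for n in first..=last {
            match fetch_reports(interval_length, n) {
                Ok(mut batch) => reports.append(&mut batch),
                // The server says the intervalLength changed: refresh the config and start over.
                Err(IntervalChanged) => continue 'retry,
            }
        }
        return reports;
    }
}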

Edit: Another option which crossed my mind would be that the client doesn't send any interval data, except the timestamp where it wants to start fetching. The server sends back the data + interval length, which the client would use to calculate the next timestamp. Would that make sense? If we assume that the cache is re-created when the interval length is changed, these responses should also be cacheable?

ramnanib2 commented 4 years ago

I think it just comes down to whether re-sharding the CDN and datastore every time the intervalLength changes is worth it, for the sake of slightly simplifying the query format. My opinion is that it's not. It's hard to predict right now how often we'll need to do that; it depends on the rate of reports generated at a given location.

Edit: Another option which crossed my mind would be that the client doesn't send any interval data, except the timestamp where it wants to start fetching. The server sends back the data + interval length, which the client would use to calculate the next timestamp. Would that make sense? If we assume that the cache is re-created when the interval length is changed, these responses should also be cacheable?

A timestamp in the query would be too granular for caching to be effective. We could instead provide an API endpoint that returns the intervalLength for a timestamp range, but there would be no way for the server to know whether this information was actually used to accurately derive the intervalNumber. Again, I'd like to avoid re-generating the cache and the datastore if possible.

ivnsch commented 4 years ago

@bhushanRamnani I'm not thinking about the query format, but the additional logic required in the clients to support multiple interval lengths. It's finicky and more error-prone than always using just one (variable) length. But I can't evaluate the effort/cost involved in re-generating the cache and have no idea how frequently it will be needed, so I'll rely on your judgement.

If we need to add this logic to the client, the API service to map time periods to interval lengths would be mandatory, as there doesn't seem to be any other way to determine them. A client could be offline for a long while, and the interval length could have changed multiple times in the meantime.

Edit: Right, the last idea with only the timestamp isn't (directly) cache friendly. I had the impression that you had custom logic around it (e.g. to validate that the interval length is valid for the date, which didn't sound like it would be just a cache miss) and thought that assigning the timestamp to an interval could be done as part of it.

ramnanib2 commented 4 years ago

Yes, understood. There are two ways to solve this. The client sends the GET request with the length it knows for a dayNumber and interval; if the server does not agree that it's correct for that day, it responds with a 401 error code and a config map in the body:

[
    {
        startDayNumber: 0,
        endDayNumber: 18380,
        intervalLengthSeconds: 21600
    },
    {
        startDayNumber: 18381, // absence of endDayNumber means current
        intervalLengthSeconds: 3600
    }
]

The other option is to provide this information via a simple GET endpoint, GET /interval_config. The client can retrieve this once per day, since the length is guaranteed to remain unchanged within a day.
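
Either way, the client-side lookup would look the same. A sketch, using snake_case versions of the JSON field names above (deserialization and HTTP plumbing omitted):

/// One entry of the interval-length config returned by the server.
struct IntervalConfig {
    start_day_number: u64,
    /// None means the entry is still in effect (no endDayNumber in the JSON).
    end_day_number: Option<u64>,
    interval_length_seconds: u64,
}

/// Find the interval length that was in effect on the given dayNumber.
fn interval_length_for_day(config: &[IntervalConfig], day_number: u64) -> Option<u64> {
    config
        .iter()
        .find(|c| {
            day_number >= c.start_day_number
                && c.end_day_number.map_or(true, |end| day_number <= end)
        })
        .map(|c| c.interval_length_seconds)
}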

scottleibrand commented 4 years ago

@bhushanRamnani @i-schuetz and I had a call to settle this and agreed that we shouldn't need to include the date in v4 backend API calls, just the interval length and interval number. We're hard-coding the interval length in the clients at 6h for now, and deferring the work to make it configurable via a server endpoint.
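
For reference, a minimal sketch of what that leaves the client doing; the /tcnreport/{intervalLength}/{intervalNumber} path shape is the one floated earlier in the thread, not a settled spec:

// Hard-coded client-side for now, per the agreement above: 6 hours.
const INTERVAL_LENGTH_SECONDS: u64 = 21_600;

// Absolute interval number since the Unix epoch; no date parameter needed.
fn interval_number(epoch_seconds: u64) -> u64 {
    epoch_seconds / INTERVAL_LENGTH_SECONDS
}

// Example request path for the batch covering the given timestamp.
fn report_batch_path(epoch_seconds: u64) -> String {
    format!("/tcnreport/{}/{}", INTERVAL_LENGTH_SECONDS, interval_number(epoch_seconds))
}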