Refactor API for cleaner RESTful idioms and support pagination and batching

caguado commented 9 months ago

This PR contains 4 stacked commits to modify the current API proposal to:

Model locations and sessions as REST resources: Removes the action from the URI for the resource to allow individual referencing and collections of nested resources.
Model authorization errors when accessing a resource: The current API namespace is shared across all the callers so responses will be filtered by visibility of the caller or denied in the absence of authorization to use the passed input parameters.
Add pagination to listing of locations and sessions: Implement a paginator over the list of sessions visible to a caller. The paginator is opportunistic as it will return an upper bounded set using the client hint, and opaque as it will make no commitments on the content of the pagination token.
Disambiguate items in batch processing for sessions: The creation of sessions allows to submit a set of them to treat them as an aggregate when requesting to operate on a certain peering posture. This commit adds a field to the input and output series such that the client can univocally identify which sessions have been processed successfully or not without relying on implicit ordering of the input or result set.

github-actions[bot] commented 9 months ago

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

caguado commented 9 months ago

I have read the CLA Document and I hereby sign the CLA

caguado commented 9 months ago

I originally thought of opening 4 smaller PRs with one topic branch per change to comment/reject independently, but I noticed it complicates the parallelization of this merge, hence only one PR. I hope this is the expectation. If any comments require changes, please let me know if you want them added as newer commits, or prefer amending the offending ones.

grizz commented 9 months ago

@caguado looks good, thanks! I have a couple questions for clarification.

I'm confused on the difference between the item_id and session_id? Could you explain?

session_id (of individual session generated by the server. Client provides a item ID to track the session within the request, but the server's session ID will be the final definitive ID)

I just found that, if it's just the server vs initiator, I think we should name it accordingly so it's more intuitive, thoughts?

How is next_token supposed to work? I see:

    * next_token (opaque string to hint the query and last result returned when fetching a new page)

but don't see anything else referencing that.

jramseyer commented 9 months ago

I'm confused on the difference between the item_id and session_id? Could you explain?

session_id (of individual session generated by the server. Client provides a item ID to track the session within the request, but the server's session ID will be the final definitive ID)

I just found that, if it's just the server vs initiator, I think we should name it accordingly so it's more intuitive, thoughts?

What if the server always issued both ids (for individual sessions and for batch configuration requests)? It might be simpler if the server were always the grantor of IDs, and the client the recipient of IDs (which they can map back internally as needed).

If the client grants the individual item_ids, how do we handle the case where the server proposes to add additional sessions within a request? Should the server ID those sessions? Should they be ID-less, until the client agrees to configure them?

caguado commented 9 months ago

I'm confused on the difference between the item_id and session_id? Could you explain?

session_id (of individual session generated by the server. Client provides a item ID to track the session within the request, but the server's session ID will be the final definitive ID)

I just found that, if it's just the server vs initiator, I think we should name it accordingly so it's more intuitive, thoughts?

What if the server always issued both ids (for individual sessions and for batch configuration requests)? It might be simpler if the server were always the grantor of IDs, and the client the recipient of IDs (which they can map back internally as needed).

If the client grants the individual item_ids, how do we handle the case where the server proposes to add additional sessions within a request? Should the server ID those sessions? Should they be ID-less, until the client agrees to configure them?

I understood we had discarded that behaviour in the call 2 from Jan 10th where if the server wants to propose sessions outside of those requested would either fall the request or we would add another API resource for it.

caguado commented 8 months ago

Regarding comments around IDs from the last call. I'm adding a description of the intent for the existing IDs, and the role of the additional ones. In every case, it is assumed that IDs are used to univocally identify the resources they refer to in their scope of existence.

Existing IDs in the current spec proposed:

session_id: reflects the server-side generated identifier of the BGP session resource it represents on the receiver system (resource that is owned by the initiator, read ownership from an authz sense, ie. in the OAuth2 sense). It is invariant for the lifetime of the session and unique within the relationship between initiator and receiver. Should the underlying resource cease to exist, the identifier remains unique in the relationship with the initiator. This constraint is to guarantee stability of usage on resource identifiers over time, should any of the parts assign additional semantics on either side (such as usage permissions, ACLs, etc). It is up to the receiver's implementation to do full/soft deletions, and to create a namespace where IDs include owner's ID or are globally unique across time. So in that sense, this identifier is also opaque to the client.
item_id: reflects a single element within a set of candidate resources, to be used during batched creation requests. This identifier is unique within the batch and for the lifecycle of the batch request. The identifier allows initiators to univocally identify items from responses, should any of them have not resulted in a new resource on the receiver side.

Discussed additional identifiers:

session_name (or similar): reflects the client provided identifier for the new resource to be created on the receiver. As it is under the initiator's control, it is not invariant and only unique for the lifetime of the session, not during the relationship between initiator and receiver. This name carries meaning for the initiator, similar to a description field. It may have a different format to the _sessionid. Given that the client considers as an additional identity of the session resource, it could also be used to replace the item_id specified above under the request scope.
idempotency_token (or similar nonce client-generated request ID): reflects the identifier of the network call to the receiver's API. This identifier is optional and generated by the client to help the receiver disambiguate duplicate requests in the presence of network partitions, reactions to timeouts, etc. and ensure the API remains idempotent for all mutating uses. Assuming batch create is already implemented with idempotency (e.g. via indexing the tuple (initiator IP, receiver IP, location), it will help disambiguate calls that are not idempotent in nature such as resource updates, unless versioning/optimistic locking is also added to the API spec.
request_id: reflects the server-side identifier provided by the receiver to identify the network call and help during troubleshooting, audit logging.

So to concrete my proposal, what if we:

keep session_id as is
add session_name to univocally identify resources from the client perspective and during the session creation (replacing item_id)
add idempotency_token (or any other name) as optional attribute to help on disambiguation of retries

jramseyer commented 8 months ago

Thanks for the updates!

A few questions:

It is invariant for the lifetime of the session and unique within the relationship between initiator and receiver. Should the underlying resource cease to exist, the identifier remains unique in the relationship with the initiator.

Let's say ClientA and ServerB want to peer. ClientA sends a request to peer on client_ip with server_ip. Server configures the request, and assigns it id_1. Later, both sides delete the session. A year later, ClientA re-requests peering on the same ips. Is it correct then that the server will return a new id, id_2, for this case? I suppose my question is: what is the lifetime of the session? Does it end when the session is deleted, even if we later re-establish?

I would assume it ends when the session is deleted, and the server should give a new id, but wanted to confirm.

item_id

This is different from the session_id, in that, if (later, when the API supports it) the client wants to change the prefix limit on session (session_id_1), that request would be assigned item_id_1?

session_name

I don't think we need to define this--this identifier is exclusive to the client, so the client can choose to use it or not. I would vote we leave it out and let clients do as they wish. I don't see this getting passed back and forth between the client and server.

(I guess the use case would be that the client could send a request for a session with the session_name attached, and the server would reply with session_id, and thus the client could easily match between session_name and session_ids?)

idempotency_token

Is this necessary? Should we not just force the server to check for duplicate requests on their end?
I suppose it makes it easier for the server to check for duplicate requests, but I worry that we set ourselves up for further confusion here, if the client is badly configured example: client sends request with token A client has error and thinks that requestA was not sent client sends second request with tokenB Server gets requests A and B, and naively filters on tokens instead of request contents. Thus, server configures requests A and B, because the tokens were different. I understand that the above is an edge case, but I don't see what we get out of adding an additional token, since we should be uniquing by request contents anyway?

request_id

Makes sense to me.

Thanks for the detailed proposal!

grizz commented 8 months ago

Thanks for all of the added details Carlos!

session_id

It sounds like we all agree on this and it that it MUST be globally unique for currently enabled sessions. :)

I think having it unique over time should be up to the server, but I can see the argument for making it required if you all feel strongly about it.

The rest mostly seem unnecessary to me.

session_name

I don't think we need to define this--this identifier is exclusive to the client, so the client can choose to use it or not. I would vote we leave it out and let clients do as they wish. I don't see this getting passed back and forth between the client and server.

(I guess the use case would be that the client could send a request for a session with the session_name attached, and the server would reply with session_id, and thus the client could easily match between session_name and session_ids?)

I agree, at least in it the current implementation. If we were to add a request for get_id_from_session_name, that could make sense -- but I can't see a use case for that.

A unique way to identify the session is always going to be local/remote IPs (and local/remote ASNs if a network has multiple ones) as there can be only one, so I don't think we need idempotency_token. It's also how we're going to be able to generate session_ids for existing sessions.

item_id

I'm struggling to see the usefullness of this, we could do array offset of the request if we need to, otherwise we can still always fall back to local/remote IP to figure these out.

request_id

Again, it could be handy for troubleshooting, but users could just as easily to IP or ASN and timestamp, right?

I think we should keep the initial API spec as simple as possible, it's a lot easier to add fields later than to remove them.

caguado commented 8 months ago

I would assume it ends when the session is deleted, and the server should give a new id, but wanted to confirm.

That's the proposal, to make explicit not to recycle resource IDs or otherwise explain why.

This is different from the session_id, in that, if (later, when the API supports it) the client wants to change the prefix limit on session (session_id_1), that request would be assigned item_id_1?

I'm struggling to see the usefullness of this, we could do array offset of the request if we need to, otherwise we can still always fall back to local/remote IP to figure these out.

I am proposing to make request and response item matching explicit for the duration of the request alone. If the agreement is not to use a unique identifier that only exists for such duration, then how a developer should match items the response to the items they requested? Joining request and response lists by offset? Matching IP-pairs for the session in the response body? I don't agree those are simpler than matching by the client-side-assigned identity to the resource (session_name, or this item_id if we don't agree to add such session_name). But if everyone agrees, we should make that expectation explicit in the doc.

I would vote we leave it out and let clients do as they wish. I don't see this getting passed back and forth between the client and server.

I agree, at least in it the current implementation. If we were to add a request for get_id_from_session_name, that could make sense -- but I can't see a use case for that.

The main use case is ease of use of the API for a human interacting with it. Imagine presenting a list of sessions to act on in a drop down menu in the UI, what would that drop down expect the user to know about the sessions? their random ID? the IPv6 pair concatenated string for the session? Surely, it is up to each consumer of the API to decide, but this field is aiming to provide them with an option that both initiator-receiver can independently agree and explicit.

A unique way to identify the session is always going to be local/remote IPs (and local/remote ASNs if a network has multiple ones) as there can be only one, so I don't think we need idempotency_token.

My first comment on session_name also mentions that using it would make idempotency_token redundant. But I disagree that sessions are univocally identified by a 2-way or 4-way tuple, let's make it explicit how we choose one format, or the other, or both? IMO it is actually simpler for a client to identify the session with its ID :) Somewhat related, this other attempt at identifying auto-configuration is deliberately open to such definition so remains ambiguous to follow one or another but implementers will have to disambiguate off spec.

I think we should keep the initial API spec as simple as possible, it's a lot easier to add fields later than to remove them.

I argue that not defining behavior explicitly makes it not simple as people will have to agree off spec. It will be easy so long those changes are backward compatible. We are not defining how versioning of the API will work either (content negotiation? embedded in resource path (i.e. /v1/sessions)?) so we implicitly expect that new fields to be added will not be mandatory.

bgp / autopeer

Refactor API for cleaner RESTful idioms and support pagination and batching #20