Semantics for POST method

cookeac commented 3 years ago

As part of our work on URL schemes and recommended common implementations, this issue covers using HTTP POST to send new resources from a client to a service. This issue is defined to allow discussion. I've made a start below:

Goals

Provide a standard approach that can be implemented by any server that allows clients to add new resources
Work for any object collection where the server supports addition (for instance, animals, animal sets, events)
Align with best practices for HTTP, REST, and JSON APIs
Allow the posting of single resources and collections of resources
Provides a standardised way of handling errors.

There are several parts to this issue:

How to determine if a collection implemented on the server supports adding resources
The method and URL path to use when adding new resources
Adding single resources or arrays/collections of resources
How to uniquely identify resources being added to the server
Determining status - success or errors
Return results when resources are successfully added
Return results when the server is processing new resources asynchronously
Returning error results.

How to determine if a collection on a server supports adding resources

There are three options:

Read the human-readable documentation or talk to an administrator
Retrieve an icarResourceCollectionReference that provides a reference to the collection, and parse the operations array to see if the "POST" method is supported - this could be through a capabilities or endpoint enumerating API.
Have some other standard capabilities API

The method and URL path to use when adding new resources

The method should be HTTP POST as per RFC-2616
The URL should be the URL which one would GET to get a collection of resources of the specified type. For instance, the URL of the animals collection for icarAnimalCoreResource objects.

Adding single resources or arrays/collections of resources

There might be several options:

Only support posting individual items (likely inefficient for adding bulk data)
Define a special purpose array/collection for posting
Use an icarResourceCollection, setting the memberType and putting resources in the members[] array. The view properties would be unused.

How to uniquely identify resources being added to the server

A discussion area in itself - broken out into issue #153

Determining status - success or errors

The recommended method for REST web services is to use HTTP status codes . For instance,

201 Created
202 Accepted
4xx for client errors (for instance, 409 Conflict when a resource with the same unique ID already exists) Substantial work on this has been done by JSON-API (https://jsonapi.org/format/#crud-creating-responses).

Return results when resources are successfully added

Based on the RFC-2616 specifications and JSON-API, it would seem that when new resources are added successfully, the server should:

Return HTTP status 201 Created
Set the HTTP Location header to the location of the new resource(s)
Return the newly added resources (the same data sent, possibly with IDs filled in and other values generated) in the body.

Return results when the server is processing new resources asynchronously

Based on the RFC-2616 specifications and JSON-API, it would seem that when the server is still processing the submitted, resources (and would otherwise time out), it should:
Return the HTTP status 202 Accepted
Return something in the body that indicates status and how to check status or the result. Some examples are given here: https://restfulapi.net/http-status-202-accepted/ (we will need to decided what we want to do).

Returning error results

The usual recommended approach is to return an HTTP 4xx client error status, and then return an error or problem response in the body.

RFC-7807 defines a form of problem response that can be extended.
JSON-API defines an Errors object We will need to decide the representation we want to use AND whether a server should always fail when an error (rather than a warning) is encountered, or whether it should attempt to process other objects in the request.

cookeac commented 3 years ago

Other suggestions welcome!	Code	HTTP Status	Notes
201	Created	Success and here is the results
202	Accepted	Working on it and here is a link to check status
400	Bad Request	There is a (syntax) error in the JSON data. The response will explain
401	Unauthorized	Credentials are needed (or have expired)
403	Forbidden	This data or collection can't be modified
405	Method not allowed	The POST method is not supported here
409	Conflict	You are trying to create resources but resources with the same ID already exist
413	Request entity too large	You have POSTed more data than the server supports
415	Unsupported media type	JSON data was expected
422	Unprocessable	A semantic (meaning, not syntax) error has occured. The response will explain

Suggested extended problem details type

According to RFC 7807 we should return a Problem Details type with a small number of required fields, and HTTP media type "appplication/problem+json"

Attribute	Required	Type	Description
type	Y	string	URL unique error type (we will need to define these for common error types, implementers can define their own extra ones) e.g. icar.org/json-validation
status	Y	number	HTTP status code e.g. 400
title	Y	string	Human readable short description e.g. "The JSON request failed validation against the ADE 1.2 specification."
details	N	string	More detailed description, optional.
errors[]	N	object	Suggest we have an errors array to provide feedback on multiple problems.

Within the errors object we could have the following:	Attribute	Required	Type
code	N	string	Application/service-specific error code.
errorDetail	Y	string	Detailed error message, e.g. "Required field 'ID' not specified."
collection	N	string	Name of the collection containing the error (if relevant)
instance	N	string	Unique ID of the instance object with the problem (if there is an ID)
pointer	N	string	RFC 6901 JSON Pointer to the error position within the request (where relevant).

Thoughts, suggestions, changes welcome!

ahokkonen commented 3 years ago

We need to agree on do we want to support POST/PUT only per single resource or handle multiple resources in one request.

For single resource actions it seems like more straight-forward and we can use best practices defined on jsonapi/restfulapi etc.

If supporting multiple resources in one request how we handle an overall status if the request partly went well and partly failed?

For example sending POST request for 2 milking visits when one had ended up with 201 and second one with 400 (or other). Should we require than within one request data handling should be transactional - all success/all fail?

alamers commented 3 years ago

From our perspective we always advice to start with single resource updates. This makes the error handling a lot easier, and (for our role as a proxy) checking permissions does not require introspection of a message. Only for use cases where the numbers are very large and the frequency x latency of calls becomes an issue, it requires some form of batch processing. In those cases, we look at the single resource call and see if we can simply group those together and derive a batch call from there. This works but only solves the inefficiency in transport. For further optimisations, it quickly becomes a discussion on how a batch is grouped together: can you do a single permission check for the whole batch / is there a single target for the whole batch?

ahokkonen commented 3 years ago

I also agree with @alamers that starting with single resource handling would be most convention way for both client and services. Biggest issue still is to find a way how to work with resource IDs if other side in not capable to use single key for identifying a resource.

I also like suggestion for http status and error/problem type as defined in Andrew messages.

ahokkonen commented 3 years ago

Also for http codes and problem response message - we might want to use them also for GET calls (invalid query params, no access, not implemented etc.) Would it be a breaking change for v1.1 if we'll define it for GET scheme instead of default "exampleErrorResource" resource array??

ahokkonen commented 3 years ago

Had some discussion with our clients and most of them feel that they would prefer to use POST methods with batches and not with single resources. In our Nordic system client are handling data exchange once or twice a day – not in real-time. For them posting resources as singe events is not an option, especially in a big herds where there is a huge amount of data collected during the day to send.

So I guess from our pov we would need a definition for sending resources in a batches in addition to the single resource POST (also for updating and deleting)

cookeac commented 3 years ago

@alamers wrote:

From our perspective we always advice to start with single resource updates. This makes the error handling a lot easier, and (for our role as a proxy) checking permissions does not require introspection of a message. Only for use cases where the numbers are very large and the frequency x latency of calls becomes an issue, it requires some form of batch processing. In those cases, we look at the single resource call and see if we can simply group those together and derive a batch call from there. This works but only solves the inefficiency in transport. For further optimisations, it quickly becomes a discussion on how a batch is grouped together: can you do a single permission check for the whole batch / is there a single target for the whole batch?

Makes sense, although I note also the comment about @ahokkonen regarding batches. There are more details of the initial discussion on single vs collection in the 25 September 2020 minutes

ahokkonen commented 3 years ago

@cookeac @alamers @erwinspeybroeck

As discussed on Friday meeting:

ICAR ADE by default defines single resource PUT/POST per endpoint collection
- if resources is added (201) should saved resource be returned as a response?
- how server generated reference id should be passed back?
as an extensions for PUT/POST for batches (supports for multiple resources per request) are defined on endpoint collections through sub/nested "/batch" -endpoint (aka /locations/{location-scheme}/{location-id}/milking-visits/batch)
- should response be returned as set/array of individual resource related responses?
- how individual response should be mapped to the resource from request?

ahokkonen commented 3 years ago

From the last technical meeting on 19'th of November some concerns was raised (mainly from FMS side) about having POST events be based on single resource posting instead of the bulk insert (or update). For our case it is becoming kind of urgent task to decide the implementation standard of the services and POST/PUT handlings on API layer.

Other valid point raised by @alamers about my previous proposal on having /batch as an extension for main collection endpoint. In case we want to support single resource update (PUT, PATCH) according RESTful specification for collections (aka /locations/{location-scheme}/{location-id}/milking-visits/{resource-id}) then this extensions is not a best solution.

ahokkonen commented 3 years ago

@cookeac @alamers @erwinspeybroeck @thomasd-gea

on todays meeting we where discussing POST/PUT semantics for bulk/batches. It is still a bit open for us how to handle bulk pushes between 2 disconnected systems.

Let's look on the situation for POST events (pushing new resources to the data service). By design client side needs to push collection of the resources in request to the /batch/{collection} endpoint and for the new events "id" field should be assigned (=null): /locations/batch/drying-offs | request

[
    { "id": null, "eventDateTime": "2020-09-17T10:24:35.997Z", ... },
    { "id": null, "eventDateTime": "2020-09-18T10:24:35.997Z", ... },
    { "id": null, "eventDateTime": "2020-09-19T10:24:35.997Z", ... },
]

How the response model should then be designed? Just a response with created resources is not enough. /locations/batch/drying-offs | response

[
    { "id": "1", "eventDateTime": "2020-09-17T10:24:35.997Z", ... },
    { "id": "2", "eventDateTime": "2020-09-18T10:24:35.997Z", ... },
    { "id": "3", "eventDateTime": "2020-09-19T10:24:35.997Z", ... },
]

In response service should reply with resource processing statuses, new ids and/or possible errors in case of processing failure.

as a client I need to track processing status for each of the resources sent in request - success/failure.
- what field should be used for tracking?
as a client I need to get an "id" of each resource created on data service side (for order operations update/patch/delete)
- should service result only "id" or the whole created resource?
as a client I need to track failure reasons/processing errors for each of the resources.

thomasd-gea commented 3 years ago

Hi, as far as I know, the order of JSON arrays guaranteed, however, depending on the implementation of JSON you use, the order might not be preserved (https://stackoverflow.com/questions/7214293/is-the-order-of-elements-in-a-json-list-preserved). In any case, I think having an explicit response that identifies entries would be great.

BTW, @ahokkonen : was the example with two IDs being "2" intentionally? I assume you meant "3" in the last entry...

Now, the question then is, how the sent entries and the response entries are related to each other. In general I see two ways: identify the order of the sent/returned entries explicitly (e.g. by numbering them) or use source/target identifiers to identify them.

Well, the latter case would be quite easy with our current API and I would expect that every system either has an ID for an entity or could at least create one on the fly (e.g. just assign an increasing number to them) and put that into the ID field of the resource. Whenever such an entry is put into another system via the batch API, the receiving system could just return that ID in the Meta-Data field "sourceId" (https://github.com/adewg/ICAR/blob/Develop/types/icarMetaDataType.json) of the resulting object and put its own ID in the ID field of the resource. That way, the clear relation between entries can be maintained. This would at least be true if the full object is returned.

The "smaller" (i.e. more efficient) alternative would be that the receiving system could just return an object containing the "source" id and either the "assigned" id from the receiving system or an error message. That would make the communication quite easy. I guess, returning just this list of "small" objects VS returning a list with fully specified objects (which is what I'd do for the single POST) would save some bandwidth (and most likely time).

On the question of whether the receiving systems needs to store the sourceId: I guess this could be decided by the receiving system. It is not required, imho. One could just use that for the batch result and then forget about it.

The sending system can then track the new id for the entries that it sent and could also see errors per entry.

The alternative of identifying the order of the entries would be something that we would impose on the JSON array because we are not sure whether user implementations really keep the order of the entries seems somehow strange, so I'd vote for the "sourceId" way as described above.

@ahokkonen would that solve all of your requirements?

thomasd-gea commented 3 years ago

Example:

Input from sending system:

[
    { "id": "1", "eventDateTime": "2020-09-17T10:24:35.997Z", ... },
    { "id": "2", "eventDateTime": "2020-09-18T10:24:35.997Z", ... },
    { "id": "3", "eventDateTime": "2020-09-19T10:24:35.997Z", ... },
...
]

Result returned from receiving system:

[
    { "sourceId": "1", "targetId": "abc-xyz"},
    { "sourceId": "2", "error":{"message":"Animal does not exists", "code":"404" }},
    { "sourceId": "3", "targetId": "def-uvw" },
...
]

BTW: for single entry posts (so non-batch) this object could be re-used. This would make the API quite clear. On the other hand, returning the whole object would still be a valid option.

alamers commented 3 years ago

I think @thomasd-gea summarizes it nicely: (1) either we need to rely on the implicit ordering of the array or (2) on some id from the client for correlating the responses with the input. Relying on the implicit ordering of the array feels very fragile.

Why not make the id compulsory in these batch cases? The client should have some form of id internally, since it is handling a batch of entities here. Exposing that id (in some form) should not be too complicated?

I also would like to remark that, as far as I remember, I think we agreed that (at least as a default) we'd prefer an accept-all-or-reject-all policy. This does not change the discussion much since we'd still need to correlate the responses with the input, but it does give us an alternative (3): for a successful post, we may not need to communicate anything back (completely ignoring the correlation problem). And for a failure, we could send back the full entities that failed, leaving the correlation problem at the client.

For me, only option (2) feels robust. But current systems may not require that much robustness and rely more on manual corrections.

thomasd-gea commented 3 years ago

@alamers Option (3) is not possible, I think. The sending system at least has to get back the ids that are assigned from the target system. If the source wants to reference them somehow, that is. Of course, for "fire and forget" this would be possible, but this is a rather rare case, I would expect. And even there, the mapping does not hurt.

ahokkonen commented 3 years ago

In our solution we allow server systems to implement "partial failure/success" cases - part of the events could be approved and saved (and the proper status should be returned) and part may fail (errors should be returned). So it is very important for us to have a mechanism for response status tracking.

First we where thinking about having some sort of wrapper for batch resources so the client can have their "correlation/tracking" ids for each resource on upper level:

[
    {
        "referenceId": "323424234324",  // or 'clientReferenceId' generated by client side used for tracking
        "data": { // resource is embedded inside 'data' property
            "id": null, // not specified for new resources
            "status": "Pregnant",
            "eventDateTime": "2020-09-17T10:24:35.997Z",
            "animal": {
                "id": "12543488552",
                "scheme": "xxx"
            }
        }
    }
]

and the response would be:

[
    {
        "referenceId": "323424234324", //or 'clientReferenceId' generated by client side used for tracking
    "status": 200,
    "data": { 
        "id": "34", 'id' generated by data service
        "meta": {
            "source": "DEN",
            "modified": "2021-01-01"
        }
    },
    "errors": null
    }
]

So the client app system receives back correlation id, status, possible error and event resource core information - basically only created id and timestamp ( Other simple option would be to just add "correlationId/referenceId" field directly into EventCore resource).

But then we realised that this would be an overengineering for that purpose. I would like to have more generic and standard way of handling this.

Maybe utilize "sourceId" in "meta" property for client tracking? Because in case of sending new event client side is a source (or?). It is a bit open for me how "sourceId" should be used :)

[
    {
        "id": null,
        "status": "Pregnant",
        "eventDateTime": "2020-09-17T10:24:35.997Z",
        "animal": {
            "id": "12543488552",
            "scheme": "xxx"
        },
        "meta": {
            "source": "client identifier",
            "sourceId": "1" // client generated id
        }
    }
]

and then the response:

[
    {
        "id": "12345", // server generated resource id
        "status": "OK",
        "meta": { // meta should be always returned!
            "source": "client identifier",
            "sourceId": "1" // original client generated id 
        },
        "errors": null
    }
]

thomasd-gea commented 3 years ago

@ahokkonen In your 3rd code example, I'd prefer to have the client generated ID in the "id" field, because for the receiving system this is actually the ID in the sending system, isn't it? Example 4 is in general ok for me, but I wouldn'T reuse the meta object as it contains some more info (unless the info like "created", "validTo" are also relevant for the sender). Also, do we need the status field? I would expect it to be only "success" or "failure", so we could either leave it out completely or replace it with a boolean like "success": "true"/"false". When leaving this out, having errors != null would imply that the call was not successful. If we need additional warnings (which I currently don't see) we could add them as well later on.

ahokkonen commented 3 years ago

@thomasd-gea client generated ID is irrelevant for the receiving system, at least in our cases when there are 2 completely disconnected system on client and server sides with their own ids. Only for the client side it is important to know ids for the server resources for update/delete operations. That is why we need to get server generated id back to the caller.

With global unique identifiers (UUIDs) life would be much more easier of course :) But I afraid it is not an option

thomasd-gea commented 3 years ago

Yes, we have a simlar case. Our system doesn't care about external IDs (and does not store them), but every system that wants to communicate with our systems needs to keep track of the changes and, hence, has to know the correlation of their ids with our ids. Otherwise one would have to rely on object equality based on fields (e.g. created on etc.), but at least I wouldn't want to touch that topic in the ICAR API, not even with a veeeeery long pole :)

Having said that, I'd still say we should design the interface as if both systems should be able to know the other systems' ids. Then, each system can decide whether it wants to track the other systems IDs or not.

ahokkonen commented 3 years ago

@ahokkonen In your 3rd code example, I'd prefer to have the client generated ID in the "id" field, because for the receiving system this is actually the ID in the sending system, isn't it? Example 4 is in general ok for me, but I wouldn'T reuse the meta object as it contains some more info (unless the info like "created", "validTo" are also relevant for the sender). Also, do we need the status field? I would expect it to be only "success" or "failure", so we could either leave it out completely or replace it with a boolean like "success": "true"/"false". When leaving this out, having errors != null would imply that the call was not successful. If we need additional warnings (which I currently don't see) we could add them as well later on.

I agree that icarMetaType should not be utilized in full specs in response. About "id" field - if you use client generated id as event identifier on POST what do you expect then on GET? I mean, when fetching resource from data service "id" value for events should then be populated with service IDs, right? Even if you do not care about them :)

ahokkonen commented 3 years ago

@thomasd-gea @cookeac @alamers

we did some additional research on that case and decided to go with following solution to cover our POST/PUT requirements. For POSTing batches client side will use "sourceId" field for defining identifier generated on it side and keep "id" field unassigned:

[
    {
                // empty for new resource
        "id": null,
        "eventDateTime": "2020-09-17",
        "animal": {
            "id": "12543488552",
            "scheme": "scheme-ххх"
        },
        "meta": {
                        // original id generated by client
            "sourceId": "4431696f-8c7c-458b-97d3-b9fa426ce56e",
            "source": "client identifier"
        }
    }
]

Response for such request should be:

[
    {
                // server generated resource id or (IF possible) same value as "sourceId" if UUID support on both sides
        "id": "12345",
        "status": "OK",
                // meta should be always returned, even for failed as used for response tracking
        "meta": {
                        // original id generated by client, used as a correlation id for response tracking
            "sourceId": "4431696f-8c7c-458b-97d3-b9fa426ce56e",
            "source": "client identifier"
                        // time stamp for created resource in data service, to get exact time
            "modified": "2020-09-17",
        },
        "errors": null
    }
]

For PUTing all is the same, only difference is that in request "id" field should be assigned with server identifier value.

Is there any big weaknesses is that solution or any critical conflicts against ADE design principles?

p.s. is there any way for putting comments inside JSON without those ugly red lines? :)

thomasd-gea commented 3 years ago

Well, I'd still prefer to have the id of the source system in the "id" field, as then I just have to pass on that entity to another system without changing it. That would also allow for the receiving system to reuse the id if UUID is enabled on both sides (so it covers @ahokkonen case in the second example and is even better for that use case as the entity is stored "as is"). Whether the source id is then returned in the meta section of the created entity is then in the hands of the managing system.

And for the response, I'd still vote for not including the "status" field (i.e. "errors": null would mean "OK"). Only if that would carry more information than "OK" and "NOT OK" I see the necessity to add it, otherwise this boolean value can just be derived. And I also still think having the complete meta object in the response is not useful as most fields will be useless. You know who the creator is, you know when it was created, etc. In an error case most of the fields will be invalid. Hence, re-using that structure here is not the best idea, IMHO.

And no, JSON does not allow for comments. You have to use explicit fields for that. But I miss this feature also :D

DavidJeppesen-Seges commented 3 years ago

@ahokkonen - I think that is a good suggestion. :)

I have just one question though: Is "source" and "sourceId" the best naming? The dataprovider side would consider their datasource as the source, but clients would probably consider their system the source when events are posted from it.

What other naming could be better? While we already talk about "client" and "dataprovider", I would suggest naming them like this:

source => client or clientId sourceId => clientEventId

DavidJeppesen-Seges commented 3 years ago

@thomasd-gea As I see it, you are suggesting that the "id" field on one single event can contain two different ID's depending on the use case - one from the client and one from the dataprovider. Wouldn't that add unnecessary complexity to the implementation on both sides? I agree that the status field can be derived from the errors array. What would you suggest instead of using the meta object for the response?

ahokkonen commented 3 years ago

Well, I'd still prefer to have the id of the source system in the "id" field, as then I just have to pass on that entity to another system without changing it. That would also allow for the receiving system to reuse the id if UUID is enabled on both sides (so it covers @ahokkonen case in the second example and is even better for that use case as the entity is stored "as is"). Whether the source id is then returned in the meta section of the created entity is then in the hands of the managing system.

And for the response, I'd still vote for not including the "status" field (i.e. "errors": null would mean "OK"). Only if that would carry more information than "OK" and "NOT OK" I see the necessity to add it, otherwise this boolean value can just be derived. And I also still think having the complete meta object in the response is not useful as most fields will be useless. You know who the creator is, you know when it was created, etc. In an error case most of the fields will be invalid. Hence, re-using that structure here is not the best idea, IMHO.

And no, JSON does not allow for comments. You have to use explicit fields for that. But I miss this feature also :D

For "Status" I agree - (errors == null) is good enough indication of success request. With "id" field I still struggling a bit as by default recommendations "id" field should not be defined for new resources... :)

Meta is also something that I would like to have even not in a full content.

thomasd-gea commented 3 years ago

For "Status" I agree - (errors == null) is good enough indication of success request. With "id" field I still struggling a bit as by default recommendations "id" field should not be defined for new resources... :)

Meta is also something that I would like to have even not in a full content.

The "charme" of the ID field for me is that you don't have to change the entity when it originates from another system (which will be the case in 100% if you send it). There, it will have an ID (even if it is only an enumeration on the current batch). In the rare case, where we have globally unique IDs on both sides, the receiving system could even decide to store the given entity exactly as it was given (e.g. if it is a "backup storage" system that only mirrors what another system contains). Hence, the given ID on a POST is just an indicator of what was the ID in the originating system. The receving system can decide to assign a new one, if required, and can also return the original ID in the sourceId field (if that does not already contain a value that the client set AND the system stores that information at all) For a PUT, the id should of course reflect the id of the entity you want to change, so no issues there.

As a result I would see something like we discussed in our call on Tuesday. I don't have it exactly as we discssued, but it looked similar to the following example. Name changes are of course possible:

[{
  // success case
  "requestedId": "abc-xyz",
  "resultId": "12345",
  "errors": null
},{
  // error case
  "requestedId": "abc-de",
  "resultId": null,
  "errors": [...]
}]

requestedId is the id of the object the client POSTed or PUT. resultId is the ID the receiving system uses to reference the stored resource. In a PUT case, both fields would contain the same value (as the ID will not be changed by the receiving system). The client can relate POSTed resources with the result IDs by looking at the requestedId and resultId fields, so it knows under which identifier (possibly even the same) the receiving system stored the entities (or where an error occurred and the entity was not stored).

For completeness, examples for batch POST and PUT (similar to what we discussed): POST (ids reflect the id in the source system and will be returned in the requestedId field):

[
  {"id": "abc-xyz", "eventDateTime": ...},
  {"id": "abc-de", "eventDateTime": ...}
]

PUT (exactly as POST, but IDs have to exist):

[
  {"id": "12345", "eventDateTime": ...},
  {"id": "54321", "eventDateTime": ...}
]

And maybe an interesting side-effect (that I'm happy to have): we could make the id field mandatory for all resources in all cases (maybe except for a single item PATCH, but this is something special anyway). It will always be filled by using the approach sketched above.

Cases where we want an "[store|change]-all-or-nothing" behavior can also be covered (however, that is the case for all examples we had, I think).

PS: for a DELETE batch, the resultId field could be null if the deletion worked, the requestedId would be the id that the client wanted to delete.

thomasd-gea commented 3 years ago

As I see it, you are suggesting that the "id" field on one single event can contain two different ID's depending on the use case - one from the client and one from the dataprovider. Wouldn't that add unnecessary complexity to the implementation on both sides? I agree that the status field can be derived from the errors array. What would you suggest instead of using the meta object for the response?

In addition to my proposal above (which should answer the part of your question on how the response could look like): Well, using POST, PUT, GET and DELETE (and maybe more) are different use cases that you have to handly anyway. If you PUT, DELETE or GET something, you have to be sure that the entity already exists or otherwise you already know that the call will fail. For POST your expectation is just that the item is stored (if it is a shape that is accepted by the receiving system, e.g. all fields required by that system are filled).

In general I think that in most of the cases the ID field of an entity will not be null. At least for all the systems that exchange data that is stored in their system, that would be true (if they provide an interface for data exchange; could be ICAR). So, taking that entity and just posting it to another interface (in a batch or otherwise) would be very easy as the sending system wouldn't have to change anything.

Of course, if using PUT or GET you would have to know the entity ID, but after you post something these should be known (based on the response I sketeched above).

Otherwise, you would have to change EVERY entity that you want to POST: remove the ID and set the ID somewhere else. Not a real biggy, but still additional work (that would be superfluous if the systems share a common ID system). At least for PUT you would have to overwrite the ID anyway if the ID systems differ. But you would have to do the mapping only here. That means: a little less logic on a POST, the same for the other cases.

ahokkonen commented 3 years ago

For us @thomasd-gea proposed solution sounds good. If all agree that this is the way how we should work with POST batches - we'll use it.

tpekeler commented 3 years ago

Hello,

I like the concept of sourceID mentioned in: https://github.com/adewg/ICAR/wiki/Design-Principles#dealing-with-ids This makes some exchange of data between different (more than 2) partners possible.

Therefore I think the client should fill out the fields in meta and we would recommend the storing of these to the receiving partner, if they plan to later pass this event to a third party.

In additon I see no problem to fill the id in an POST, if it's used as UUID, but this needs to be confirmed by both exchanging parties during development or in their data exchange contract.

Like stated in all of the last discussions: as long as we don't have UUIDs everywhere, we need to map IDs somehow, but this needs to be implemented only once.

cookeac commented 3 years ago

So to summarise our discussion at the meeting today:

When POSTing resources (single or batch):

the client MUST fill in the meta.source and meta.sourceId with the client and client identifier (so it can track responses)
the server MUST retain the meta.source and meta.sourceId and always return this in the response
the client SHOULD ONLY fill in the Id field with its client identifier if this is a UUID
the server MAY issue and return its own identifier in the Id field (OR it may choose to use the supplied UUID if there is one)

the server MUST return the object(s) posted with:

meta.source and meta.sourceId populated
any essential fields (Id and meta.modified )
any fields changed by the server the server MAY return all the fields of the object(s) if preferred.

erwinspeybroeck commented 3 years ago

Extract from the minutes from the meeting from 3/6/2021:

Anton states they need partial processing capability for the batch POST. This is not available in the PR from Andrew (single POST, transactional batch). Next to that there are doubts if also Collections (paging) is needed in a batch POST. The proposed response is not really future proof (if we also want partial batch POST). An option is to define all fields in the response as optional, so i is not needed to return the complete resource. Errors have to be returned in some way.

Conclusion : Anton and Thomas will prepare a proposal with some examples for next meeting. So we can better understand en view if we can find some common standard for partial and trasactional batch POST.

cookeac commented 3 years ago

I spent a lot of time looking at how other schemes handled "partial acceptance". The general feedback seemed to be "this is not RESTful, don't do it that way" (ie, it doesn't perfectly fit the "resource" model and becomes more like a remote procedure call).

However, if we wanted to handle this, then instead we should:

Extend the RFC7807 "Problem Details" object to be a multipart response that deals with success (returning the issued ID), warnings, and errors (as per Anton's earlier suggestion, but perhaps still extending it from a standard response rather than a complete reinvention.
Always return the extended problem details object, even if successful. This doesn't line up with the normal semantics of POST 200/201 results, but would be a way to handle this.

For ourpart, we would be likely to move to using HTTP2 and single object POST methods to retain compliance with REST and HTTP (while still achieving performance) rather than use this approach, but it is definitely feasible.

erwinspeybroeck commented 3 years ago

adewg / ICAR