Closed taojing2002 closed 7 months ago
The proposed REST call (GET /updateDatacite?option=...
) doesn't comply with RESTful design principles:
Instead of including the verb ("update") in the name updateDatacite
, the verb "update" should really be represented by the HTTP method used.
With this in mind, I think GET
is not the correct method. A GET
should never cause any change of state on the object - it should simply be an idempotent read
operation.
It seems like POST
, PUT
or PATCH
might be more appropriate. Here are some descriptions copied from this site:
HTTP POST
-- Used to create a new resource into the collection of resources. POST is neither safe nor idempotent, and invoking two identical POST requests will result in two different resources containing the same information (except resource ids)
HTTP PUT
-- used primarily to update an existing resource (if the resource does not exist, then API may decide to create a new resource or not)
HTTP PATCH
-- used to make a partial update on a resource (i.e. for partially updating an existing resource; whereas PUT is used for replacing a resource in its entirety.)
From these descriptions it sounds like PATCH might fit best, but my understanding is that it's not widely supported by browsers etc. In that case, maybe PUT
?
In summary - I don't think GET is right, but I don't think there's a clear "right answer" from the other options - so am totally open to discussion.
Once that's decided, the REST call might look like this (Assuming we use PUT
- just for an example)
PUT /datacite/batchUpdate?option={all | pids | formatIds} [&pid={pid}] [&formatId={formatId}]
Is there any reason we can't accept a mix of pids and formatIds in the same REST call? Could we allow the caller to mix them, instead of having to define ?option=
?
I agree with much of what Matthew said. Let's discuss this before you implement.
In addition, REST is all about managing resources, rather than modeling remote procedure calls (RPC). So each API should be against a specific resource or collection of resources. Typically that is the objects
resource collection in DataONE, but in this case, the resource of interest is the identifiers collection. In addition, DataCite is not the only identifier minting authority that we might use for registering identifier metadata -- we also use EZID and OSTI, and those may be extended to non-DOI types like ARKs in the future. So, a method that works on an identifiers
collection and does a PUT to a specific identifier, and then based on that identifier knows which service to contact would be best. So, how about something like:
For a single identifier, the traditional REST structure would look like:
PUT /identifiers/{pid}
For the collection-wide operation, it might be more like:
PUT /identifiers?option={all | pids | formatIds} [&pid={pid}] [&formatId={formatId}]
Note there is no method like "batchUpdate" in a REST style API -- instead, we are modifying either a single resource or a whole collection. Let's discuss.
In our today's backend developers' meeting, we revised the rest API:
PUT /identifiers/doi/?[all=true] | [&pid={pid1}] | [&formatId={formatId1}]
//examples
PUT /identifiers/doi/?all=true //update everything
PUT /identifiers/doi/?pid={pid1}&pid={pid2} //update pid1 and pid2
PUT /identifiers/doi/?formatId={formatId1}&formatId={formatId2} //update dois with formatId1 or formatId2
// process all elements in a mixed list:
PUT /identifiers/doi/?formatId={formatId1}&pid={pid2}
// formatIds and pids ignored, since `all=true` updates everything anyway
PUT /identifiers/doi/?all=true&formatId={formatId}&pid={pid2}
PUT /identifiers/doi/?all=false&pid={pid2} //update pid2
Methods: updateIdMetadata(Session session, Enum type, String[] pids, String[] formatIds) Parameters: type a type of the identifier, e.g. DOI pid a repeatable list of identifiers of objects to be updated formatId a repeatable list of identifiers of object format. Any objects of the type with those format ids will be updated. Returns Boolean TRUE if the updating request is scheduled Return type: Types.Boolean Raises:
updateAllIdMetadata(Session session, Enum type) Parameters: type a type of the identifier, e.g. DOI Returns Boolean TRUE if the updating request is scheduled Return type: Types.Boolean Raises:
LGTM!
Overall looks good, but I think you can omit the DOI part of the rest url. The service should be able to tell what kind of identifier it is.
@mbjones @artntek If we don't have DOI part of the rest url, how we interpret those two requests:
PUT /identifiers/?all=true
PUT /identifiers/doi/?formatId=formatId1
Does the first one mean we update metadata for all identifiers of all supported types (e.g. doi, ark)? Does the second one mean we update metadata for all identifiers of all supported types which have the formatId1 format?
Earlier, I proposed:
For a single identifier, the service will update that one identifier's metadata based on its type (e.g., DOIs go one place and arks go another):
PUT /identifiers/{pid}
For the collection-wide operation, we can update multiple at once, and for each PID, the right location to update is determined by the identifier type:
PUT /identifiers?option={all | pids | formatIds} [&pid={pid}] [&formatId={formatId}]
So, under that scenario, your two calls are below in the proposed REST syntax:
PUT /identifiers/?option=all
PUT /identifiers/?option=formatId&formatId=formatId1
Let's discuss on slack or in our meeting and summarize here when we reach a conclusion.
Rest Calls:
PUT /identifiers/{pid1}
PUT /identifiers[/]?[all=true] | [&pid={pid1}] | [&formatId={formatId1}]
PUT /identifiers/{pid1} //update pid1
PUT /identifiers/?all=true //update everything
PUT /identifiers/?pid={pid1}&pid={pid2} //update pid1 and pid2
PUT /identifiers/?formatId={formatId1}&formatId={formatId2} //update dois with formatId1 or formatId2
PUT /identifiers?formatId={formatId1}&pid={pid2} //process all elements in a mixed list
PUT /identifiers?all=true&formatId={formatId}&pid={pid2} //formatIds and pids ignored, since `all=true` updates everything anyway
PUT /identifiers?all=false&pid={pid2} //update pid2
Two Methods: updateIdMetadata(Session session, String[] pids, String[] formatIds) Parameters: pid a repeatable list of identifiers of objects to be updated formatId a repeatable list of identifiers of object format. Any objects of the type with those format ids will be updated. Returns Boolean TRUE if the updating request is scheduled Return type: Types.Boolean Raises:
updateAllIdMetadata(Session session) Parameters: Returns Boolean TRUE if the updating request is scheduled Return type: Types.Boolean Raises:
need to add this to admin guide documentation too, since it's not part of D1 API docs
We are removing the Metacat
replication
servlet, which has the methods to update the Datacite metadata documents for DOIs. So we need a new method in MNAdmin API to allow operator to do the same tasks.Here is the proposed method:
updateDatacite(session, option, pid[], formatId[])
GET /updateDatacite?option={all | pids | formatIds} [&pid={pid}] [&formatId={formatId}]
This method triggers an updating task to ensure the all or partial (specified by either a list of pids or format ids) DOI objects' metadata is up-to-date. If the caller specify option
all
, bothpid
andformatId
parameters will be ignored. If the caller specify optionpids
, the formatId parameter will be ignored. For many implementations, this method would only queue an asynchronous updating task, and not wait for the task to complete.Parameters
all
,pids
andformatIds
all
orformatIds
.all
orpids
.