Good start. One quick comment: `extensions/api/data-management/asset` in my mind doesn't model an asset but an "asset/data address pair". We should come up with a different name for it. You could probably make an argument that "Asset" without the data address is one representation of this "pair", but I think it is a separate thing. For example, the data address of an asset can change without that semantically resulting in a new asset.
We've called that an `AssetEntry` in previous implementations. That's also what the `DataLoader` utilizes at the moment. WDYT?
> We've called that an `AssetEntry` in previous implementations. That's also what the `DataLoader` utilizes at the moment. WDYT?
Would that mean we have two REST resources?
- `AssetEntry` (asset + data address pair which is modifiable)
- `Asset` (asset object which could only be cloned)
I would even say there are three different resources:
- `AssetEntry` (or whatever we wanna call it) - can only be created
- `Asset` - can be read, updated (cloned) and decommissioned. The latter will delete the associated `DataAddress`
- `DataAddress` - can be read and modified to your heart's content as it does not impact the `Asset` directly.

To 1: I'm fully with you!
To 2: What would the copy/clone endpoint look like? Would it be like a "normal REST endpoint" ([PUT/PATCH] `/asset`) which creates a new object?

To 3:

To 4: Two things: the `data-management` lib/module. My first thought was that "BOM" would mean "bill of material", which confused me a little bit.

Let's not get ahead of ourselves here. This is merely the issue to work out the API spec. Of course that will result in multiple PRs. All I want to achieve here is to have consent about the requirements, limitations and scope of the data mgmt API.
- copy/clone: don't know yet, we should solve this as generically as possible. `PUT` and `PATCH` both seem appropriate verbs.
- HATEOAS: in my personal view, HATEOAS won't solve any problems. It only makes things more complicated and convoluted.
- "BOM": you're right. It refers to the fact that there should be a single parent build file that "gives you" all the API submodules, and exposes them. So what you said is correct :) Take a look at `extensions/catalog` for an example.
From a workflow perspective it is much easier to create actual Java code and generate the YAML spec out of it, than the other way round.
Dummy endpoints are fine, as the entire EDC has not yet reached a stable state and it will give people a chance to implement/generate their client stubs.
Thanks for your suggestion, although I would have appreciated it if you could have added it to the original Data Management API issue, to not have two duplicated issues about the same topic. Original Issue: https://github.com/eclipse-dataspaceconnector/DataSpaceConnector/issues/476
You mention concepts like version counters, back pointers to ancestors, lineage of domain objects, an AssetSchema, read only properties, a new Policy Template, a new Policy Store and so on. I see here a lot of added complexity in your proposal and don't understand why all this is necessary.
I can see clearly where you are coming from. Manipulating some of these objects might have legal consequences. An important issue to address.
First, I assume there is a bit of a misunderstanding in this proposal regarding the ContractAgreement. At the time of writing, the ContractAgreement does contain the Asset object itself (to ensure immutability). This protects the ContractAgreement from unintended side effects, for example when an Asset, ContractDefinition or Policy is updated. I like this solution because it's simple and easily enforceable by preventing updates on the ContractAgreement. What are the benefits of your proposed solution (in comparison to the existing solution)?
Second, I would like to address a new level of complexity that is introduced here. I assume now you suggest replacing the assets of the contract agreement with a pointer, pointing to that asset? And then you would do the same for all the other objects that contain the Assets, too. For example the ContractOffer or the TransferProcess. I assume for the objects of the new PolicyStore you would probably do something similar, so that the ContractOffer and -Agreement only reference a policy id.
In conclusion this could mean that the storage that contains the Assets might be a different storage than the one that contains the assets for the ContractAgreement. And there could exist a third storage for the TransferProcess (and so on). This might make correctly executed XA-Transactions mandatory for some operations. A feature that is probably not supported by all storage systems (e.g. I could not find something about XA-Transactions for Azure CosmosDb).
As you can see, I see a lot of risks and complexity here. Maybe you can elaborate why or where such complexity might be necessary in the future. From my point of view it looks like the most important object we need to be extra careful about is the ContractAgreement. But at the same time, this is also the only object the user should never update (except maybe the DataAddress). What did I miss?
Additionally, some objects might be updated pretty regularly, like the TransferProcess or the ContractNegotiation. States could move back and forth here. Where is the risk here in comparison to "just update the object"?
Thanks in advance.
Dominik Pinsel dominik.pinsel@daimler.com, Daimler TSS GmbH, legal info/Impressum
> You mention concepts like version counters, back pointers to ancestors, lineage of domain objects, an AssetSchema, read only properties, a new Policy Template, a new Policy Store and so on. I see here a lot of added complexity in your proposal and don't understand why all this is necessary.
There are no risks, it is necessary because there are requirements for it. Whether or not it's complex is a moot point.
> First, I assume there is a bit of a misunderstanding in this proposal regarding the ContractAgreement. At the time of writing, the ContractAgreement does contain the Asset object itself (to ensure immutability).
No misunderstanding. Immutability can be ensured through other avenues than embedding it. From a data modelling perspective it makes no sense to embed (and thus duplicate) the Asset in the ContractAgreement. We'd simply run into consistency problems (modifying the original asset, backreferencing from ContractAgreement to Asset, etc.).
> This protects the ContractAgreement from unintended side effects. For example when an Asset, ContractDefinition or Policy is updated. I like this solution because it's simple and easily enforceable by preventing updates on the ContractAgreement. What are the benefits of your proposed solution (in comparison to the existing solution)?
First of all, there are no updates on ContractAgreements. FWIW, there are no updates (in the strict sense) at all. As stated before, every "update" is actually a clone, so the original Asset that belongs to a ContractAgreement remains immutable. The benefits of this are consistent programming models, clear and traceable object relations and lineage, and a clean data model.
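To make the clone semantics concrete, here is a rough sketch; the type and method names are made up for illustration and are not the actual codebase:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Illustrative only: an "update" produces a clone with a new identity, while any
// ContractAgreement keeps referencing the original asset id, which stays untouched.
public final class AssetCloning {

    public record Asset(String id, Map<String, Object> properties) {}

    public static Asset cloneWithChanges(Asset original, Map<String, Object> changedProperties) {
        var merged = new HashMap<>(original.properties());
        merged.putAll(changedProperties);
        return new Asset(UUID.randomUUID().toString(), Map.copyOf(merged));
    }

    private AssetCloning() {}
}
```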
> Second, I would like to address a new level of complexity that is introduced here. I assume now you suggest replacing the assets of the contract agreement with a pointer, pointing to that asset? And then you would do the same for all the other objects that contain the Assets, too. For example the ContractOffer or the TransferProcess.
I'm not talking about pointers. There is a difference between domain objects having references and database entities having foreign keys. No, I would not do the same. ContractOffers are protocol messages rather than domain objects, and TransferProcesses don't contain assets.
> I assume for the objects of the new PolicyStore you would probably do something similar, so that the ContractOffer and -Agreement only reference a policy id.
From a client perspective we need to get a list of available policies (or rather: policy definitions) and use them in e.g. a create-ContractDefinition operation. As of now we have no way of reusing policy definitions. This is a problem and must be addressed with something akin to a PolicyDefinitionStore. There is the notion of "instantiating" a policy, as opposed to "referencing" it.
> In conclusion this could mean that the storage that contains the Assets might be a different storage than the one that contains the assets for the ContractAgreement. And there could exist a third storage for the TransferProcess (and so on). This might make correctly executed XA-Transactions mandatory for some operations. A feature that is probably not supported by all storage systems (e.g. I could not find something about XA-Transactions for Azure CosmosDb).
No. All Assets are in one place. That's the beauty of it. The ContractAgreement in one store would simply reference the Asset in the AssetIndex. Every domain object potentially has its own storage, this design decision has been made long ago. As I have explained before, there really is no way around distributed transactions, even if some systems don't support them fully.
> As you can see, I see a lot of risks and complexity here. Maybe you can elaborate why or where such complexity might be necessary in the future. From my point of view it looks like the most important object we need to be extra careful about is the ContractAgreement. But at the same time, this is also the only object the user should never update (except maybe the DataAddress). What did I miss?
I see no risks, just complexity. That's due to the fact that we're dealing with a highly distributed and extensible system. Even if we followed your original proposal, we'd end up with the same problems, just at a different location, which - if not addressed - will eventually produce a non-functional system.
> Additionally, some objects might be updated pretty regularly, like the TransferProcess or the ContractNegotiation. States could move back and forth here. Where is the risk here in comparison to "just update the object"?
There is a huge difference between objects being modified through the API and modified internally, such as TPs or CNs. In the latter case copy/clone is not necessary.
All those points you raised have been discussed; it'll all turn out plausible once my proposal for an API is finished.
It would be great to have the whole spec from the attached md included in the first comment and keep it updated based on the comments - similar to how Jim is doing it in #463. We can move it to appropriate places in the repo (docs, javadoc, glossary, ...) as soon as we start to work on it.

Moritz Keppler moritz.keppler@daimler.com, Daimler TSS GmbH, legal info/Impressum
that'll be a wall of text incoming :) i'll update the original issue descr as there are already comments.
> every modification triggers the creation of a new domain object.

hm, so in case a DataAddress changes a client needs to call
- update DataAddress to create a clone with the updated information
- update Asset to create a clone of the Asset pointing to the new Address
- update ContractDefinition to create a clone of all ContractDefinitions referencing the old copy of the Asset
- delete ContractDefinition to remove all now deprecated ContractDefinitions?
guess I'm just confused, thx for some enlightenment!
> Deployment Scenario 2: As a standalone runtime ("API server")
just a standard api gateway placed in another security zone? Or would it include more logic?
> In some situations there may be requirements to have mutable/immutable
wouldn't it be sufficient to exclude immutable fields from the DTOs?
Hey @MoritzKeppler, I'll try and answer them all inline.
> hm, so in case a DataAddress changes a client needs to call update DataAddress to create a clone with the updated information update Asset to create a clone ...
The DataAddress can be modified directly without copy/cloning it, as it is an internal concept and existing contracts won't be affected. I should have explicitly stated that, will update accordingly.
> just a standard api gateway placed in another security zone? Or would it include more logic?
Exactly. There would be a separate runtime, potentially running in another process and/or on a different server. This would require additional logic insofar as that runtime would need a communication channel with the EDC (e.g. REST, websockets). That "API runtime" should not directly modify persisted objects, to avoid consistency problems.
> wouldn't it be sufficient to exclude immutable fields from the DTOs?
Yes, for the API that would be sufficient. Having that feature directly "on the Asset", however, would enable this on a wider scope. I'm fine with starting with the DTO approach and putting the rest off for later.
thx! So I am in favor of ignoring the API gw / security zone topic for now and solely concentrating on the API definition. Also fine for me to start with the "DTO approach" to protect the immutable fields.
@MoritzKeppler updated the description.
closed as all subtasks are complete
This Issue deals with creating a specification of the Data Management API based on the design considerations and recommendations stated in the text listed below. I didn't want to paste it directly here, to avoid walls of text.

Core aspects
1. Endpoints vs Use cases

Following the REST principles, every endpoint should modify the state of an object rather than trigger behaviour. For example, to initiate a data transfer the endpoint should be `POST http://your-connector.com/api/v1/transfer/transfer-request` rather than `POST http://your-connector.com/api/v1/transfer/start-transfer`. Consequently, use cases (as outlined below) might not have a 1-to-1 translation into API endpoints; they might even involve several consecutive API calls. A controller sketch for such a state-oriented endpoint is shown below.
2. `delete` is only possible sometimes

Objects may only be deleted if they are not involved in ongoing contracts, etc. Updating is (initially) done through "delete-and-recreate".
3. Support pagination

Every `GET` endpoint should support pagination, sorting and filtering.

4. Desired outcome
We should have one (or several) YAML files containing the specification of each of the endpoints. In addition, we should have `Controller` classes that implement that spec, even if most of them are just stubs returning dummy data for the immediate future. So, by referencing the `data-management`-BOM, a controller implicitly exposes the entire API. Note: we will NOT consider the standalone deployment scenario, especially the communication channel, in this issue.

5. Module structure
- every domain object should be reflected in a module
- a BOM should exist for the entire API
Full proposal
Cross-cutting aspects
Terminology

- Domain objects: `Asset`, `DataAddress`, `ContractDefinition` etc. May contain potentially internal information.
- Entities: a `ContractAgreementEntity` would not contain the entire `Asset`, but merely maintain a foreign key to an `AssetEntity` (a minimal sketch follows below). Entities are never to be passed outside the persistence layer.
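A minimal sketch of that split, with field and class names chosen only for illustration:

```java
import java.util.Map;

// Domain objects as used above the persistence layer (simplified).
record Asset(String id, Map<String, Object> properties) {}

record ContractAgreement(String id, Asset asset) {}

// Persistence-layer entity: holds only a foreign key to the asset
// and is never handed out beyond the persistence layer.
class ContractAgreementEntity {
    String id;
    String assetId; // foreign key to the corresponding AssetEntity
}
```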
Object Lifecycle

Most domain objects go through a lifecycle where transitions might or might not be possible at a given time. Or, certain transitions may trigger subsequent actions, which could potentially be long-running or even involve human interaction. For example, decommissioning an `Asset` that is involved in a `ContractAgreement` might cause the contract to expire, or not be possible at all. A similar situation exists for `ContractDefinition`s.

We'll assume that domain objects have some sort of state associated with them, regardless of whether that is implemented using an actual `state` field, or a computed state, or anything else. We propose the following definition of states domain objects can make use of (a minimal enum sketch follows below):

- `INITIAL`: object is not included in any communication with other dataspace participants. This state is optional; API clients may choose to "publish immediately".
- `PUBLISHED`: object is (potentially) visible to other participants.
- `DEPRECATED`: object is only available to existing relations, e.g. an Asset is not included in new contract offers anymore.
- `DECOMMISSIONED`: object is not in use anymore (i.e. not included in any active contract agreement or offer).
- `DELETED`: object is not visible to clients anymore, just exists for traceability reasons.

Decommissioning: as a starting point let's go with option a) as it is simpler to implement. If necessary, we can change this later.
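A minimal sketch of these states as a plain enum; the enum name and the forward-only transition rule are assumptions, not a fixed design:

```java
public enum LifecycleState {
    INITIAL, PUBLISHED, DEPRECATED, DECOMMISSIONED, DELETED;

    // Illustrative rule: states only move "forward" through the lifecycle,
    // e.g. PUBLISHED -> DEPRECATED is allowed, DELETED -> PUBLISHED is not.
    public boolean canTransitionTo(LifecycleState target) {
        return target.ordinal() > ordinal();
    }
}
```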
Authentication

This solely deals with access to the API and is completely separate from the `IdentityService` and connector-to-connector authentication. There should be a thin `authentication-spi` extension that contains an `AuthenticationService`, which could then be backed by an implementation extension, e.g. LDAP, AAD, Apache Shiro, Keycloak, OAuth2/OIDC, etc. In a first PoC it could be sufficient to have an API-key-based authentication implementation, roughly as sketched below.
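A sketch of what the thin SPI plus an API-key PoC could look like; the interface shape and the header name are assumptions:

```java
import java.util.List;
import java.util.Map;

// Hypothetical SPI: a backing extension (LDAP, OAuth2/OIDC, Apache Shiro, ...) implements this.
interface AuthenticationService {
    boolean isAuthenticated(Map<String, List<String>> headers);
}

// Simple API-key based PoC implementation.
class ApiKeyAuthenticationService implements AuthenticationService {
    private final String expectedKey;

    ApiKeyAuthenticationService(String expectedKey) {
        this.expectedKey = expectedKey;
    }

    @Override
    public boolean isAuthenticated(Map<String, List<String>> headers) {
        // expects the key in an "x-api-key" request header
        return headers.getOrDefault("x-api-key", List.of()).contains(expectedKey);
    }
}
```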
Updating and deleting items

Modifying certain domain objects could potentially have complicated and legally relevant consequences, therefore there is no `UPDATE` API for those domain objects. There will be `DELETE` endpoints, which return an error when the item in question cannot be deleted, e.g. when an `Asset` is already involved in a `ContractAgreement` (see the sketch below).

Consequently, whether or not an item can be "updated" (i.e. deleted and recreated) depends on its state, which could be an explicit field on the domain object, or it could be a computed property, e.g. by checking whether an `Asset` is involved in a `ContractAgreement` or `ContractNegotiation`.

That said, creating new items is always possible, but they will have a different identity.
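To make the delete semantics concrete, a sketch of a "delete only when not in use" check; the collaborator interfaces are placeholders standing in for the real stores:

```java
// Sketch: deletion is refused (rather than the object being updated) while the asset is in use.
public class AssetDeletion {

    // Placeholder for lookups against ContractAgreements/ContractNegotiations.
    public interface ContractLookup {
        boolean referencesAsset(String assetId);
    }

    // Placeholder for the asset storage.
    public interface AssetStore {
        void deleteById(String assetId);
    }

    private final ContractLookup contracts;
    private final AssetStore assets;

    public AssetDeletion(ContractLookup contracts, AssetStore assets) {
        this.contracts = contracts;
        this.assets = assets;
    }

    /** Returns false when the asset is still in use; the API layer maps that to an error (e.g. HTTP 409). */
    public boolean deleteIfUnused(String assetId) {
        if (contracts.referencesAsset(assetId)) {
            return false;
        }
        assets.deleteById(assetId);
        return true;
    }
}
```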
API layer

In order to decouple the API from the internal domain objects and entities there should be a dedicated API layer that contains:

- DTOs for `Asset`, `ContractAgreement`, etc. They are not database entities!
- transformers that convert those DTOs into domain objects (`Asset`, `DataAddress`, `ContractDefinition`, etc.) and back.
- aggregate services, e.g. an `AssetService`, that has methods `uploadAsset()`, `publishAsset()`, and `rescindAsset()`. Such a service would then maintain references to the `AssetIndex`, the `ContractDefinitionStore`, etc. and be responsible for transactional consistency (see the sketch below).
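A sketch of such a service interface, using the method names from the list above; the parameter types are placeholders standing in for the existing domain objects:

```java
import java.util.Map;

// Placeholder types, declared here only to keep the sketch self-contained; in the real
// code these would be the existing Asset/DataAddress domain objects.
record Asset(String id, Map<String, Object> properties) {}
record DataAddress(Map<String, String> properties) {}

// Hypothetical aggregate service behind the API layer. An implementation would hold
// references to the AssetIndex, the ContractDefinitionStore, etc. and guard transactional consistency.
interface AssetService {

    // stores the Asset together with its DataAddress in one transaction
    void uploadAsset(Asset asset, DataAddress dataAddress);

    // moves the Asset into the PUBLISHED lifecycle state
    void publishAsset(String assetId);

    // deprecates/decommissions the Asset after checking ContractDefinitions and agreements
    void rescindAsset(String assetId);
}
```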
`Read` endpoints must support filtering, sorting and pagination. Even though cursor-based pagination is widely used nowadays, it is mostly used for real-time applications (things like Facebook) where data is expected to change constantly. Here, we'll deal with mostly static data, so we can reap the benefits of offset pagination as it is easier to implement. This will greatly improve usability and performance in UI views such as tables and lists, and will reduce stress on the involved systems. A `Pagination` object could look like the sketch below.
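A minimal sketch, assuming offset-based paging plus optional sorting and filtering; the field names are assumptions:

```java
// Offset-based paging parameters, passed from GET endpoints down to the store queries.
public record Pagination(
        int offset,            // index of the first item to return
        int limit,             // maximum number of items to return
        String sortField,      // optional property to sort by
        boolean sortAscending, // sort direction
        String filter          // optional filter expression
) {
    public static Pagination firstPage(int limit) {
        return new Pagination(0, limit, null, true, null);
    }
}
```

Store interfaces that back the `Read` endpoints (e.g. the `ContractDefinitionStore` mentioned further below) would then be amended with query methods accepting such an object.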
On a related note, the entities that we have now, e.g. `Asset`, `ContractDefinition`, will most certainly differ from the actual database entities. For example, a `ContractDefinition` would not contain the `Asset` as a nested object, but would maintain a foreign key to it. Thus, the way domain objects are modeled may/will differ from how database entities are modeled.

Assets + DataAddress
Create

- The `Asset` and the `DataAddress` are created together (see the payload sketch below), both stored transactionally. The `DataAddress` may initially be an empty skeleton.
- The `DataLoader` can be used to do that.
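A sketch of the create payload, reusing the `AssetEntry` idea from the discussion above (asset plus data address submitted and persisted together); the names and fields are illustrative:

```java
import java.util.Map;

// Create payload: the Asset and its DataAddress travel together and are stored in one
// transaction. The dataAddress part may be an empty skeleton at first.
public record AssetEntryDto(AssetDto asset, DataAddressDto dataAddress) {

    public record AssetDto(String id, Map<String, Object> properties) {}

    public record DataAddressDto(Map<String, String> properties) {}
}
```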
Read

- `GET` endpoints for `Asset` and `DataAddress` should exist.
- Query the `AssetIndex` with a `SELECT_ALL` query `Criterion` and add pagination.

Update
- Updating the `DataAddress` is done through a separate endpoint. Copy/clone is not needed; `DataAddress`es can just be modified, as they are an internal concept and do not modify the characteristics of an `Asset`.
- `Asset`s cannot be updated.

Delete
- Deleting is only possible if the `Asset` is not referenced by a `ContractNegotiation` or `ContractAgreement`. The request will fail otherwise.
- If an `Asset` is still "unpublished" in this manner, deleting it is straightforward.
- In the "Danger Zone" (see below), the `Asset` is deleted without looking left or right.

Danger Zone

As a convenience feature we could implement a danger zone, where deleting assets is possible but no validation is performed, which could put the application's data backend in an invalid or inconsistent state.
Policies + PolicyTemplates

Policies are pre-loaded as so-called `PolicyTemplates` and are stored in a `PolicyTemplateStore`. Upon instantiation of a policy, e.g. in the `ContractDefinition`, the template is copied into the `ContractDefinition` with a new policy ID.

The reason we propose to actually copy the policy, as opposed to referencing it, is that referencing would require storing every instantiated policy, which would clutter the `Read` endpoint with a potentially high number of almost identical policies. Assuming that this endpoint will be used to select policies for the creation of `ContractDefinition`s, it is simply a matter of usability. Furthermore, a `ContractDefinition` is legally binding and should remain in place, even if the backing policy changes at some point.

In order to bring this feature back in line with other domain endpoints, there could be an additional `PolicyStore`, where all instantiated policies are stored, so as not to persist them embedded in the `ContractDefinition`s.

Create
No special considerations. Copies a `PolicyTemplate` into the `PolicyStore`, effectively "instantiating" it (see the sketch below).
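A sketch of the copy-on-instantiation step, assuming the template simply carries the policy content and every instantiated copy gets a fresh ID:

```java
import java.util.UUID;

// Copying a PolicyTemplate into a standalone Policy with its own ID, so that later changes
// to the template never affect ContractDefinitions that were created from it.
public final class PolicyInstantiation {

    public record PolicyTemplate(String id, String content) {}

    public record Policy(String id, String content) {}

    public static Policy instantiate(PolicyTemplate template) {
        return new Policy(UUID.randomUUID().toString(), template.content());
    }

    private PolicyInstantiation() {}
}
```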
Update

`PolicyTemplate`s can simply be updated using copy/clone semantics, without special considerations. Instantiated `Policy` objects are immutable.

Delete

`PolicyTemplate`s can simply be decommissioned (using the lifecycle process), without special considerations.

ContractDefinitions
Create

- A `ContractDefinition` references a `contractPolicy` and an `accessPolicy` (cf. previous section).
- Policies are selected from the `PolicyTemplateStore`, and upon creating the `ContractDefinition` the policy is simply copied.
- The copies receive new `PolicyId` values, so they are technically different policies; there is no way to backreference the `PolicyTemplate`.

Read
- The `GET` endpoint should accept an optional paging object (see the `Pagination` sketch in the "API layer" section).
- Since this is not yet supported by the `ContractDefinitionStore`, the interface needs to be amended to accept those parameters.

Update
no update, just "create-new"
Delete
Deleting a `ContractDefinition` is only possible as long as there is no `ContractAgreement` based off of it. There is no way of knowing whether a `ContractDefinition` is involved in an ongoing `ContractNegotiation`, so deleting it will cause the CN to fail.

Lifecycle aspects

Decommissioning: the `ContractDefinitionStore` cannot return decommissioned or deprecated definitions.

ContractNegotiations
Read
no special considerations, filtering and pagination must be implemented.
Update
not possible through the API.
Delete
not possible through the API.
Cancel
There could be a `CANCEL` endpoint which would effectively move it to a (new) `CANCELLING`/`CANCELLED` state.

Decline
See `Cancel`; this would move it to the `DECLINING`/`DECLINED` state.

ContractOffers
Create / Read / Update / Delete

`ContractOffer`s are not domain objects in the conventional sense; they are generated on the fly for every participant, so we cannot directly "touch" them. They are protocol details rather than domain objects. Rescinding an offer would have to be done by updating the `ContractDefinition`. To browse other participants' offerings, see use case "Browse Catalog".

TransferProcesses
Create / Update / Delete

not possible
Read
no special considerations, filtering and paging must be implemented.
Cancel

Since there is no `delete` operation, the `CANCEL` endpoint would effectively send a `TransferProcess` to a `CANCELLED` state in the next tick-over. The CommandQueue must be used for this.

Note: there will be other APIs pertaining to TransferProcesses where they can be updated. Necessary endpoints would be:
Use Cases
IN_PROGRESS
Linked Issues - Preliminary work:
Linked Issues - endpoints
This Issue replaces/supersedes #476.
[EDIT 1] kicked "copy/clone"
[EDIT 2] corrected some whitespaces
[EDIT 3] updated description and changed "TL;DR" -> "Core aspects", added sub-issues