edgardmarx closed this issue 2 years ago
Currently, there is a mismatch between the proposed IDS information model based on RDF and the implemented one.
The IDS Information Model defines how data assets should be described and how IDS participants (e.g. connector, broker) communicate with each other to share this information in a way both can understand, independent of technology. However, neither the RAM nor the IDS-G declares the use of the RDF ontology mandatory for the technical implementation of all interfaces. Why should they? What the technical realizations of non-IDS interfaces and data models of the components look like is a design decision: technology-dependent and possibly use-case-specific. Right from version 1.0.0, we have communicated transparently in all presentations, documentation, and support requests that the DSC uses a custom data model, which has even evolved over the last months (see here), stored in an SQL database and not in an RDF triple store. We, as the developers, agreed on this because of our observations and experience in deploying and interacting with our IDS connector reference implementation. The IDS Information Model can hardly be understood by developers and people without any IDS or RDF knowledge. This is why we decided to encapsulate the IDS overhead of the specified IDS component communication and provide an easy-to-understand REST API for dedicated systems that do not implement IDS. Nevertheless, we stick to the IDS specifications when communicating within the IDS ecosystem.
Why use RDF? Because it is the standard language for doing so, just as HTML is for writing web pages.
Talking about standards, we are referring to the HTTP/1.1 specification defining what REST APIs should look like. See here. Using HATEOAS, our REST API follows common OpenAPI specifications.
You can devise your own language for writing web pages, but there is no purpose if nobody will use it.
I don't see that people are not using this connector because of the implemented REST API, as it follows common standards. On top of that, JSON is a well-known and widely used format for exchanging information. It is just our abstraction of the IDS RDF ontology, as is, for example, the JSON-LD created by the Infomodel library. The three topics database/data model, non-IDS interfaces, and IDS interfaces should not be mixed up. The IDS Infomodel, as described above, defines the last of these. Everything else is a design decision. So you could of course declare our implementation as "wrong" - but there is no right or wrong.
First and foremost, the IDS Connector in many places uses UUIDs instead of URLs to identify resources.
This is not true. Each REST resource has a self-link that is the unique identifier of an object.
"_links": {
  "self": {
    "href": "https://localhost:8080/api/artifacts/ca502fbc-fbeb-4125-bd65-97536647d623"
  },
When an object is created via the REST API, it is stored in the database. In the process, a UUID is generated that is used only for this object and is at the same time part of the object's ID. Using UUIDs as identifiers in large databases is best practice. https://localhost:8080/api/offers/ca502fbc-fbeb-4125-bd65-97536647d623 is the ID that is also part of the IDS object created when interacting with other IDS components:
"ids:Artifact" : {
  "@id" : "https://localhost:8080/api/artifacts/ca502fbc-fbeb-4125-bd65-97536647d623"
}
Thus, when an object is retrieved via IDS, the extracted @id is the exact link to the object. Performing a GET request on it returns the object, provided you have access to the REST API. We do not map various IDs: the internal one is the external one. And, for an artifact, the ID is not the access URL to the data asset; it never was. See below for more details.
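To make the relation between the self-link and the UUID concrete, here is a minimal sketch in Python. It is not the connector's own code: the helper name split_self_link is hypothetical, and it only assumes what is described above, namely that the last path segment of the link is the UUID.

```python
import uuid
from urllib.parse import urlparse


def split_self_link(href: str) -> tuple[str, uuid.UUID]:
    """Split a self-link into its collection path and trailing UUID.

    Hypothetical helper for illustration only. Raises ValueError if the
    last path segment is not a well-formed UUID.
    """
    path = urlparse(href).path.rstrip("/")
    collection, _, last_segment = path.rpartition("/")
    return collection, uuid.UUID(last_segment)


href = "https://localhost:8080/api/artifacts/ca502fbc-fbeb-4125-bd65-97536647d623"
collection, object_uuid = split_self_link(href)
print(collection)   # /api/artifacts
print(object_uuid)  # ca502fbc-fbeb-4125-bd65-97536647d623
```

A link whose last segment is a readable name such as electricVehicleStatus makes uuid.UUID raise a ValueError in this sketch, which is the kind of ID the DSC does not accept.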
So much for the general points. Now to your issue:
That the bootstrapping feature has its bugs and pitfalls is a separate problem. Since our database does not allow predefining and setting the IDs of objects, the IDS objects are converted into DSC objects, and when they are saved, a separate UUID is created, which in turn becomes part of the unique ID exposed to the outside. A mapping would therefore have to be stored, because the UUID from the bootstrapping file is not the final one. However, it is stored as bootstrappingId, so you could filter for that attribute if needed. On top of that, as the DSC creates IDs containing UUIDs, it does not allow arbitrary IDs without a UUID. That is how our data model and the REST API work.
In addition, the accessUrl of an artifact is defined in another file (as described here). It should not and cannot be set as the @id, as we intentionally hide the original data source (remote data). As you may have noticed in the past, this URL is never part of any IDS object. Again, a design decision.
Thanks for your very detailed answer. Do you think it would help if the bootstrap file referred not to the access URL, which might contain an autogenerated UUID or reflect an inherited system design decision, but to the user's alias (sameAs) URI?
i.e.
artifact.sameAs.https\://ids-dev.corpinter.net/mds/devportal/artifacts/electricVehicleStatus
instead of
artifact.accessUrl.https\://w3id.org/idsa/autogen/artifact/d5b1cd4e-2a5a-47c2-86c5-003c6a11ce69
and likewise for the interface, i.e.
https://ids.mycompany.com/api/artifacts?id=https://ids-dev.corpinter.net/mds/devportal/artifacts/electricVehicleStatus
instead of
https://ids.mycompany.com/api/artifacts/d5b1cd4e-2a5a-47c2-86c5-003c6a11ce69
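With such an interface, the client would pass the full resource URI as a query parameter, which has to be percent-encoded. A minimal sketch in Python, assuming the id parameter proposed here (a suggestion in this thread, not an existing endpoint of the connector) and a hypothetical helper lookup_url:

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Endpoint and alias are taken from the example above; the "id" query
# parameter is the proposal, not an existing connector endpoint.
BASE = "https://ids.mycompany.com/api/artifacts"
ALIAS = "https://ids-dev.corpinter.net/mds/devportal/artifacts/electricVehicleStatus"


def lookup_url(base: str, resource_uri: str) -> str:
    """Build a lookup URL that carries the full resource URI as ?id=..."""
    return f"{base}?{urlencode({'id': resource_uri})}"


url = lookup_url(BASE, ALIAS)
print(url)
# https://ids.mycompany.com/api/artifacts?id=https%3A%2F%2Fids-dev.corpinter.net%2Fmds%2Fdevportal%2Fartifacts%2FelectricVehicleStatus

# The server side can recover the original URI unchanged.
print(parse_qs(urlparse(url).query)["id"][0] == ALIAS)  # True
```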
If you want to provide data from https://ids-dev.corpinter.net/mds/devportal/artifacts/electricVehicleStatus within a resource, or rather an artifact, you have to provide a catalog.jsonld containing the metadata in IDS format (for an example see here). The @id of the artifact can remain an autogenerated one, as it will be replaced by a real self-link as soon as the object has been stored in the database. This is because the DSC does not allow setting IDs, as it could otherwise no longer ensure the uniqueness of an ID.
So e.g. you set:
"@type": "ids:Artifact",
"@id": "https://w3id.org/idsa/autogen/artifact/5c96b6f0-a698-4329-9f15-4913bf4e86f5",
The artifact object will then be available at e.g. https://localhost:8080/api/artifacts/{generated-id} with a GET request and contain an attribute called bootstrapId referring to https://w3id.org/idsa/autogen/artifact/5c96b6f0-a698-4329-9f15-4913bf4e86f5. This way, you do not need to store the mapping somewhere else. Using the relations to other objects, you can navigate your way to the representation, resource, and catalog.
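On the client side, resolving a bootstrap @id to the generated self-link could look roughly like the following Python sketch. The helper find_by_bootstrap_id and the shape of the artifact objects are assumptions for illustration: only the bootstrapId attribute and the _links.self.href field mentioned in this thread are used, and the self-link is a placeholder that reuses the example UUID from earlier.

```python
from typing import Optional


def find_by_bootstrap_id(artifacts: list[dict], bootstrap_id: str) -> Optional[str]:
    """Return the self-link of the artifact created from the given bootstrap @id.

    Hypothetical helper; `artifacts` is assumed to be a list of already
    parsed artifact objects as returned by the REST API.
    """
    for artifact in artifacts:
        if artifact.get("bootstrapId") == bootstrap_id:
            return artifact["_links"]["self"]["href"]
    return None


# Assumed, trimmed-down artifact object; real responses contain more fields.
artifacts = [
    {
        "bootstrapId": "https://w3id.org/idsa/autogen/artifact/5c96b6f0-a698-4329-9f15-4913bf4e86f5",
        "_links": {
            "self": {
                # Placeholder self-link, not the ID the DSC would really generate.
                "href": "https://localhost:8080/api/artifacts/ca502fbc-fbeb-4125-bd65-97536647d623"
            }
        },
    }
]

print(find_by_bootstrap_id(
    artifacts,
    "https://w3id.org/idsa/autogen/artifact/5c96b6f0-a698-4329-9f15-4913bf4e86f5",
))
```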
When creating an artifact via the REST endpoint, you need to set an accessUrl that points to the data source. As this is not part of the Infomodel (as reasoned above), we cannot include this URL in the catalog.jsonld. Instead, you can set it in the bootstrap.properties. E.g.:
artifact.accessUrl.https\://w3id.org/idsa/autogen/artifact/5c96b6f0-a698-4329-9f15-4913bf4e86f5=https://ids-dev.corpinter.net/mds/devportal/artifacts/electricVehicleStatus
In addition: Pay attention to defining the right bootstrapping path in the application.properties.
## Starting path for bootstrapping
bootstrap.path=./src/resources
bootstrap.enabled=false
This was changed in v6.4.0 as part of a fix. Thus, the example bootstrapping files are not loaded when enabling this feature.
I agree, we could use the sameAs attribute and improve the bootstrapping, but it should work as it is right now, if used properly.
Describe the bug
Currently, there is a mismatch between the proposed IDS information model based on RDF and the implemented one. This mismatch generates great confusion as well as numerous bugs and errors. The original IDS information model utilizes the Resource Description Framework to enable IDS participating users to describe their shared and consumed resources.
RDF is based on URIs and, in the case of the Semantic Web, URLs. RDF is a natural standard for the IDS data model, allowing users to describe their shared services and databases so that other users (machines or not) can better consume them. Why use RDF? Because it is the standard language for doing so, just as HTML is for writing web pages. You can devise your own language for writing web pages, but there is no purpose if nobody will use it. That said, there is a mismatch between the RDF principles and the interfaces implemented in the IDS Connector. First and foremost, the IDS Connector in many places uses UUIDs instead of URLs to identify resources. The main problem here is that URLs are the Web way of identifying resources and UUIDs are not part of it. See the mismatch between the two interfaces below, one for registering Artifacts, another for consuming them:
Both interfaces above illustrate the issue with the current data model implemented by the IDS Connector. While the IDS data model uses RDF and URLs to define resources on the one hand, the IDS Connector implementation limits their use by relying on UUIDs on the other. There is a difference between the UUIDs used by the IDS Connector data model to identify resources locally and the URLs used by the IDS data model to identify resources globally. For instance, the Mercedes-Benz connector publishes Electric Vehicle Status data at the following intuitive URL: https://ids-dev.corpinter.net/mds/devportal/offers/electricVehicleStatus using Connector version 6.5.1. However, the latest version forces the user to assign an arbitrary UUID to this URL (see error logs). Another problem is that if the URL happens to contain two UUIDs, the connector can use the wrong one for identification. Last but not least, a further argument for abandoning the use of UUIDs in the APIs is that, unlike URLs, they are not unique (https://towardsdatascience.com/are-uuids-really-unique-57eb80fc2a87).
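The two-UUID ambiguity can be illustrated with a minimal Python sketch. This is not the connector's actual UUIDUtils logic; the helper uuids_in and the example URI are hypothetical and only show that any code extracting "the" UUID from such a URI has to pick one by convention.

```python
import re

# Matches the canonical 8-4-4-4-12 hexadecimal UUID layout.
UUID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def uuids_in(uri: str) -> list[str]:
    """Return every UUID-shaped substring of the URI, in order of appearance."""
    return UUID_PATTERN.findall(uri)


# Hypothetical URI whose path happens to contain two UUIDs.
uri = (
    "https://ids.mycompany.com/api/catalogs/d5b1cd4e-2a5a-47c2-86c5-003c6a11ce69"
    "/artifacts/5c96b6f0-a698-4329-9f15-4913bf4e86f5"
)
candidates = uuids_in(uri)
print(len(candidates))                  # 2
print(candidates[0] == candidates[-1])  # False: first and last match differ
```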
In this reported issue, I strongly suggest using UUIDs only for automatically generated URLs and abandoning them completely for identifying resources, adopting RDF in full and using URLs for identifying and retrieving resources (artifacts, catalogs, etc.). Further, the APIs should receive the full resource identifier URLs/URIs and not a substring generated by an arbitrary method.
To Reproduce
Expected behavior
Use URLs internally as well as in the APIs, always passing the resource URL or URI and not a manually or automatically generated substring.
Screenshots & Logs
Error bootstrapping an Artifact without UUID
Check line 74 in UUIDUtils: at io.dataspaceconnector.common.util.UUIDUtils.uuidFromUri(UUIDUtils.java:74)
Trying to register Artifact to a Broker (UUID created manually in bootstrap)