DCAT lock-in and exit strategy

paulakeen commented 4 years ago

Time to revamp?

(Consider this a very first comment amongst a thread of comments that will follow and will complete the idea being expressed in this issue).

The time has come to face that ADMS is locked-in by DCAT. In our opinion this is a very inconvenient situation for many projects and hinders the semantic cross-domain interoperability (justifications and references to support this statement will be provided later on).

Let me please apologise first of all for not knowing well the history of the development of DCAT and ADMS, and their respective APs. Eager though to learn from the ones who would be willing to share that history.

For some years now, many of my clients and we ourselves have kept asking questions like these:

Why does ADMS define the adms namespace and then it is nowhere used in the implementation?
Why isn't there a class named "Asset" in the adms namespace?
How is it justified that a dcat:Dataset is an Asset, shouldn't it be the opposite, that a dataset is a type of asset?
What is the justification for having a catalogue as part of the vocabulary? Shouldn't the catalogue be external to it? e.g. if DCAT catalogued Assets, being Datasets one type of Assets, then DCAT would be the right vocabulary for cataloguing what is defined in ADMS (the Assets, and above the asset the Resource, as part of ADMS, not of DCAT).
Would not the Catalogue itself be an Asset?
If the inspiration for Asset was the IFLA FRBR, why depart so far from the original IFLA concepts and terminology, only to end up with an over-conflated version of concepts and terms (e.g. why avoid Expression and Manifestation and disperse their properties, even duplicating the semantics, across Resource, Dataset and Distribution)?
Why not take into account the latest IFLA LRM model to map Agent and its properties linking to the Asset?

Again apologies for not knowing the history of the ADMS development, but I understand that its development came after DCAT and that it was proposed as a profile of DCAT instead of a totally different vocabulary with its own concepts, definitions and purposes.

My proposal is that we consider developing a totally new Asset Ontology, with its own namespace, entities, naming and design rules (aligned to the ones by SEMIC for the Core Vocabularies). This new ontology should be totally de-coupled from DCAT and DCAT-AP.

This would break totally the backwards compatibility with the current implementations of ADMS. Therefore there is also the need of evolution/transition plans which should include the maximum assurance possible of the semantic and technical compatibility with the previous versions. Which is not impossible but needs a lot of thought and work.

In our opinion there is a need to reflect on the costs and benefits of ending the dependency on DCAT and, if so decided, to plan the transition and accept the switching cost.

paulakeen commented 4 years ago

What is an Asset?

Let us first propose a definition for "resource", which in our opinion should be defined in the Asset Ontology (or maybe even in another more generic ontology, but not in DCAT).

Resource: "a res available for use". (See the definition of res in the IFLA LRM specification).

Now, we can propose the following definition for Asset:

Asset: "a resource, probably resulting from a work, with purpose and value".

Notice that these definitions are purposefully simple and broad. Our opinion is that a recurrent exercise of generalisation/subsumption/generalisation of concepts is needed to come up with a comprehensible ontology of Asset.

This discussion should lead us to ask whether narrowing the generic concept of Asset should yield subclasses of Asset such as Human Resource (can a person be considered an asset in the HR domain?) or Digital Asset (should a future evolution of DCAT broaden its scope to deal with digital assets, thus converting DCAT into DAC, a "Digital Asset Catalogue"?).

The essence of an asset seems to be that it has value and can be used for specific purposes thus possibly creating added value for other purposes. Sharing and reuse seems to be core in the nature of an asset: a proof that a resource is valuable is that it is coveted by others and therefore requested for exchange, an activity which is at the basis of the human civilisation; as in commerce and knowledge transfer. Therefore everything that contributes to facilitate the transaction and acquire the asset contributes also to delimit its value. If the executor and context of the work from which the asset emanates, its very expression, the manifestations of its material or virtual features, and the existence of different itemised occurrences contribute also to perceive (or estimate) the resource as valuable, then all these aspects need to be captured by the Asset Ontology.

We want to emphasize that our proposal is not to evolve ADMS, but to deprecate the current ADMS and replace it with an Asset Ontology and possibly a Digital Asset Ontology, we'll see. Unlike ADMS, the ontology should be used to describe the aspects that make a resource an asset (e.g. purpose and value, and others to be analysed, depending on how we define Asset). Like ADMS, it could also be used to describe its context: who creates and provides it, how it has been created, when, how to exchange and use it, etc. (which would also need deep research on the reusability of existing semantic resources and ontologies and the establishment of equivalences between them).

In today's electronic world one mechanism for sharing and reusing digital assets is "interoperability". As ontologies are also meant to facilitate semantic interoperability, the "Asset Ontology" should be an essential instrument to link multiple domains, described via other ontologies that define classes of assets. This idea implies that:

The focus of the exchanges would not be put on unspecified resources (a mere IRI) nor on a very domain-specific object that is unaware of its value, purpose, provenance, status, etc.;
If the object being exchanged "is" an asset then it self-describes the res and facilitates its exchange, use, consumption, dissemination, etc.

There is a wide range of things that can be regarded as digital assets, for example: "eBusiness Document", "Specification", "Solution Building Block", "Evidence", "Dataset", "Catalogue", "Software Component Library", "SCORM2 SCO", “Assessment”, etc. All of these conform to the definition of Asset given above. Notice that the assets mentioned are prone to be interlinked to generate aggregate assets which in turn are candidates for exchange. Thus, for example, a system could discover and fetch an asset “DAC catalogue” that collects EIRA-based assets “building blocks” that are associated to assets “standards or specifications” that are assessed in assets “CAMSS assessments” that are supported by assets “Evidences”; the implementation or use of the asset catalogue or of each asset building block could be associated to an e-Training asset “IMS-LD course” that discovers and aggregates assets “SCORM2 Sharable Content Objects (SCOs)”, and so on and so forth. As long as the asset describes the nature, context, access etc. of its res, the relation between assets needs no categorization (cf. the “dcat:relatesTo” nesting property from dcat:Dataset to dcat:Dataset).

In terms of design, we could think of the Asset as a "base metaclass" or as an associated "metaclass", a class of classes that describes the instance of its descendant or of the domain class. Which brings on the table this other discussion on whether assets should be designed as super classes or as associated classes that may or may not share the same life-cycle and environment.

The practice of "the more modular and decoupled the better" is probably one of the greatest tenets of [software] engineering. Treating the Asset as either a base metaclass or an associated class fits that principle, especially if the Asset model is flexible enough (i.e. standard-oriented) and does not impose strict restrictions. What is to be referenced by other ontologies and shared between domains is then the metaclass, not a reference to a resource (IRI) or to an object that is associated via a domain-specific predicate. By the way, OWL2 caters for this and even provides the possibility of naming the metaclass as the class it represents, e.g. one could associate an individual of a Dataset to a metaclass "Dataset" that extends “Digital Asset”, which in turn would extend "Asset" (see the comments on "Punning" in the W3C OWL2 documentation).
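To make the two design options a bit more tangible, here is a rough Python sketch of "Asset as base class" versus "Asset as associated class". It is only an analogy: Python inheritance is not OWL2 punning, and every class and field name below (`DigitalAsset`, `AssetRecord`, `access_url`, ...) is a hypothetical placeholder, not something defined in ADMS or DCAT.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    """Hypothetical base class: a resource with purpose and value."""
    purpose: str
    value: str


# Option 1: the domain class *is* an Asset (Asset as a base class).
@dataclass
class DigitalAsset(Asset):
    access_url: str


@dataclass
class Dataset(DigitalAsset):
    title: str


# Option 2: the domain class is only *associated* with an Asset record,
# so the object and its asset description can have separate life-cycles.
@dataclass
class Vehicle:
    plate_number: str


@dataclass
class AssetRecord:
    about: object  # the domain object being described
    asset: Asset   # its purpose/value description


dataset = Dataset(
    purpose="reuse",
    value="reference data",
    access_url="https://example.org/codelist.csv",  # placeholder URL
    title="Example code list",
)
car = Vehicle(plate_number="34123-GB")
record = AssetRecord(about=car, asset=Asset(purpose="transport", value="resale"))
```

With option 1 a consumer can test `isinstance(dataset, Asset)` directly; with option 2 the `Vehicle` class stays free of any Asset dependency and the link lives in the separate record.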

The "metaclass" design approach somehow would put forth the opinion that the properties of the metaclass do also contribute to define the nature of the asset, and would confirm that the asset "is" the resource. This vision blurs the line between what is asset data and what is associated metadata. One more reason to propose abandoning the naming of "Asset Description Metadata Schema" in favor of "Asset Ontology".

paulakeen commented 4 years ago

Asset Ontology: a Core Vocabularies Core Ontology?

If the Asset Ontology is core enough we could describe any type of Asset. In reaching this goal then we could extend it with sub-ontologies related to one specific core concept.

The very first beneficiaries could be the Member States and the European Commission Systems trying to implement the Single Digital Gateway Regulation (SDGR).

Think for example of the asset "Vehicle" and the fact that as per today we do not have a "Core Vehicle Vocabulary" in SEMIC.

Is a Vehicle an Asset? Well, many of its properties can be seen as aspects of Work, Expression, Manifestation and Item: a vehicle is the result of a work, it has value and purpose, it has a creator and a distributor, its brand and model are attributes of its expressions, the plate number and chassis registration number are attributes of its manifestation, it belongs to an agent (its owner), etc.

paulakeen commented 4 years ago

How to implement the Asset Ontology based on the reuse of the IFLA LRM specification?

When googling this topic, the surprise is that there are apparently not many actual implementations of the IFLA FRBR or the IFLA LRM specifications. And the existing ones are related to bibliographic references, although the model is abstract enough to describe any work (the examples provided in IFLA FRBR are tremendously varied and deserve an attentive reading).

However, there is one implementation proposal that is especially interesting in our opinion: The Ontology-Based Approach of the Publications Office of the EU for Document Accessibility and Open Data Services, by Francesconi et al. 2015.

In this proposal, Work, Manifestation, Expression and Item are treated as “aspects of” the resource and of its sub-metaclasses.

In this approach the Asset is associated to its metadata and the object used for discovery and exchange are the sub-metaclasses of the resource. Each Asset, once retrieved, includes its self-description.

Enrico Francesconi, one of the co-authors of the paper, prepared this short presentation where the implementation approach is clear: FRBR-ShortIntro-Enrico's presentation.pdf

paulakeen commented 4 years ago

About “Aspects Of”

In the article “IT Standards Typology”, Henrik J. de Vries defines “Basic standards as the ones providing structured descriptions (aspects of) interrelated entities to facilitate human communication about these entities, and/ or to facilitate use in other standards.”

For de Vries, basic standards help identify requirements to solve “matching problems [..] the problems of determining one or more features of different interrelated entities in a way that they harmonize with one another, or of determining one or more features of an entity because of its relation(s) with one or more other entities”.

See Jakobs & al., “Advanced Topics in Information Technology Standards and Standardization Research”, Volume 1, Idea Group Publishing, 2005.

The design approach adopted by OP (Francesconi et al. 2015) would be supported by the de Vries’ definition. The IFLA model can then be seen as a reference basic standard for the description of other features of different domain entities interrelated through the aspects of Work, Expression, Manifestation and Item.

Beware that if we adopted OP’s design approach, the Asset class would not be a “descriptor” but an abstract base class that is related to its descriptors via its “AspectsOf” sub-properties (similarly to what is done for the classes Resource and BibliographicResource in Francesconi’s presentation, see comment above).

paulakeen commented 4 years ago

One benefit of adopting IFLA LRM for the description of Assets

The very first benefit of adopting the IFLA model is the enabling of cross-border and cross-domain interoperability via the description of assets in multi-domain entity repositories. (As a matter of fact that was the original problem being solved by OP with the CELLAR solution: how to make possible the discovery of multi-domain and heterogeneous bibliographic resources that are used for cross-border purposes (multiple expressions: all EU official languages; multiple manifestations: HTML, PDF) and for cross-domain purposes and interoperability.)

Why do multi-domain repositories enable cross-border cum cross-domain interoperability? Well, a search for common aspects of a work, an expression or a manifestation in such a repository will return all the entities that, regardless of the domain where they are used, share such common aspects. If such aspects describe assets then different domains will be able to exchange them through interfaces that recognise the abstract entity Asset and access particular Asset individuals.
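A minimal sketch of that kind of search, assuming a toy in-memory repository (the `Entry` class, the `find_by_aspect` helper and all sample entries are invented for illustration; nothing here reflects how CELLAR or e-Certis actually store data):

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One repository entry; all field names are illustrative."""
    domain: str
    identifier: str
    # Aspect level ("work", "expression", ...) -> {aspect name: value}
    aspects: dict = field(default_factory=dict)


def find_by_aspect(repository, level, name, value):
    """Return every entry sharing the given aspect, whatever its domain."""
    return [
        entry for entry in repository
        if entry.aspects.get(level, {}).get(name) == value
    ]


# Made-up entries from three different domains.
repository = [
    Entry("procurement", "evidence-type-1", {"expression": {"language": "fr"}}),
    Entry("e-training", "course-7", {"expression": {"language": "fr"}}),
    Entry("legislation", "act-42", {"expression": {"language": "de"}}),
]

matches = find_by_aspect(repository, "expression", "language", "fr")
domains = sorted(entry.domain for entry in matches)
print(domains)
```

The query never mentions a domain: entries from procurement and e-training come back together simply because they share an expression-level aspect.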

Let us develop this idea using three examples: e-Certis, an e-Certis-based SDG and JoinUp.

One previous clarification, though, CELLAR, e-Certis, and an e-Certis-based SDG central platform could be considered “base registries” as they manage only one main concept each: “Bibliographic Resource” and subtypes of it; Evidence Types-via-Criterion (so the root class is “Criterion”); and “Public Service”, although it is currently named “Procedure”. The fact that they are base registries does not imply they are not meant for cross-border and cross-domain purposes; far from that it reveals the main characteristic of base registries: they are context-of-use unaware and therefore multipurpose!

JoinUp in turn is virtually able to publish any type of concept as it uses this mechanism of abstract Asset.

The multi-domain e-Certis repository

As of today, the e-Certis solution allows the EU institutions and Member States to define and maintain European and national criteria and evidence types for procurement purposes. But the very next steps aim at revamping e-Certis so it becomes a true multi-domain repository.

So far so good. But, wait a moment, a multi-domain repository of .... what ... and for whom?

The ultimate goal of e-Certis has always been to make it possible for a stakeholder located in one EU Member State to understand what another EU Member State authority will require when trying to operate cross-border in that other MS. For a long time this was known as “evidence mapping”. Some years of experience trying to implement this have clarified that (1) the need is not related to “evidences” but to “types of evidences”; (2) evidence types may refer to “structured or narrated information to be carried in documents” but more and more frequently to “data to be serviced by one or more data providers” (see the TOOP specifications for an example); and (3) evidence types are means to prove that one "criterion" (or more than one) is met by the stakeholder operators.

[... to be completed]

SDG central platform, an e-Certis Use Case

[... to be completed]

The JoinUp hub of assets

[... to be completed]

makxdekkers commented 4 years ago

@paulakeen I can give some background for ADMS and its relationship with DCAT. First of all, the timelines of the development of ADMS and DCAT back in 2011-2014 were essentially in parallel; in fact ADMS started a little earlier. They became connected because they were considered to be very similar. DCAT became a W3C Recommendation and ADMS a Working Group Note as a profile of DCAT.

It seems to me that the confusion stems from the fact that you interpret the word "Asset" in the way the English dictionary defines it, basically as a synonym of "Resource". That makes you think that an Asset is more general than a Dataset. This is not how it was designed. Its scope is contained in the first sentence in the document at W3C: an Asset in the context of ADMS is "highly reusable metadata (e.g. xml schemata, generic data models) and reference data (e.g. code lists, taxonomies, dictionaries, vocabularies)". It was initially developed for an Asset Catalogue at Joinup that contained those particular things. An indication of that is the list of terms in the ADMS Asset Type vocabulary which contains things like 'ontology', 'taxonomy', 'mapping' etc. This was considered a special case of a Dataset (A collection of data, published or curated by a single agent, and available for access or download in one or more representations).

In that sense, ADMS should be and remain a profile of DCAT. I have no opinion on your proposal to develop an Asset Ontology, but I would suggest not to reuse the term Asset for it to avoid confusion.

paulakeen commented 4 years ago

Many thanks for this very interesting and relevant input @makxdekkers!

I understand that you had to make a design decision that led you, in the context of ADMS, to consider (and define) an Asset as metadata. But that is precisely what I am challenging: the Asset is the thing I need to catalogue, not its metadata. Of course, the thing I want to catalogue, exchange or use for specialisation can/may/should additionally be connected to descriptors of the thing (via metadata). By cataloguing, specialising or exchanging such Assets I can then reach the different "descriptive systems" they may be connected to (not only DCAT's!), thus making the Assets self-descriptive or enabling them to refer to their descriptions.

One way of thinking of the Asset is, in terms of OOP, as an abstract interface or base class that you can use as an "interoperability contract". From this "interoperability interface/contract" perspective the Asset is an abstract representation of any type of Asset, like semantic assets, software assets, human resource assets, actual sets of raw data values, etc. For me "metadata" are not in the essence of the Asset but "about" the asset ("aspects of"). And in the current ADMS, if I understand it well, what you're saying is that a data set is equivalent to a profiled set of metadata.
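In OOP terms, the "interoperability contract" idea could look roughly like the sketch below. This is just an illustration of the analogy, not a proposed design: the method names (`purpose`, `describe`), the two concrete classes and the `exchange` function are all assumptions of mine.

```python
from abc import ABC, abstractmethod


class Asset(ABC):
    """Hypothetical interoperability contract shared by all asset types."""

    @abstractmethod
    def purpose(self) -> str:
        """What the asset can be used for."""

    @abstractmethod
    def describe(self) -> dict:
        """Descriptors 'about' the asset (its metadata / aspects)."""


class SemanticAsset(Asset):
    def __init__(self, name: str):
        self.name = name

    def purpose(self) -> str:
        return "shared vocabulary"

    def describe(self) -> dict:
        return {"type": "SemanticAsset", "name": self.name}


class SoftwareAsset(Asset):
    def __init__(self, name: str, version: str):
        self.name = name
        self.version = version

    def purpose(self) -> str:
        return "reusable component"

    def describe(self) -> dict:
        return {"type": "SoftwareAsset", "name": self.name,
                "version": self.version}


def exchange(asset: Asset) -> dict:
    """A consumer that depends only on the contract, not on the domain."""
    return {"purpose": asset.purpose(), "description": asset.describe()}


payload = exchange(SemanticAsset("Dublin Core"))
other = exchange(SoftwareAsset("validator", "1.0"))
```

The point of the analogy is that `exchange` works for any asset type without knowing its domain, while the descriptors stay "about" the asset rather than being the asset itself.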

Therefore, the work pending is to agree on the definition of Asset, and based on that definition, the identification of the attributes of Asset, independently of DCAT. If you agreed to that then the name of the ontology should be kept as "Asset Ontology" because it would be about the representation and definition of what is in an Asset and what makes of a resource an Asset.

I can share the reluctance to revamp, backwards-compatibility breaking, switching-cost, etc. But I also think we have a commitment to quality and professionality and to mid-long-term goals (e.g. cross-border AND cross-domain interoperability).

I fear this can be considered a too philosophical discussion, but it is not, and should be taken as a very pragmatic thing, because it could better clarify and enable the cross-border interoperability.

I am eagerly looking forward to having further virtual and face-to-face discussions on this topic, if you have the patience to deal with my stubbornness on this :D. For the F2F discussions I'll pay for all the beers!

makxdekkers commented 4 years ago

@paulakeen No there is still confusion. The design decision was not to treat an Asset as metadata. The decision was to treat vocabularies (and similar things) as data and describe them using ADMS.

The Asset in ADMS is a collection of data, which happens to be a vocabulary. For example, the specification of the 15-element Dublin Core is an ADMS Asset, a dataset with 15 data items. So ADMS describes this dataset, the same way that DCAT would describe a spreadsheet with observations.

Look for an example at the schema for DCAT; the schema starts with metadata about DCAT, namely that it is an Ontology and that Simon Cox contributed to its development:

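@prefix dct:  <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sdo:  <http://schema.org/> .

<http://www.w3.org/ns/dcat>
  a owl:Ontology ;
  dct:contributor [
      a foaf:Person ;
      sdo:affiliation [
          foaf:homepage <https://csiro.au> ;
          foaf:name "Commonwealth Scientific and Industrial Research Organisation" ;
        ] ;
      rdfs:seeAlso <https://orcid.org/0000-0002-3884-3420> ;
      foaf:name "Simon J D Cox" ;
      foaf:workInfoHomepage <http://people.csiro.au/Simon-Cox> ;
    ] ;
  # ... (excerpt; the published schema continues with further statements)
.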

That information could be expressed as ADMS as it is the metadata for DCAT. The schema file itself would be one of the Distributions of DCAT, the Asset, being a dataset containing seven classes and about 25 properties.

So the vocabulary is the Asset and ADMS is used to describe the vocabulary.

paulakeen commented 4 years ago

Good and strong counter-defence. Allow me though these two questions, please:

Are vocabularies and similar things data? Really? Well we could consider so if the definition of metadata is data about data, so it is a metaclass of data. And what about software, is software data? And still, I guess, you had the imperative mandate of coming up with a narrowed (profiled) definition and design and you used DCAT for the sake of reusability and economy in the semantic and software asset description domain. So far so fair.
What about vocabulary-dissimilar things that are also assets but not data? If metadata is data about data ... can we still use the term metadata to describe them, or would we need a different, and yet standard, system and term to refer to the characteristics of the object? Which is why we’re proposing to go back and refer to them via the work, expression, manifestation and item aspects, as one possibility amongst other alternative and complementary systems.

In the example on vehicles mentioned in a previous comment above, the proposal is not to describe the “Vehicle Vocabulary” but the “Vehicle Object”. What are “colour red”, “brand Seat”, “model Panda”, “plate number 34123-GB” and “chassis number 052101231-BS-ES”: metadata, or work/expression/manifestation/item aspects of the object? In our vision, metadata describes data, aspects describe entities (e.g. objects). If these descriptions are aspects of the object, they may be defining the essence of the vehicle too, not only describing the object. For instance, if you say that a truck has 10 wheels then you are dealing with the essence of the object: it’s not any truck but a trailer. When modelling vehicles you may want to define the property numberOfWheels as a subproperty of a manifestation aspect of the truck, so you can identify “trailer” via its manifestation aspect and/or its essential property number of wheels.
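A small Python sketch of the vehicle example, assuming brand and model sit under the expression aspect and plate number, chassis number and number of wheels under the manifestation aspect, as suggested in the comments above. Where colour belongs, the 4-wheel count for the car, the truck's values and the exact "10 wheels or more" rule are assumptions made only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Expression:
    brand: str
    model: str


@dataclass
class Manifestation:
    colour: str            # placement under manifestation is an assumption
    plate_number: str
    chassis_number: str
    number_of_wheels: int


@dataclass
class Vehicle:
    expression: Expression
    manifestation: Manifestation


TRAILER_MIN_WHEELS = 10  # illustrative threshold taken from the truck example


def is_trailer(vehicle: Vehicle) -> bool:
    """Classify via an essential property held in the manifestation aspect."""
    return vehicle.manifestation.number_of_wheels >= TRAILER_MIN_WHEELS


car = Vehicle(
    Expression(brand="Seat", model="Panda"),
    Manifestation("red", "34123-GB", "052101231-BS-ES", number_of_wheels=4),
)
truck = Vehicle(
    Expression(brand="ExampleBrand", model="ExampleModel"),  # placeholders
    Manifestation("white", "00000-XX", "000000000-XX-XX", number_of_wheels=10),
)
```

Here `is_trailer(truck)` is true and `is_trailer(car)` is false: the number of wheels is not a label attached to the object afterwards but part of what decides which kind of vehicle it is.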

paulakeen commented 4 years ago

ADMS or SSADMS?

In a previous comment @makxdekkers suggests that the Asset Ontology should be named differently, but I wonder whether it is not the current ADMS that needs renaming... if in ADMS Assets are only vocabularies and similar things, e.g. software code, shouldn’t it be renamed Semantic and Software Asset Description Metadata Schema... if that is the actual current ADMS domain?

makxdekkers commented 4 years ago

@paulakeen You're overthinking this. Let me try to explain in different terms. At the start of the process, we had a bunch of things that we wanted to describe. Those things happened to be vocabularies, ontologies, data models, and similar things, so we developed a metadata schema to describe those things. That metadata schema is ADMS; it is designed to describe the kind of things that we wanted to describe, namely vocabularies, ontologies and similar things. That was all we tried to do. So the answer to your first question is: Yes, we treated these vocabularies, ontologies, and data models as data. I have no answer to your second question: the kinds of things you ask about were outside of our scope.

paulakeen commented 4 years ago

@makxdekkers Out of your scope back then... but not out of the scope of the current needs and requirements of the EU institutions (think of SDG), e.g. interconnecting a wide variety of systems (business and citizens, base registries, public administrations' procedures and services, etc.) that need to exchange instances of classes that are not vocabularies, ontologies, data models and similar things: vehicles, for example, software architecture and solution building blocks, assessments, etc. See the examples provided in previous comments above.

makxdekkers commented 4 years ago

@paulakeen You are right, ADMS could have been called SSADMS, but at that time it was decided to call it ADMS. We can't undo that. Of course you are free to call your work whatever you want, as long as you clearly declare its scope and the meaning of its terms. But it has to be clear that the term Asset in your proposed Asset Ontology is a completely different thing than the Asset in ADMS. The way I see it, what you're proposing is not a replacement of ADMS but something completely different and the two could peacefully coexist.

paulakeen commented 4 years ago

Fair enough... in time the coexistence could consist in the realisation that SSADMS is a specialisation of the Asset Ontology.

BTW, knowledge building is not, IMHO, overthinking. None of the above reflections is a criticism of the humongous and impressive work done on DCAT over so many years by so many brilliant people, such as yourself, but an attempt to come up with improved practical solutions.

The initial [broad] scope and definition of our proposed Asset allows for the subsumption of the SSADMS asset in a possible future Asset Ontology.

makxdekkers commented 4 years ago

Apologies, I wasn't criticising you for 'overthinking'. I tried to say that our scope was fairly simple and we did what we could.

paulakeen commented 4 years ago

No need to apologise, @makxdekkers, for me this is a friendly and intellectually productive discussion, hopefully leading to innovative solutions.

makxdekkers commented 4 years ago

OK!

bertvannuffelen commented 4 years ago

@makxdekkers @paulakeen @sandervd I created a specific issue around the definition of adms:Asset. I tried to make the above discussion concrete and focused on one term. If we can settle the definition for that term we can deduce the future effort required.

paulakeen commented 4 years ago

Issue referred to by @bertvannuffelen is #25. Very interesting rules, btw.

SEMICeu / ADMS-AP