Examples of the need to model Evidence as Dataset

jimjyang commented 2 years ago

Cf. today's CPSV-AP webinar, where I mentioned our need to be able to reuse datasets as evidence.

For example, when you apply for certain services from a municipality, you usually need to provide evidence that you are a registered resident of that municipality. The authoritative source for your registered home address in Norway is the National Population Register. The National Population Register is a dataset that is already described in the National Data Catalog, https://data.norge.no/datasets/e281c8c6-b944-4662-861d-a475e973e393, along with its distributions etc.

What we need in this example is to be able to express that the already described, existing dataset "National Population Register" mentioned above is an Evidence for a given municipal public service. [Cf. the discussions at the webinar today: we are not talking about referring to millions of individual evidences, but about referring to a register that may provide individual evidences.]
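A minimal Turtle sketch of what is being asked for could look as follows. The ex: namespace and ex:residentService are hypothetical names used only for illustration, and typing the register directly as cv:Evidence is an assumption that relies on cv:Evidence being a subclass of dcat:Dataset; it is not a confirmed CPSV-AP pattern.

```turtle
@prefix cpsv: <http://purl.org/vocab/cpsv#> .
@prefix cv:   <http://data.europa.eu/m8g/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# Hypothetical municipal service that requires proof of residence.
ex:residentService a cpsv:PublicService ;
    cpsv:hasInput <https://data.norge.no/datasets/e281c8c6-b944-4662-861d-a475e973e393> .

# The existing catalogue entry, reused as-is and additionally typed as Evidence (assumption).
<https://data.norge.no/datasets/e281c8c6-b944-4662-861d-a475e973e393>
    a dcat:Dataset, cv:Evidence .
```

The point of the sketch is that the service description refers to the catalogue entry by its existing URI, so nothing about the register has to be described a second time.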

Reuse of already existing datasets is very important, cf. the "once only" principle.

bertvannuffelen commented 2 years ago

Let's elaborate this example together.

Let ex:pubser1 be a public service.

ex:pubser1 a <http://purl.org/vocab/cpsv#PublicService> .

In order to use this public service, one has to provide one's domicile (registered home address). The register of domiciles is the National Population Register (Folkeregisteret).

<https://data.norge.no/datasets/e281c8c6-b944-4662-861d-a475e973e393> a dcat:Dataset.

The input of the public service is thus a record from the Folkeregisteret. That record is the evidence that a person, Bob, has to provide to the public service ex:pubser1.

The system Bob has to contact to retrieve his domicile (the required evidence) is thus the Folkeregisteret.

In the context of CPSV-AP, the evidence given as input is the actual record for Bob (or a generic template), not the collection of evidences. The Folkeregisteret is thus the provider of the input, not the input to the service.

ex:pubser1 a <http://purl.org/vocab/cpsv#PublicService> .

ex:pubser1 <http://purl.org/vocab/cpsv#hasInput> ex:evDomicile.

ex:evDomicile ex:isProvidedBy <https://data.norge.no/datasets/e281c8c6-b944-4662-861d-a475e973e393>.

<https://data.norge.no/datasets/e281c8c6-b944-4662-861d-a475e973e393> a dcat:Dataset.

Does this resonate with you?

This highlights a point to watch: just because two things are of type dcat:Dataset does not mean they are at the same granularity. I have seen this happen in the research community: a graveyard is a dcat:Dataset, and so is a dataset containing all observations of birds in a country, and both are combined in a single catalogue. Those are two different granularity levels. DCAT allows describing both, and RDF allows merging them, but it isn't very wise to do so. It is practically impossible to make a sensible structured query over them. Suppose we turned each data point in a Eurostat statistical data cube into a dataset and uploaded that to data.europa.eu. Then the EUP would be impossible to use.

The same thing happens here: Evidence being a dcat:Dataset is different from the collection/register of evidences being a dcat:Dataset. CCCEV and CPSV-AP are about the first, more detailed level, while DCAT-AP is at the second, coarser-grained level.
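To make the two levels concrete, here is a small sketch. The resource ex:evDomicileBob is hypothetical and stands for Bob's individual record; typing it as both cv:Evidence and dcat:Dataset is an assumption that follows from the subclass relation under discussion.

```turtle
@prefix cv:   <http://data.europa.eu/m8g/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# Record level: one piece of evidence about one person (hypothetical resource).
ex:evDomicileBob a cv:Evidence, dcat:Dataset .

# Register level: the whole collection, as already catalogued.
<https://data.norge.no/datasets/e281c8c6-b944-4662-861d-a475e973e393> a dcat:Dataset .
```

Any query that selects everything of type dcat:Dataset would return both resources, with nothing in the type alone to tell the single record apart from the register.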

PS: hasInput has another issue, namely the difference between the description and the execution of a Public Service. For that, see https://github.com/SEMICeu/CPSV-AP/issues/95. I used it here descriptively, but in the last webinar it was raised that it should be used for the execution. So the example is not 100% correct at that level.

jimjyang commented 2 years ago

Thanks, @bertvannuffelen!

Our example and our usage of CPSV-AP (so far) are also descriptive, so we are at the same level.

Could you explain ex:isProvidedBy as used in your example? Is it a sort of dct:isPartOf? In this example of ours, we wish to be able to express, in one way or another, that ex:evDomicile for Bob is a specific "row"/"field" in the "table" National Population Register.

We are aware of granularities, and of the difference between "a row" in "a table" as a dcat:Dataset and "the whole table" as a dcat:Dataset. Our National Population Register with its APIs is (also) meant to provide access to a specific piece of information (e.g. the registered address) about a given registered person (e.g. Bob). Moreover, our user Bob may e.g. be asked to give his consent to let the service provider retrieve the relevant info from the National Population Register (instead of Bob being asked to attach evidence that he first has to get from the National Population Register himself). So, our need is to include the authoritative source, the National Population Register, in the description of the service, in one way or another and preferably directly.

Anyway, two things:

1. At the last CPSV-AP webinar, there was a discussion about whether we should remove the subClassOf relation between cv:Evidence and dcat:Dataset. With this issue, we wish to illustrate cases where an evidence is a dataset.
2. If, at the descriptive level, we must have a "bridge"/relation between "a row" and "the table" (which we hoped to avoid), we hope that CPSV-AP specifies (or at least recommends) how to do so (using dct:isPartOf, dct:source, or ...).
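For illustration only, the two candidate "bridges" named above could be sketched like this. The resources ex:evDomicileA and ex:evDomicileB and the ex: namespace are hypothetical, and neither option is a pattern that CPSV-AP is known to recommend; the sketch only shows what each choice would look like in Turtle.

```turtle
@prefix cv:  <http://data.europa.eu/m8g/> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix ex:  <http://example.org/> .   # hypothetical namespace

# Option A: the evidence is modelled as a part of the register ("a row" in "the table").
ex:evDomicileA a cv:Evidence ;
    dct:isPartOf <https://data.norge.no/datasets/e281c8c6-b944-4662-861d-a475e973e393> .

# Option B: the evidence is modelled as derived from the register.
ex:evDomicileB a cv:Evidence ;
    dct:source <https://data.norge.no/datasets/e281c8c6-b944-4662-861d-a475e973e393> .
```

Option A asserts containment, which fits the "row in a table" reading. Option B only asserts derivation, which may fit better when the evidence is a document generated from the register rather than a literal part of it.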

SEMICeu / CPSV-AP

Examples of the need to model Evidence as Dataset #111