PopulateTools / gobierto

Plataforma de gobierno abierto open source
https://gobierto.es
GNU Affero General Public License v3.0

Research how to integrate DCAT in Gobierto Data #2671

Open amiedes opened 4 years ago

amiedes commented 4 years ago

https://www.w3.org/TR/vocab-dcat-2/

http://rml.io/

https://github.com/ruby-rdf/rdf-vocab

https://www.boe.es/diario_boe/txt.php?id=BOE-A-2013-2380

https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe

https://github.com/ckan/ckanext-dcat

Look at DCAT in:

furilo commented 4 years ago

@amiedes: @entantoencuanto will be looking at some of these things this week.

furilo commented 4 years ago

Issue updated with link to DCAT-AP in EU site https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe

entantoencuanto commented 4 years ago

I've inspected the DCAT output of datos.madrid.es, and I think we can generate similar data by adding some extra attributes to both custom fields and vocabulary terms. For example, a dataset appears in the catalog like this:

<dct:identifier>···</dct:identifier>
<dct:title xml:lang="es">···</dct:title>
<dct:description xml:lang="es">···</dct:description>
<dcat:theme rdf:resource="http://···"/>
<dct:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">···</dct:issued>
<dct:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">···</dct:modified>
<dc:language>···</dc:language>
<dct:publisher rdf:resource="http://···"/>
<dct:license rdf:resource="https://···l"/>
<dcat:distribution>
  <dcat:Distribution>
    <dcat:accessURL rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI"></dcat:accessURL>
    <dcat:mediaType>···</dcat:mediaType>
    <dcat:byteSize>···</dcat:byteSize>
  </dcat:Distribution>
</dcat:distribution>

For the custom fields:

{
    "es": "Parques Nacionales",
    "en": "National Parks"
}

once decorated this information can be included as:

<dct:title xml:lang="es">Parques Nacionales</dct:title>
<dct:title xml:lang="en">National Parks</dct:title>
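
The decoration step for a localized custom field could look roughly like the sketch below. It is plain Ruby with no Gobierto code involved: the method name `localized_elements` is hypothetical, and the only assumption is that the custom field value arrives as a locale-keyed hash like the one shown above. Escaping uses Ruby's built-in `String#encode` with the `xml:` option.

```ruby
# Hypothetical helper (not part of Gobierto): turns a locale => text hash
# into one RDF/XML element per locale, each tagged with its own xml:lang.
def localized_elements(tag, translations)
  translations.map do |locale, text|
    lang = locale.to_s.encode(xml: :attr) # escaped and wrapped in double quotes
    body = text.to_s.encode(xml: :text)   # escapes &, < and >
    "<#{tag} xml:lang=#{lang}>#{body}</#{tag}>"
  end
end

title = { "es" => "Parques Nacionales", "en" => "National Parks" }
puts localized_elements("dct:title", title)
```

Each locale gets its own element, so the `xml:lang` attribute always follows the hash key rather than being hard-coded.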

For the vocabulary terms:

{
    "theme": [1]
}

The 1 is the id of a vocabulary term which includes a meta:

{
    "rdf:resource": "http://datos.gob.es/kos/sector-publico/sector/medio-ambiente"
}

With a vocabulary decorator that uses this meta as the source for the custom field, the result would be:

<dcat:theme rdf:resource="http://datos.gob.es/kos/sector-publico/sector/medio-ambiente"/>
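
A rough sketch of that vocabulary decorator, again in plain Ruby. The `Term` struct, the `TERMS` lookup table and the method name `resource_elements` are stand-ins invented for illustration; the real Gobierto term model is not shown in this thread. The only thing taken from the example above is that a term's meta holds an `"rdf:resource"` key.

```ruby
# Stand-in for a vocabulary term; the name is a placeholder value.
Term = Struct.new(:id, :name, :meta)

TERMS = {
  1 => Term.new(1, "Medio ambiente",
                { "rdf:resource" => "http://datos.gob.es/kos/sector-publico/sector/medio-ambiente" })
}.freeze

# Hypothetical decorator: resolves term ids and emits one element per term
# whose meta has an "rdf:resource" entry; other ids are skipped.
def resource_elements(tag, term_ids, terms)
  term_ids.filter_map do |id|
    uri = terms[id]&.meta&.fetch("rdf:resource", nil)
    next unless uri

    "<#{tag} rdf:resource=#{uri.encode(xml: :attr)}/>"
  end
end

payload = { "theme" => [1] }
puts resource_elements("dcat:theme", payload["theme"], TERMS)
```

Skipping terms without a resource URI is a design choice made for this sketch; raising an error instead would surface incomplete vocabularies earlier.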

Other types of vocabulary fields may use different decorators, with output like this (in this case a vocabulary field that allows multiple selection):

<dcat:keyword xml:lang="es">Medio Ambiente</dcat:keyword>
<dcat:keyword xml:lang="es">Impacto ambiental</dcat:keyword>

stbnrivas commented 3 years ago

== WIP ==

Before we can generate a populated RDF DCAT document, some values need to be mapped from somewhere in the application.

I'd also like to confirm my assumption that a Catalog depends on a site (in some way) and that a site has only one catalog.

dcat:Catalog

Values possibly related to a site:

| attribute name | example of value | explanation |
|---|---|---|
| dct:title | open dcat data catalog #{city} | |
| dct:description | open data catalog for #{city} with data from 2019 to 2021 in formats ... | |
| dct:identifier | #465234646344 | |
| dct:issued | site.created_at | |
| dct:modified | GobiertoData::Dataset.maximum(:updated_at) | |
| dct:license | link to license | |
| dcat:keyword | stats | create a new keywords attribute in the dataset model |
| dcat:keyword | contract | |
| dct:modified | site.datasets.maximum(:updated_at) | |
| dct:creator | site.organization.name | |
| dct:publisher | site.organization.name | |
| dct:contributor | | empty |
| dct:accrualPeriodicity | daily (what values fit here?) | https://www.w3.org/TR/vocab-dcat-3/#temporal-properties |
| foaf:homepage | some url | |
| dcat:themeTaxonomy | | |
| dct:hasPart | | unused by us |
| dcat:dataset | | contains the dcat:Dataset entries |
| dcat:service | | contains the dcat:Service entries; empty for us |
| dcat:catalog | ? | |
| dcat:record | ? | |
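
A minimal sketch of how the catalog values listed above might be collected, assuming they are all reachable from a site and its datasets. `Site` and `Dataset` here are plain structs standing in for the real models, and `catalog_attributes` is a hypothetical name; the in-memory `max` plays the role of the `maximum(:updated_at)` query.

```ruby
require "time" # Time#iso8601

# Stand-ins for the real models, which are not shown in this thread.
Site = Struct.new(:city, :organization_name, :created_at)
Dataset = Struct.new(:slug, :updated_at)

# Hypothetical mapping from a site and its datasets to catalog-level values.
def catalog_attributes(site, datasets)
  last_update = datasets.map(&:updated_at).max # nil when there are no datasets
  {
    "dct:title"     => "open dcat data catalog #{site.city}",
    "dct:issued"    => site.created_at.getutc.iso8601,
    "dct:modified"  => last_update&.getutc&.iso8601,
    "dct:creator"   => site.organization_name,
    "dct:publisher" => site.organization_name
  }
end

site = Site.new("Madrid", "Example organization", Time.utc(2019, 1, 1))
datasets = [
  Dataset.new("parks", Time.utc(2020, 5, 1)),
  Dataset.new("budgets", Time.utc(2021, 4, 26))
]
catalog_attributes(site, datasets).each { |name, value| puts "#{name}: #{value}" }
```

Timestamps are rendered in ISO 8601 so they can be used directly with the `xsd:dateTime` datatype seen in the datos.madrid.es sample.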

dcat:Dataset

Of course, there are other values associated with a dataset that should probably be added as custom fields:

| attribute name | example of value | comments |
|---|---|---|
| dct:identifier | gobierto_data_datasets_url(id: slug) | |
| dct:title | | |
| dct:description | | |
| dcat:keyword | | can be multiple keywords |
| dct:issued | | |
| dct:modified | | |
| dct:language | | |
| dct:license | | |
| dct:publisher | site.organization.name | |
| dcat:distribution | | contains zero or more dcat:Distribution |

dcat:Distribution

A distribution belongs to a dataset and is a specific representation of it, such as CSV or XML.

| attribute name | example of value |
|---|---|
| dct:identifier | |
| dct:title | |
| dct:description | |
| dcat:accessURL | |
| dct:format | text/csv |
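
As an illustration, a distribution could be serialized along these lines. The element names follow the datos.madrid.es sample earlier in the thread (`dcat:accessURL`, `dcat:mediaType`, `dcat:byteSize`); the `Distribution` struct, the `distribution_xml` method and the example URL are hypothetical.

```ruby
# Stand-in for one representation (CSV, XML, ...) of a dataset.
Distribution = Struct.new(:access_url, :media_type, :byte_size)

# Hypothetical serializer: builds the nested dcat:distribution block,
# leaving out dcat:byteSize when the size is unknown.
def distribution_xml(dist)
  url  = dist.access_url.encode(xml: :text)
  type = dist.media_type.encode(xml: :text)
  lines = ["<dcat:distribution>", "  <dcat:Distribution>"]
  lines << %(    <dcat:accessURL rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">#{url}</dcat:accessURL>)
  lines << "    <dcat:mediaType>#{type}</dcat:mediaType>"
  lines << "    <dcat:byteSize>#{dist.byte_size}</dcat:byteSize>" if dist.byte_size
  lines << "  </dcat:Distribution>" << "</dcat:distribution>"
  lines.join("\n")
end

csv = Distribution.new("https://example.org/datasets/parks.csv", "text/csv", 2048)
puts distribution_xml(csv)
```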

dcat:DataService (UNUSED FOR NOW)

A data service is a collection of operations exposed through an interface (e.g. an API) that gives access to one or more datasets.

| attribute name | example of value |
|---|---|
| identifier | |

WIP

furilo commented 3 years ago

For creator I'd just use site_name

ferblape commented 3 years ago

Looks good, let's complete this list today because most of the values are available from models and you can start implementing it.

furilo commented 3 years ago

@stbnrivas please use https://www.itb.ec.europa.eu/shacl/dcat-ap/upload or another validator to validate the XML.

stbnrivas commented 3 years ago

RDF validator DCAT validator

DCAT-AP validator (not used; we are implementing DCAT)