Letractively / publishing-statistical-data

Automatically exported from code.google.com/p/publishing-statistical-data

Should concepts be rdf:Property rather than skos:Concept? #19

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
This is a very broad issue to explore a completely different design for the DSD level of SDMX-RDF, in particular the way concepts are translated into RDF. Perhaps we are already too far down the road with the current design to still consider this, but I'd like to track this anyway and make it available for comment and discussion.

Abstractly speaking, each of the “components” of a DSD (dimensions, attributes, measures) is a marriage of three things:

1. a concept from a reusable concept scheme that specifies the semantics

2. a code list or literal data type that specifies the legal values

3. information about the role that the concept plays in the specific DSD 
(primary measure, time 
dimension, coded/uncoded, ...)

The current SDMX-RDF design encodes these three things in the following way: 
Each component 
is an instance of rdf:Property; 

1. the concept is attached to it via sdmx:concept;

2. the code list or data type is attached via sdmx:codeList or rdfs:range; 

3. the role is marked via subtypes of rdf:Property, such as 
sdmx:PrimaryMeasureProperty, 
sdmx:TimeDimensionProperty and so on.

    # DSD level
    my:PopulationDataStructure a sdmx:DataStructureDefinition;
        sdmx:component my:country, my:xxx, my:yyy, ...
        .
    my:country a sdmx:DimensionProperty, sdmx:CodedProperty;
        sdmx:codeList my:CountryCodeList;
        sdmx:concept stats:COUNTRY;
        .

    # observation level
    <abcd> a sdmx:Observation;
        my:country my:IE;
        my:xxx my:foo;
        my:yyy my:bar;
        sdmx:obsValue 123;
        .

    # defined by someone else
    stats:COUNTRY a sdmx:Concept, skos:Concept .

A different design could be imagined and would have several advantages. 

1. Because the “concepts” specify the semantics, we could directly make 
them reusable 
rdf:Properties, rather than skos:Concepts.

2. The code list or data type would again be attached to the property via sdmx:codeList or sdmx:range.

3. Since we want the properties to be re-usable across DSDs, we can't type them 
to indicate the 
role in the DSD, but would perhaps use an n-ary relation between the property 
and the DSD to 
indicate the details of its role.

    my:PopulationDataStructure a sdmx:DataStructureDefinition;
        sdmx:dimension [
            sdmx:property stats:COUNTRY;
            #sdmx:codeList my:CountryCodeList; # to override code list defined on concept level?
        ];
        # other components xxx and yyy
        .

    # observation level
    <abcd> a sdmx:Observation;
        stats:COUNTRY stats:IE;
        xxx:xxx xxx:foo;
        yyy:yyy yyy:bar;
        sdmx:obsValue 123;
        .

    # defined by someone else
    stats:COUNTRY a rdf:Property;
        sdmx:codeList stats:CountryCodeList;
        .

So the proposal is to NOT define new properties as part of the DSD, but to see 
the DSD just as a 
structure that bundles together a set of existing properties defined elsewhere, 
that directly 
represent the concepts and carry the semantics.

Advantages: Promotes re-use of concepts; makes the question of whether Dimension/Attribute/MeasureProperties can be re-used across datasets moot.

Disadvantages: Makes the connection between observations and DSD less direct (you have to go through the Dataset and cannot go through the properties any more); is a bit un-SDMX-like because in SDMX, code lists and concepts are married only on the DSD level, while this design here would promote the association of concepts and code lists at the time when the concepts are defined (which is a good thing in general IMO, but not the SDMX way of doing things); requires very different translations of similar SDMX features (code list, concept scheme) into RDF (SKOS concept scheme, set of RDF properties).

Original issue reported on code.google.com by richard....@gmail.com on 30 Mar 2010 at 2:01

GoogleCodeExporter commented 9 years ago
More reasons for modelling concepts directly as rdf:Properties, rather than 
skos:Concepts:

1. It lets us use any existing property (say, rdfs:comment or dc:modified) as 
an attribute simply by mentioning 
it in the DSD. This basically gives us a huge pool of “concepts” to be used 
as attributes for free.

2. This turns the DSD into merely an informative structure. The DSD still 
provides us with some potentially 
helpful information for interpreting the dataset, and will be useful for 
validation; but to understand the 
semantics of the data, it is sufficient to study the observations and time 
series, and their attached properties 
(dimensions etc) which are hopefully widely shared and thus well-known.
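
The validation use mentioned in point 2 could look roughly like the following. This is only a sketch under assumptions: the DSD is reduced to a plain list of the properties it bundles, an observation is a property-to-value mapping, and `DSD_DIMENSIONS` and `missing_dimensions` are hypothetical names, not part of SDMX-RDF.

```python
# Hypothetical sketch: a DSD reduced to the list of properties it bundles.
# None of these Python names come from SDMX-RDF itself.
DSD_DIMENSIONS = ["stats:COUNTRY", "xxx:xxx", "yyy:yyy"]


def missing_dimensions(dsd_dimensions, observation):
    """Return the dimension properties the observation does not carry."""
    return sorted(set(dsd_dimensions) - set(observation))


# An observation as a mapping from property to value.
complete = {"stats:COUNTRY": "stats:IE", "xxx:xxx": "xxx:foo",
            "yyy:yyy": "yyy:bar", "sdmx:obsValue": 123}
incomplete = {"stats:COUNTRY": "stats:IE", "sdmx:obsValue": 123}

print(missing_dimensions(DSD_DIMENSIONS, complete))    # []
print(missing_dimensions(DSD_DIMENSIONS, incomplete))  # ['xxx:xxx', 'yyy:yyy']
```

The check needs nothing but the DSD's list of properties, which is the sense in which the DSD stays useful for validation even if it is no longer needed to understand the semantics.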

Original comment by richard....@gmail.com on 30 Mar 2010 at 7:47

GoogleCodeExporter commented 9 years ago
The pure property approach has the problem that it doesn't reflect SDMX's 
separation
of concept from the role that concept plays within a given DSD.

Thus with the current SDMX-RDF design we can use the concept of Currency as either a Dimension (thing that is measured) or as an Attribute (unit of measurement) and still know how the two are related. With the property-based approach we would need two properties and would then need to devise some way of indicating that they are related to each other. By the time we have done that it feels like we would be back to something isomorphic to the current design.

By defining the ComponentProperty translations of each COG concept once (as 
we've
done) we've opened the door to reuse across DSDs when appropriate without 
forcing reuse.

The ability to use any existing rdf:Property as a ComponentProperty isn't 
actually
ruled out by the current design. However, it seems more useful to have things 
like
rdfs:comment and dct:modified available as annotations rather than Dimensions or
Attributes. With the current design we could easily create an sdmx:Attribute
corresponding to some existing rdf:Property if needed. [Technically it wouldn't 
even
need an explicit corresponding concept since we don't yet have any cardinality
restrictions in the vocabulary.] Whereas if any rdf:Property were reusable 
directly
then we couldn't tell from an observation what role it is playing and would be 
unable
to differentiate between annotations (documentation) and Attributes (which carry
semantics) except via the indirect references in the DSD. 

Original comment by Dave.e.R...@gmail.com on 7 Apr 2010 at 1:31

GoogleCodeExporter commented 9 years ago
Dave, you say that we couldn't re-use Currency as an Attribute and Dimension 
with the property-based 
approach to concepts. But that's not correct. I could define a single property 
ex:currency, and then say that 
this property acts as a dimension in one DSD, and as an attribute in another:

    my:FinancialDataStructure a sdmx:DataStructureDefinition;
        sdmx:attribute [
            sdmx:property ex:currency;
        ];
        .

    </obs/1234/abcd> a sdmx:Observation;
        ex:currency ex:USD;
        ...

This use doesn't preclude the use as a dimension in another DSD. Hence, re-use 
of concepts is not harmed.

It is true that differentiating between dimensions and attributes can get more 
cumbersome, if one starts with 
an observation (instead of observation > property > type of component, one has 
to go: observation > dataset 
> dsd > type of component). But if one already holds a reference to the 
dataset, then it's a pretty direct 
connection -- dataset > dsd > type of component.
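
To make the two lookup paths concrete, here is a rough Python sketch that models triples as plain tuples. Several names are assumptions made for illustration only: `sdmx:structure` (the dataset-to-DSD link), `sdmx:AttributeProperty`, `my:ds1`, `my:currency` and the helper functions are not taken from the vocabulary as discussed in this issue.

```python
# Triples as (subject, predicate, object) tuples; all names are illustrative.
TRIPLES = {
    # Current design: the role is a type on the component property itself.
    ("my:currency", "rdf:type", "sdmx:AttributeProperty"),
    # Property-based design: the role is recorded only inside the DSD.
    ("my:ds1", "sdmx:structure", "my:FinancialDataStructure"),
    ("my:FinancialDataStructure", "sdmx:attribute", "_:c1"),
    ("_:c1", "sdmx:property", "ex:currency"),
    # One observation, linked to its dataset.
    ("obs:1", "sdmx:dataset", "my:ds1"),
}


def objects(triples, subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}


def role_via_property(triples, prop):
    """Current design: observation > property > type of component."""
    return objects(triples, prop, "rdf:type")


def role_via_dsd(triples, obs, prop):
    """Property-based design: observation > dataset > dsd > type of component."""
    roles = set()
    for dataset in objects(triples, obs, "sdmx:dataset"):
        for dsd in objects(triples, dataset, "sdmx:structure"):
            for (s, role, node) in triples:
                # A role is any DSD predicate whose node names the property.
                if s == dsd and (node, "sdmx:property", prop) in triples:
                    roles.add(role)
    return roles


print(role_via_property(TRIPLES, "my:currency"))      # {'sdmx:AttributeProperty'}
print(role_via_dsd(TRIPLES, "obs:1", "ex:currency"))  # {'sdmx:attribute'}
```

The second function needs the observation as well as the property, because under the property-based design the role is only defined relative to a dataset.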

Original comment by richard....@gmail.com on 7 Apr 2010 at 8:08

GoogleCodeExporter commented 9 years ago
Richard, sure I agree you *could* do it that way. However, to me if you have
different semantics you should have different properties rather than the one
overloaded property disambiguated by a context annotation a few steps removed.
Overloading a single term with two different semantics makes the modelling 
harder to
understand and makes practical data merging harder. 

Imagine a user looking at two groups of Observations: in one, ex:currency is meant to be a dimension of a cube; in the other, it's the units the observation is measured in; yet they are identical RDF except for the sdmx:dataset link. I don't think we could argue that the RDF mapping had met any goals about supporting linking and data merging in such a case!
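
A small Python sketch of this scenario, with triples as tuples. The observation and dataset identifiers (`obs:a`, `obs:b`, `my:datasetA`, `my:datasetB`) and the `describe` helper are made up for illustration.

```python
def describe(triples, subject, ignore=("sdmx:dataset",)):
    """Predicate/object pairs of a subject, minus the ignored predicates."""
    return {(p, o) for (s, p, o) in triples if s == subject and p not in ignore}


OBS = {
    # In dataset A, ex:currency is meant as a dimension of the cube ...
    ("obs:a", "rdf:type", "sdmx:Observation"),
    ("obs:a", "ex:currency", "ex:USD"),
    ("obs:a", "sdmx:obsValue", 123),
    ("obs:a", "sdmx:dataset", "my:datasetA"),
    # ... in dataset B, it is meant as the unit the value is measured in.
    ("obs:b", "rdf:type", "sdmx:Observation"),
    ("obs:b", "ex:currency", "ex:USD"),
    ("obs:b", "sdmx:obsValue", 123),
    ("obs:b", "sdmx:dataset", "my:datasetB"),
}

# Apart from the dataset link, the two observations look the same.
print(describe(OBS, "obs:a") == describe(OBS, "obs:b"))  # True
# Only when the dataset link is kept can they be told apart.
print(describe(OBS, "obs:a", ignore=()) == describe(OBS, "obs:b", ignore=()))  # False
```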

Original comment by Dave.e.R...@gmail.com on 8 Apr 2010 at 8:15

GoogleCodeExporter commented 9 years ago
Thinking more about this, a stronger argument against property-based concepts 
might be that a single concept 
might play multiple roles in the *same* DSD. For example, a DSD for global 
exchange rates would maybe have 
*two* dimensions for Currency (buy/sell). To tell them apart on the observation 
level, one needs to define two 
DimensionProperties.
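
A quick Python sketch of why one shared property is not enough here. The property names `my:buyCurrency` and `my:sellCurrency` and both helper functions are hypothetical; the point is only that a set of triples carries no order.

```python
def observe_shared(obs, buy, sell):
    """Both currency dimensions expressed with the one shared property."""
    return {(obs, "ex:currency", buy), (obs, "ex:currency", sell)}


def observe_separate(obs, buy, sell):
    """Each currency dimension expressed with its own DimensionProperty."""
    return {(obs, "my:buyCurrency", buy), (obs, "my:sellCurrency", sell)}


# With the shared property, USD-to-EUR and EUR-to-USD give the same triples.
print(observe_shared("obs:1", "ex:USD", "ex:EUR")
      == observe_shared("obs:1", "ex:EUR", "ex:USD"))    # True

# With two properties, the buy/sell direction survives.
print(observe_separate("obs:1", "ex:USD", "ex:EUR")
      == observe_separate("obs:1", "ex:EUR", "ex:USD"))  # False
```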

Original comment by richard....@gmail.com on 8 Apr 2010 at 10:54

GoogleCodeExporter commented 9 years ago
We agreed on the call on 6th May that we have reached consensus on this issue.

Original comment by i.j.dick...@gmail.com on 7 May 2010 at 9:07