DOREMUS-ANR / marc2rdf

Converter from UNIMARC/INTERMARC to RDF using the DOREMUS model
Apache License 2.0

[PP] Keep the TUM id #62

Open pasqLisena opened 7 years ago

pasqLisena commented 7 years ago

So far, the data contains only the ID of the "Notice d'ouvre".

We should also find a way to keep the ID of the TUM (without confusing the two).

rtroncy commented 7 years ago

You use the dc:identifier property so far, right? Do you need to refine (specialize) this property?

pasqLisena commented 7 years ago

Another way is to make better use of the prov:Entity nodes that are connected to the expression via prov:wasDerivedFrom (example in http://data.doremus.org/expression/71d0c14f-fa29-369b-9e19-36ad902dcbfa ).

rtroncy commented 7 years ago

I think you're talking about orthogonal issues.

First, you may want to distinguish the various IDs by replacing the single property (dc:identifier) with sub-properties that specialize it and convey more semantics, such as "this is the ID in the TUM sense", "this is the ID in our internal system", etc. Do we want this?

Second, you may want to describe the relationships between those multiple IDs more precisely, and you may use PROV for that.

Can you provide a full example of what you have in mind and what problem it solves?

pasqLisena commented 7 years ago

Solution 1

Prefixing the identifiers.

<http://data.doremus.org/expression/71d0c14f-fa29-369b-9e19-36ad902dcbfa>
      dct:identifier  "N 0804799" , "TUM 0804786" .

Path: ?exp dct:identifier ?id
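
With this option, a consumer has to split each literal to find out which kind of identifier it holds. A minimal Python sketch of that step, assuming the prefix and the value are always separated by a single space and that only the prefixes "N" and "TUM" occur (both taken from the example above); `KNOWN_PREFIXES` and `split_prefixed_id` are hypothetical names, not part of marc2rdf:

```python
# Hypothetical mapping from the literal prefix to a readable kind.
KNOWN_PREFIXES = {"N": "work notice", "TUM": "tum"}


def split_prefixed_id(literal):
    """Split a prefixed identifier such as "TUM 0804786" into (kind, value)."""
    prefix, separator, value = literal.partition(" ")
    if not separator or prefix not in KNOWN_PREFIXES or not value:
        raise ValueError("unrecognised identifier literal: %r" % literal)
    return KNOWN_PREFIXES[prefix], value


if __name__ == "__main__":
    for literal in ("N 0804799", "TUM 0804786"):
        print(split_prefixed_id(literal))
```

Any literal that does not follow the assumed "PREFIX value" shape raises an error, so the convention would have to be documented and kept stable.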

Solution 2

The identifiers are specified in the source files (PROV). (I use blank nodes just for visualising here.)

<http://data.doremus.org/expression/71d0c14f-fa29-369b-9e19-36ad902dcbfa>
      prov:wasDerivedFrom  [ 
              a prov:Entity; 
              dct:identifier "0804799" ;
              dct:type "Notice d'ouvre"@fr ;
              dcat:mediaType "text/xml" ;
              dct:conformsTo <http://data.doremus.org/standard/unimarc>;
              prov:wasAttributedTo <http://data.doremus.org/organization/Philharmonie_de_Paris>
     ] ,  [ 
              a prov:Entity; 
              dct:identifier "0804786" ;
              dct:type "TUM"@fr ;
              dcat:mediaType "text/xml" ;
              dct:conformsTo <http://data.doremus.org/standard/unimarc>;
              prov:wasAttributedTo <http://data.doremus.org/organization/Philharmonie_de_Paris>
     ] .

Path: ?exp prov:wasDerivedFrom / dct:identifier ?id
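
To pick one specific identifier under this model, a consumer follows prov:wasDerivedFrom and then looks at dct:type on each source entity. The sketch below mimics that lookup over plain Python tuples rather than a real RDF store; the node labels `_:b1` and `_:b2` and the helper names are made up for illustration, and only the triples relevant to the lookup are reproduced:

```python
EXP = "http://data.doremus.org/expression/71d0c14f-fa29-369b-9e19-36ad902dcbfa"

# (subject, predicate, object) tuples standing in for the Turtle above.
TRIPLES = [
    (EXP, "prov:wasDerivedFrom", "_:b1"),
    ("_:b1", "dct:identifier", "0804799"),
    ("_:b1", "dct:type", "Notice d'ouvre"),
    (EXP, "prov:wasDerivedFrom", "_:b2"),
    ("_:b2", "dct:identifier", "0804786"),
    ("_:b2", "dct:type", "TUM"),
]


def objects(triples, subject, predicate):
    """Return all objects of the triples matching subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def ids_by_source_type(triples, expression):
    """Group the identifiers of the source entities by their dct:type."""
    grouped = {}
    for source in objects(triples, expression, "prov:wasDerivedFrom"):
        identifiers = objects(triples, source, "dct:identifier")
        for source_type in objects(triples, source, "dct:type"):
            grouped.setdefault(source_type, []).extend(identifiers)
    return grouped


if __name__ == "__main__":
    print(ids_by_source_type(TRIPLES, EXP))
```

The identifiers stay unprefixed, but telling them apart costs one extra hop plus a comparison on the dct:type value.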

Solution 3

Use the F40 Identifier Assignment and F13 Identifier. I do not like this solution, and I am not sure it is structurally correct.

rtroncy commented 7 years ago

Solution 4

Specializing the dct:identifier property with sub-properties

mus:Uxx_tum_identifier a owl:DatatypeProperty ;
    rdfs:subPropertyOf dct:identifier ;
    rdfs:isDefinedBy <http://data.doremus.org/ontology#> ;
    rdfs:label "Uxx TUM identifier"@en .

mus:Uyy_wn_identifier a owl:DatatypeProperty ;
    rdfs:subPropertyOf dct:identifier ;
    rdfs:isDefinedBy <http://data.doremus.org/ontology#> ;
    rdfs:label "Uyy work notice identifier"@en .

<http://data.doremus.org/expression/71d0c14f-fa29-369b-9e19-36ad902dcbfa>
      mus:Uxx_tum_identifier "0804786" ;
      mus:Uyy_wn_identifier  "0804799" .
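
One consequence worth noting: a generic lookup on dct:identifier only returns these values if rdfs:subPropertyOf is taken into account, either by a reasoner or by the query itself. The sketch below illustrates this over plain Python tuples instead of a real triple store; the helper names are hypothetical:

```python
EXP = "http://data.doremus.org/expression/71d0c14f-fa29-369b-9e19-36ad902dcbfa"

# (subject, predicate, object) tuples standing in for the Turtle above.
TRIPLES = [
    ("mus:Uxx_tum_identifier", "rdfs:subPropertyOf", "dct:identifier"),
    ("mus:Uyy_wn_identifier", "rdfs:subPropertyOf", "dct:identifier"),
    (EXP, "mus:Uxx_tum_identifier", "0804786"),
    (EXP, "mus:Uyy_wn_identifier", "0804799"),
]


def subproperties_of(triples, prop):
    """Return prop plus every property declared (transitively) below it."""
    found = {prop}
    frontier = [prop]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == "rdfs:subPropertyOf" and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def values(triples, subject, prop, infer=True):
    """Look up values of prop, optionally including its sub-properties."""
    props = subproperties_of(triples, prop) if infer else {prop}
    return sorted(o for s, p, o in triples if s == subject and p in props)


if __name__ == "__main__":
    print(values(TRIPLES, EXP, "dct:identifier"))               # with inference
    print(values(TRIPLES, EXP, "dct:identifier", infer=False))  # without
    print(values(TRIPLES, EXP, "mus:Uxx_tum_identifier"))
```

Asking for a specific sub-property needs no inference at all, which is what makes this option easy to use.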

P.S:

pasqLisena commented 7 years ago

> Solution 1 has the drawback that one needs to parse the string to get the semantics of the value.

Sure, but normally these values are just used internally by us.

Solution 4

I like how easy it is to use. But do we really want to add it to the ontology? Is it really that "music-related"?

Anyway, we can say that solutions 2 and 4 are the best ones.

rtroncy commented 7 years ago

> But do we really want to add it to the ontology? Is it really that "music-related"?

Well, yes, TUM is definitely a music-specific type of identifier. Note that schema.org is going exactly this way, proposing to define several sub-properties of schema:identifier for various domains (isbn for books, gtin for products, etc.).