FIAF / modelling-workshops

Modelling Workshops

Map Examples #8

Open paulduchesne opened 1 year ago

paulduchesne commented 1 year ago

I know we were trying to avoid parallel conversations, but I thought I would let you know that I have started applying this model to a data sample kindly provided by Heidi.

The source file can be found here, which is an XML record for a single work taken from the Bundesarchiv OAI-PMH endpoint. I have been using this as an opportunity to work with RML, which allows for the specification of an explicit data transformation which is processed by RMLMapper to produce a turtle file ready to be loaded into a triplestore.

As I work through applying the model I will highlight issues here and look forward to any thoughts on how to resolve them.

paulduchesne commented 1 year ago

@ladislav-nfa @torbjornbp @natashafairbairn @stephenmcconnachie @heftberger @annahoegner

stephenmcconnachie commented 1 year ago

Paul, if you need another record for a working example of the model, Natasha found a BFI National Archive example that is complex and a mix of messy and clean data – let me know if you need it and I can send JSON or XML.

paulduchesne commented 1 year ago

Please do - would be good to have a plurality of examples to look at. Maybe XML serialization? Thanks!

stephenmcconnachie commented 1 year ago

I’ll email you with that XML

paulduchesne commented 1 year ago

I've tried various iterations and approaches to the transformation from XML to TTL, but the strategy I will focus on now is using RML to define a formal transformation that matches the structure of the ontology, self-declaring all entities.

A secondary process, likely via RDFLib, would do find and replace: e.g. the self-declared https://bundesarchiv.org/activity/Darsteller is replaced by https://fiafcore.org/ontology/cast, as well as replacing the non-vocabulary elements with FIAF identifiers (WorkVariant, Agent, etc). If no relevant term is available in the source data (for example manifestation type), the superclass is declared.
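A minimal sketch of that find-and-replace pass, written with plain Python tuples rather than RDFLib so it runs without dependencies. Only the Darsteller-to-cast pair comes from the note above; the `TERM_MAP` table, the `replace_terms` helper and the sample triples are hypothetical stand-ins for whatever the real pipeline would load.

```python
# Self-declared IRI -> FIAF ontology IRI.
# Only this first pair is taken from the thread; a real table would be curated by hand.
TERM_MAP = {
    "https://bundesarchiv.org/activity/Darsteller": "https://fiafcore.org/ontology/cast",
}


def replace_terms(triples, term_map):
    """Return a new list of triples with every mapped term swapped.

    Terms without an entry in term_map are passed through unchanged.
    """
    return [tuple(term_map.get(term, term) for term in triple) for triple in triples]


# Hypothetical sample: one activity typed with a self-declared role IRI.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
triples = [
    ("_:b0", RDF_TYPE, "https://bundesarchiv.org/activity/Darsteller"),
    ("_:b0", "https://fiafcore.org/ontology/hasAgent",
     "https://www.bundesarchiv.de/agent/8509fafe-d14f-4452-b88c-0805d7b780db"),
]

rewritten = replace_terms(triples, TERM_MAP)
for triple in rewritten:
    print(triple)
```

Keeping the table separate from the code means the mapping decisions can be reviewed by cataloguers without reading Python. The same lookup could be applied to an RDFLib graph by iterating over its triples.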

paulduchesne commented 1 year ago

Notes around the transformation from Bundesarchiv XML to ontology-mapped TTL. Work identifier is synthesised from "Filmwerk" uuid:

<Filmwerk uuid="cd6685c5-d104-4cef-8173-2aeafdfcc78c"> 

translates to

<https://www.bundesarchiv.de/work/cd6685c5-d104-4cef-8173-2aeafdfcc78c> a fiaf:WorkVariant;

Note that a work variant type is defined (as "Filmart: Spielfilm"), but it does not obviously conform with the FIAF manual terms (monographic, serial), so it is not specified in the transformation.
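For anyone wanting to check the identifier synthesis outside RMLMapper, here is a small Python sketch of the same step. It assumes, as the example element above suggests, that "Filmwerk" is the un-namespaced root element; the `work_iri` helper and the `WORK_BASE` constant are names made up for illustration.

```python
import xml.etree.ElementTree as ET

# Base IRI as used in the translated example above.
WORK_BASE = "https://www.bundesarchiv.de/work/"


def work_iri(xml_text):
    """Build the work IRI from the uuid attribute of the root Filmwerk element."""
    root = ET.fromstring(xml_text)
    if root.tag != "Filmwerk":
        raise ValueError(f"expected a Filmwerk root element, got {root.tag!r}")
    return WORK_BASE + root.attrib["uuid"]


# Record trimmed to the one attribute this step needs.
SAMPLE = '<Filmwerk uuid="cd6685c5-d104-4cef-8173-2aeafdfcc78c"></Filmwerk>'

print(f"<{work_iri(SAMPLE)}> a fiaf:WorkVariant .")
```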

paulduchesne commented 1 year ago

Working with blank nodes XML -> RML is quite difficult, but this model using parentTriplesMap seems to work okay:

@prefix : <http://example.org/rules/> .
@prefix fiaf: <https://fiafcore.org/ontology/> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .

fiaf:work_transform a rr:TriplesMap;
  rml:logicalSource [
    rml:source "cd6685c5.xml";
    rml:referenceFormulation ql:XPath;
    rml:iterator "/Filmwerk"
  ].

fiaf:work_transform rr:subjectMap [
    rr:template "https://www.bundesarchiv.de/work/{@uuid}";
    rr:class fiaf:WorkVariant
  ].

fiaf:work_transform rr:predicateObjectMap [
  rr:predicate fiaf:workProperty;
  rr:objectMap [
    rr:constant "work work work";
  ]
].

fiaf:work_transform rr:predicateObjectMap [
  rr:predicate fiaf:hasActivity;
  rr:objectMap [
    rr:parentTriplesMap fiaf:act ;
  ]
].

fiaf:act a rr:TriplesMap;
  rml:logicalSource [
    rml:source "cd6685c5.xml";
    rml:referenceFormulation ql:XPath;
    rml:iterator "/Filmwerk/Credit/Person"
  ].

fiaf:act rr:subjectMap [
    rr:termType rr:BlankNode;
    rr:class fiaf:Activity
  ].

fiaf:act rr:predicateObjectMap [
  rr:predicate fiaf:ActivityProperty;
  rr:objectMap [
    rr:constant "activity activity activity"
  ]
].

Produces resulting turtle:

@prefix : <http://example.org/rules/> .
@prefix fiaf: <https://fiafcore.org/ontology/> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .

_:0 a fiaf:Activity;
  fiaf:ActivityProperty "activity activity activity" .

_:1 a fiaf:Activity;
  fiaf:ActivityProperty "activity activity activity" .

_:10 a fiaf:Activity;
  fiaf:ActivityProperty "activity activity activity" .

_:11 a fiaf:Activity;
  fiaf:ActivityProperty "activity activity activity" .

_:2 a fiaf:Activity;
  fiaf:ActivityProperty "activity activity activity" .

_:3 a fiaf:Activity;
  fiaf:ActivityProperty "activity activity activity" .

_:4 a fiaf:Activity;
  fiaf:ActivityProperty "activity activity activity" .

_:5 a fiaf:Activity;
  fiaf:ActivityProperty "activity activity activity" .

_:6 a fiaf:Activity;
  fiaf:ActivityProperty "activity activity activity" .

_:7 a fiaf:Activity;
  fiaf:ActivityProperty "activity activity activity" .

_:8 a fiaf:Activity;
  fiaf:ActivityProperty "activity activity activity" .

_:9 a fiaf:Activity;
  fiaf:ActivityProperty "activity activity activity" .

<https://www.bundesarchiv.de/work/cd6685c5-d104-4cef-8173-2aeafdfcc78c> a fiaf:WorkVariant;
  fiaf:hasActivity _:0, _:1, _:10, _:11, _:2, _:3, _:4, _:5, _:6, _:7, _:8, _:9;
  fiaf:workProperty "work work work" .

paulduchesne commented 1 year ago

More involved example, compact formatting:

@prefix : <https://fiafcore.org/mapping/> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

:work1
  a rr:TriplesMap ;
  rml:logicalSource [
    rml:source "cd6685c5.xml" ;
    rml:referenceFormulation <http://semweb.mmlab.be/ns/ql#XPath> ;
    rml:iterator "/Filmwerk"
  ] ;
  rr:subjectMap [
    rr:template "https://www.bundesarchiv.de/work/{@uuid}" ;
    rr:class <https://fiafcore.org/ontology/WorkVariant>
  ] ;
  rr:predicateObjectMap [
    rr:predicate <https://fiafcore.org/ontology/workProperty> ;
    rr:objectMap [ rr:constant "work work work" ]
  ], [
    rr:predicate rdfs:label ;
    rr:objectMap [ rml:reference "IDTitel" ]
  ], [
    rr:predicate <https://fiafcore.org/ontology/hasActivity> ;
    rr:objectMap [ rr:parentTriplesMap :activity1 ]
  ] .

:activity1
  a rr:TriplesMap ;
  rml:logicalSource [
    rml:source "cd6685c5.xml" ;
    rml:referenceFormulation <http://semweb.mmlab.be/ns/ql#XPath> ;
    rml:iterator "/Filmwerk/Credit/Person"
  ] ;
  rr:subjectMap [ rr:termType rr:BlankNode ] ;
  rr:predicateObjectMap [
    rr:predicate <https://fiafcore.org/ontology/ActivityProperty> ;
    rr:objectMap [ rr:constant "activity activity activity" ]
  ], [
    rr:predicate rdf:type ;
    rr:objectMap [ rr:template "https://www.bundesarchiv.de/role/{Funktion/@Funktion}" ]
  ], [
    rr:predicate <https://fiafcore.org/ontology/hasAgent> ;
    rr:objectMap [ rr:template "https://www.bundesarchiv.de/agent/{@uuid}" ]
  ] .

:agent1
  a rr:TriplesMap ;
  rml:logicalSource [
    rml:source "cd6685c5.xml" ;
    rml:referenceFormulation <http://semweb.mmlab.be/ns/ql#XPath> ;
    rml:iterator "/Filmwerk/Credit/Person"
  ] ;
  rr:subjectMap [
    rr:template "https://www.bundesarchiv.de/agent/{@uuid}" ;
    rr:class <https://fiafcore.org/ontology/Agent>
  ] ;
  rr:predicateObjectMap [
    rr:predicate <https://fiafcore.org/ontology/FirstName> ;
    rr:objectMap [ rml:reference "@Vorname" ]
  ], [
    rr:predicate <https://fiafcore.org/ontology/LastName> ;
    rr:objectMap [ rml:reference "@Nachname" ]
  ] .

produces

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .

_:0 a <https://www.bundesarchiv.de/role/Schnitt%2FMontage>;
  <https://fiafcore.org/ontology/ActivityProperty> "activity activity activity";
  <https://fiafcore.org/ontology/hasAgent> <https://www.bundesarchiv.de/agent/b3898d04-f988-4253-89ce-60f0b2834347> .

_:1 a <https://www.bundesarchiv.de/role/Darsteller>;
  <https://fiafcore.org/ontology/ActivityProperty> "activity activity activity";
  <https://fiafcore.org/ontology/hasAgent> <https://www.bundesarchiv.de/agent/8509fafe-d14f-4452-b88c-0805d7b780db> .

_:10 a <https://www.bundesarchiv.de/role/Darsteller>;
  <https://fiafcore.org/ontology/ActivityProperty> "activity activity activity";
  <https://fiafcore.org/ontology/hasAgent> <https://www.bundesarchiv.de/agent/40e0f039-98c2-4973-851b-5ee4832f91a8> .

_:11 a <https://www.bundesarchiv.de/role/Darsteller>;
  <https://fiafcore.org/ontology/ActivityProperty> "activity activity activity";
  <https://fiafcore.org/ontology/hasAgent> <https://www.bundesarchiv.de/agent/5249b851-42f2-4f03-b49d-e74a9626496c> .

_:2 a <https://www.bundesarchiv.de/role/Drehbuch>, <https://www.bundesarchiv.de/role/Regie%20%2F%20Spielleitung%20%2F%20Realisation>;
  <https://fiafcore.org/ontology/ActivityProperty> "activity activity activity";
  <https://fiafcore.org/ontology/hasAgent> <https://www.bundesarchiv.de/agent/ad399ad0-dcae-4a22-861e-e0bbbe233840> .

_:3 a <https://www.bundesarchiv.de/role/Produzent>;
  <https://fiafcore.org/ontology/ActivityProperty> "activity activity activity";
  <https://fiafcore.org/ontology/hasAgent> <https://www.bundesarchiv.de/agent/e2c8a82f-c71f-49ba-a7c0-39a85ab9599d> .

_:4 a <https://www.bundesarchiv.de/role/Kamera%2FBild%2FBildgestaltung%2FFotografie>;
  <https://fiafcore.org/ontology/ActivityProperty> "activity activity activity";
  <https://fiafcore.org/ontology/hasAgent> <https://www.bundesarchiv.de/agent/a751b465-9988-461b-a851-5bea16430669> .

_:5 a <https://www.bundesarchiv.de/role/Drehbuch>;
  <https://fiafcore.org/ontology/ActivityProperty> "activity activity activity";
  <https://fiafcore.org/ontology/hasAgent> <https://www.bundesarchiv.de/agent/a2496080-1cd7-4a2b-b414-25ea6e5e839b> .

_:6 a <https://www.bundesarchiv.de/role/Kamera%2FBild%2FBildgestaltung%2FFotografie>;
  <https://fiafcore.org/ontology/ActivityProperty> "activity activity activity";
  <https://fiafcore.org/ontology/hasAgent> <https://www.bundesarchiv.de/agent/d728f72b-ef54-4d4a-a68f-9592df278ba9> .

_:7 a <https://www.bundesarchiv.de/role/Musik%20%28Filmkomponist%29>;
  <https://fiafcore.org/ontology/ActivityProperty> "activity activity activity";
  <https://fiafcore.org/ontology/hasAgent> <https://www.bundesarchiv.de/agent/0260476f-7f35-44ce-9b96-2cbaf2ca2300> .

_:8 a <https://www.bundesarchiv.de/role/Darsteller>;
  <https://fiafcore.org/ontology/ActivityProperty> "activity activity activity";
  <https://fiafcore.org/ontology/hasAgent> <https://www.bundesarchiv.de/agent/40a3bbfe-1122-4289-9f97-6609090d53c4> .

_:9 a <https://www.bundesarchiv.de/role/Darsteller>;
  <https://fiafcore.org/ontology/ActivityProperty> "activity activity activity";
  <https://fiafcore.org/ontology/hasAgent> <https://www.bundesarchiv.de/agent/d9f7f51f-de7a-44e0-97d6-81c07a4a1d1e> .

<https://www.bundesarchiv.de/agent/0260476f-7f35-44ce-9b96-2cbaf2ca2300> a <https://fiafcore.org/ontology/Agent>;
  <https://fiafcore.org/ontology/LastName> "The Can" .

<https://www.bundesarchiv.de/agent/40a3bbfe-1122-4289-9f97-6609090d53c4> a <https://fiafcore.org/ontology/Agent>;
  <https://fiafcore.org/ontology/FirstName> "Didi";
  <https://fiafcore.org/ontology/LastName> "04. Petrikat" .

<https://www.bundesarchiv.de/agent/40e0f039-98c2-4973-851b-5ee4832f91a8> a <https://fiafcore.org/ontology/Agent>;
  <https://fiafcore.org/ontology/FirstName> "Rüdiger";
  <https://fiafcore.org/ontology/LastName> "01. Vogler" .

<https://www.bundesarchiv.de/agent/5249b851-42f2-4f03-b49d-e74a9626496c> a <https://fiafcore.org/ontology/Agent>;
  <https://fiafcore.org/ontology/FirstName> "Yella";
  <https://fiafcore.org/ontology/LastName> "02. Rottländer" .

<https://www.bundesarchiv.de/agent/8509fafe-d14f-4452-b88c-0805d7b780db> a <https://fiafcore.org/ontology/Agent>;
  <https://fiafcore.org/ontology/FirstName> "Edda";
  <https://fiafcore.org/ontology/LastName> "05. Köchl" .

<https://www.bundesarchiv.de/agent/a2496080-1cd7-4a2b-b414-25ea6e5e839b> a <https://fiafcore.org/ontology/Agent>;
  <https://fiafcore.org/ontology/LastName> "Fürstenberg, Veith von" .

<https://www.bundesarchiv.de/agent/a751b465-9988-461b-a851-5bea16430669> a <https://fiafcore.org/ontology/Agent>;
  <https://fiafcore.org/ontology/FirstName> "Martin";
  <https://fiafcore.org/ontology/LastName> "19. Schäfer" .

<https://www.bundesarchiv.de/agent/ad399ad0-dcae-4a22-861e-e0bbbe233840> a <https://fiafcore.org/ontology/Agent>;
  <https://fiafcore.org/ontology/FirstName> "Wim";
  <https://fiafcore.org/ontology/LastName> "Wenders" .

<https://www.bundesarchiv.de/agent/b3898d04-f988-4253-89ce-60f0b2834347> a <https://fiafcore.org/ontology/Agent>;
  <https://fiafcore.org/ontology/LastName> "Weitershausen, Barbara von" .

<https://www.bundesarchiv.de/agent/d728f72b-ef54-4d4a-a68f-9592df278ba9> a <https://fiafcore.org/ontology/Agent>;
  <https://fiafcore.org/ontology/LastName> "Müller, Robby" .

<https://www.bundesarchiv.de/agent/d9f7f51f-de7a-44e0-97d6-81c07a4a1d1e> a <https://fiafcore.org/ontology/Agent>;
  <https://fiafcore.org/ontology/FirstName> "Elisabeth";
  <https://fiafcore.org/ontology/LastName> "03. Kreuzer" .

<https://www.bundesarchiv.de/agent/e2c8a82f-c71f-49ba-a7c0-39a85ab9599d> a <https://fiafcore.org/ontology/Agent>;
  <https://fiafcore.org/ontology/LastName> "PIFDA" .

<https://www.bundesarchiv.de/work/cd6685c5-d104-4cef-8173-2aeafdfcc78c> a <https://fiafcore.org/ontology/WorkVariant>;
  rdfs:label """
        ALICE IN DEN STÄDTEN (1973-1974) (Originaltitel)
    """;
  <https://fiafcore.org/ontology/hasActivity> _:0, _:1, _:10, _:11, _:2, _:3, _:4, _:5,
    _:6, _:7, _:8, _:9;
  <https://fiafcore.org/ontology/workProperty> "work work work" .

paulduchesne commented 1 year ago

Transformation of Bundesarchiv example almost complete, findings and notes:

heftberger commented 1 year ago

Fantastic! I am not sure if you wanted me to comment or answer questions, but I might in some cases :-)

BA have provided production start and end dates, we only have hasEventDate currently in the ontology.

Heidi: Year of reference is a bit of a nightmare frankly. We have historically murky cataloguing practices around how to interpret the year attached to a work. Now it has been named "production year", because it seems that many times it was interpreted as such. We have introduced a "publication year" on manifestation level related to a publication event. Very complex and not used at all to date.

Interestingly BA have used the same term for form and genre.

Heidi: Not sure I understand. We have Gattung (form) and Genre (genre). We also have Filmart (which we want to get rid of soon).

"Funktionsanmerkung" appears to be used mostly for character names. First problem, no appropriate property in the ontology, also how to filter where a placeholder "-" has been used in source data?

Heidi: That is used for credits, but not very often. We can leave it out in my opinion.

BA have provided first and last name separately, but this is not possible in the ontology. Should these two fields be joined together into an agent label? It is not obvious how to do this in RML. Also, leading numbers are used for credit order: could these be extracted? How to split strings in RML?

Heidi: that is an interesting observation. I thought that most archives keep first/last names in separate fields. By the way, we have a data cleaning project at the moment where we clean about 170,000 person records...

GND for Wim Wenders in source data, can this be retained?

Heidi: is there nothing in external identifiers? we will make more use of GND IDs in the future, starting with keywords.

Not clear what the unit of extent is. Assumed as minutes, but would this be true of all records?

Heidi: What is the German word (Gesamtlänge in Metern?)?

Technical information appears to be mostly duplicated at item and carrier level. Possibility of "pulling these up" to item, as the properties do not exist at carrier level. Good idea would be further analysis on whether these are consistently applied at both levels.

Heidi: I think so.

Appropriate translation to format and element is not clear, and a good example where these might not be delineated in the same way as expressed by ontology vocabularies.

Heidi: Again, what are the German words? Then I can help.

paulduchesne commented 1 year ago

No, this is wonderful - in one of the workshops we did have a conversation about how trying to interpret an archive's cataloguing decisions and structures without involving people from the organisation is a bad idea!

Heidi: Year of reference is a bit of a nightmare frankly.

I'm guessing it can be assumed that there are the same accuracy issues with this field for all archives, given how the idea of assigning a single date to a film can have many different interpretations. A "publication event" date at manifestation level - it would be an interesting thought experiment on how best to map this. A clear proposal for the ontology though would be to introduce distinct start and end date properties.

We also have Filmart (which we want to get rid of soon).

Good to know, I will remove the mapping!

Heidi: That is used for credits, but not very often. We can leave it out in my opinion.

Do you see value in adding a "character name", or "credited as onscreen" property to allow for this kind of data generally?

Heidi: that is an interesting observation. I thought that most archives keep first/last names in separate fields.

They probably do, which would be a good argument to mirror that here. I was also planning on applying the agent name as agent label, e.g. some agent -> rdfs:label -> "Hal Hartley". This does mean there would be some doubling of data (name to label, and name to first name and last name properties), but the separation would be useful for agent matching.
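As a rough illustration of the label and credit-order idea, the sketch below does the string handling in Python as a pre- or post-processing step rather than in RML. It assumes the leading "NN. " prefix seen in the last name values (for example "01. Vogler") really is credit order; `split_credit_order` and `agent_label` are hypothetical helper names.

```python
import re

# Matches a leading credit-order prefix such as "01. " in "01. Vogler".
ORDER_PREFIX = re.compile(r"^(\d+)\.\s*(.*)$")


def split_credit_order(last_name):
    """Return (credit_order, cleaned_last_name); credit_order is None if absent."""
    match = ORDER_PREFIX.match(last_name)
    if match:
        return int(match.group(1)), match.group(2)
    return None, last_name


def agent_label(first_name, last_name):
    """Join first and last name into one label, skipping a missing first name."""
    _, cleaned = split_credit_order(last_name)
    return " ".join(part for part in (first_name, cleaned) if part)


# Values taken from the generated Turtle earlier in the thread.
print(split_credit_order("01. Vogler"))
print(agent_label("Rüdiger", "01. Vogler"))
print(agent_label(None, "PIFDA"))
```

Values such as "Fürstenberg, Veith von", where the whole name sits in the last name field, pass through unchanged and would need a separate rule.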

Heidi: is there nothing in external identifiers? we will make more use of GND IDs in the future, starting with keywords.

Sorry, this comment was a bit ambiguous - there is a GND in the source data, I just have to figure out how to map it into turtle as I can absolutely see that as something we want to carry across.

Heidi: What is the German word (Gesamtlänge in Metern?)?

Gesamtlaenge, which in this instance is expressed as "112", so I assumed minutes?

format and element: Heidi: Again, what are the German words? Then I can help.

Materialart, Filmbreite, Videoformat - I do recall seeing a list of vocabularies used by BA for film elements which I should track down again, as that would certainly be useful in matching with the manual properties for these format/element terms. I think I have also noticed before that this is an area where different archives group attributes in noticeably different ways, so it will present a "harmonisation challenge"!

~

In other harmonisation news I have added such a notebook to the examples folder, so we have the beginnings of "proper" example data to add to a triplestore!

heftberger commented 1 year ago

Gesamtlaenge, which in this instance is expressed as "112", so I assumed minutes?

That would be Meters then. If you give me the Filmwork ID or Archival Number for the item, I can tell you for sure.

Materialart, Filmbreite, Videoformat:

Off the top of my head:

Materialart: item element type (dup-pos, original neg etc.), this is what was used in the manual

Filmbreite: gauge

Videoformat: video format

Rose-EFG commented 1 year ago

Hi, after some initial problems getting RMLMapper started, I can now contribute the first bit of my data transformation of EFG data (from the Bulgarian National Film Archive). @paulduchesne should I start a new issue for this, so it doesn't become too confusing here?

I have met some issues along the way. Mostly I am a bit stuck in my XSLT logic and have to do some research on conditional mappings for RML...

The https://example.org/... URIs are of course only placeholders, because we don't have an EFG URI scheme I could reference :)

  1. When creating the blank nodes for Title, Identifier, Event (or any blank node), it always iterates over all the records in my XML, so all the titles, identifiers and events in my XML get assigned to every work. I have to figure out how to create some logic à la for-each //efg:avcreation/efg:title, creating a blank node as object and as subject and linking these only to each specific work.

  2. I need some conditional mapping to generate useful URIs (either the Country subclasses from the ontology or something like GeoNames) based on our ISO codes for the production country. I have now created some dummy URIs, as I am (for now) only able to concatenate values in my XML as a suffix to a URI. But I'm sure there's a way... I just have to look into it. :)

  3. The same goes for the definition of the classes for Title and Activity. Instead of WorkTitle rr:subjectMap [rr:class fiaf:Title;], I could set a more specific subclass (like PreferredTitle) based on the value given in //efg:avcreation/efg:title/efg:type (or, for the Activity, based on //efg:avcreation/efg:relPerson/efg:type).

  4. I had to comment out the last two blocks of mapping, because they lead to a logical error I don't quite understand yet (I'll add the error message later). I think it might be due to the fact that I create fewer blank nodes for my Activity than I refer to in the :Agent and :WorkEvent mapping.

  5. I have to find out how to add the language attributes to the rdfs:label
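To make the intended logic of points 1 and 5 concrete, here is a short Python sketch of the per-record scoping: titles are collected only from the direct children of each efg:avcreation, so they stay attached to their own work, and the lang attribute is carried along as a lower-cased language tag. This is not RML; in RML the usual route is, as far as I know, a join condition on the parentTriplesMap plus a language setting on the object map. The `titles_per_work` helper and the trimmed `SAMPLE` document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "efg": "https://www.europeanfilmgateway.eu/",
}

# Two records cut down to identifiers and titles (hypothetical trimming of the XML below).
SAMPLE = """
<oai:OAI-PMH xmlns:oai="http://www.openarchives.org/OAI/2.0/"
    xmlns:efg="https://www.europeanfilmgateway.eu/">
  <oai:ListRecords>
    <oai:record><oai:metadata><efg:efgEntity><efg:avcreation>
      <efg:avManifestation>
        <efg:identifier>BNFA_avManifestation_MFN2747</efg:identifier>
        <efg:title lang="bg"><efg:text>О, ДОБРУДЖАНСКИ КРАЙ</efg:text></efg:title>
      </efg:avManifestation>
      <efg:identifier>BNFA_avCreation_MFN2747</efg:identifier>
      <efg:title lang="bg"><efg:text>О, ДОБРУДЖАНСКИ КРАЙ</efg:text></efg:title>
      <efg:title lang="EN"><efg:text>THE REGION OF DOBRUDZHA</efg:text></efg:title>
    </efg:avcreation></efg:efgEntity></oai:metadata></oai:record>
    <oai:record><oai:metadata><efg:efgEntity><efg:avcreation>
      <efg:identifier>BNFA_avCreation_MFN2753</efg:identifier>
      <efg:title lang="bg"><efg:text>ИЗБОР НА ЦАРИЦА НА ЛЪЖЕНСКИЯ ПЛАЖ</efg:text></efg:title>
      <efg:title lang="EN"><efg:text>QUEEN OF THE BEACH CONTEST AT THE LUJENE BEACH</efg:text></efg:title>
    </efg:avcreation></efg:efgEntity></oai:metadata></oai:record>
  </oai:ListRecords>
</oai:OAI-PMH>
""".strip()


def titles_per_work(xml_text):
    """Map each avcreation identifier to its own (title text, language tag) pairs."""
    root = ET.fromstring(xml_text)
    result = {}
    for creation in root.iterfind(".//efg:avcreation", NS):
        # A relative path without "//" only matches direct children, so the
        # identifier and titles nested in efg:avManifestation are not picked up.
        work_id = creation.findtext("efg:identifier", namespaces=NS)
        result[work_id] = [
            (title.findtext("efg:text", namespaces=NS), title.get("lang", "").lower())
            for title in creation.findall("efg:title", NS)
        ]
    return result


titles = titles_per_work(SAMPLE)
for work_id, pairs in titles.items():
    for text, lang in pairs:
        # Turtle-style language-tagged literal for an rdfs:label.
        print(f'{work_id}  rdfs:label "{text}"@{lang}')
```

The key point is that every lookup is relative to the current efg:avcreation rather than to the document root, which is what prevents titles from one record being linked to every work.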

Anyway, here's the XML:

<?xml version="1.0" encoding="UTF-8"?>
<oai:OAI-PMH xmlns:oai="http://www.openarchives.org/OAI/2.0/"
    xmlns:efg="https://www.europeanfilmgateway.eu/">
    <oai:responseDate>2023-04-19T18:40:52Z</oai:responseDate>
    <oai:request verb="ListRecords" metadataPrefix="efg" set="bnfa">
https://dnet-prod.efg.d4science.org/efg/mvc/oai/oai.do
    </oai:request>
    <oai:ListRecords>
        <oai:record>
            <oai:header>
                <oai:identifier>oai:edm:bnfa::fb42eaac771f7e597ff27a7aa5d9835f</oai:identifier>
                <oai:datestamp>2020-10-16T13:23:28Z</oai:datestamp>
                <oai:setSpec>BNFA</oai:setSpec>
                <oai:setSpec>bnfa</oai:setSpec>
            </oai:header>
            <oai:metadata>
                <efg:efgEntity>
                    <efg:avcreation>
                        <efg:avManifestation>
                            <efg:identifier scheme="CP_CATEGORY_ID">BNFA_avManifestation_MFN2747</efg:identifier>
                            <efg:recordSource>
                                <efg:sourceID>MFN2747</efg:sourceID>
                                <efg:provider schemeID="Institution acronym" id="BNFA">Bulgarian National Film Archive</efg:provider>
                            </efg:recordSource>
                            <efg:title lang="bg">
                                <efg:geographicScope/>
                                <efg:text>О, ДОБРУДЖАНСКИ КРАЙ</efg:text>
                                <efg:relation>Original title</efg:relation>
                            </efg:title>
                            <efg:thumbnail>https://i.vimeocdn.com/video/974636590.jpg</efg:thumbnail>
                            <efg:format>
                                <efg:gauge>35 mm</efg:gauge>
                                <efg:colour hasColor="false">Black &amp; White</efg:colour>
                                <efg:sound hasSound="true">With sound</efg:sound>
                            </efg:format>
                            <efg:rightsHolder URL="www.bnf.bg">Bulgarian National Film Archive</efg:rightsHolder>
                            <efg:rightsStatus>In Copyright - Educational Use Permitted</efg:rightsStatus>
                            <efg:duration>00:08:36</efg:duration>
                            <efg:item>
                                <efg:identifier scheme="CP_CATEGORY_ID">BNFA_item_z91ZjIrEqsNfqkF95pwCWg</efg:identifier>
                                <efg:provider>Bulgarian National Film Archive</efg:provider>
                                <efg:type>Video</efg:type>
                            </efg:item>
                        </efg:avManifestation>
                        <efg:identifier scheme="CP_CATEGORY_ID">BNFA_avCreation_MFN2747</efg:identifier>
                        <efg:recordSource>
                            <efg:sourceID>MFN2747</efg:sourceID>
                            <efg:provider schemeID="Institution acronym" id="BNFA">Bulgarian National Film Archive</efg:provider>
                        </efg:recordSource>
                        <efg:countryOfReference>BG</efg:countryOfReference>
                        <efg:title lang="bg">
                            <efg:geographicScope/>
                            <efg:text>О, ДОБРУДЖАНСКИ КРАЙ</efg:text>
                            <efg:relation>Original title</efg:relation>
                        </efg:title>
                        <efg:title lang="EN">
                            <efg:geographicScope/>
                            <efg:text>THE REGION OF DOBRUDZHA</efg:text>
                            <efg:relation>Translated title</efg:relation>
                        </efg:title>
                        <efg:description type="Synopsis" lang="bg">
Кавалерийски части преминават през добруджанско село, хора с цветя, поздравяват войниците, даряват ги, ученически строй, арка покрита с цветя,портрети на цар Борис III, Симеон и Мария-Луиза.; моторизирани части. Войници сред хората. Военни кораби в морето, оръдие, матроси с пушки, крайбрежна ивица, ПНР на Балчик. Хора с лодки пресрещат корабите; военен оркестър, моряци с цветя; спусканена котва; контраадмирал Асен Тошев слиза на брега, ръкува се с посрещачите. Влизанена кавалерийски части в града, хора по улиците хвърлят цветя към войниците,поздравяват генерал Георги Попов – губернатор на Добруджа. Кметството на Балчик. Генералът държи реч. Военен оркестър. Бойно знаме.
                        </efg:description>
                        <efg:description type="Synopsis" lang="EN">
Cavalry units pass through Dobrudzha village, people carrying flowers greet the soldiers. Portraits of Tsar Boris III, Simeon, and Maria-Louisa. Soldiers among people. Battleships at sea, cannon, sailors with rifles, coastline, view of Balchik. People in boats intercept the ships; military band, sailors with flowers; lowering of anchor; Rear admiral Asen Toshev steps on the shore, shakes hands with the people gathered on the shore. Cavalry units enter the city, people in the streets throw flowers at the soldiers, they greet Gen. Georgi Popov - Governer of Dobrudzha. Town hall of Balchik. The General speaks. Miilitary band. Battle flag.
                        </efg:description>
                        <efg:keywords type="Subject" lang="bg">
                            <efg:term>България, войници</efg:term>
                        </efg:keywords>
                        <efg:keywords type="Subject" lang="bg">
                            <efg:term>армия</efg:term>
                        </efg:keywords>
                        <efg:keywords type="Subject" lang="bg">
                            <efg:term>Добруджа</efg:term>
                        </efg:keywords>
                        <efg:keywords type="Subject" lang="bg">
                            <efg:term>Балчик</efg:term>
                        </efg:keywords>
                        <efg:keywords type="Place" lang="EN">
                            <efg:term>Bulgaria</efg:term>
                        </efg:keywords>
                        <efg:keywords type="Subject" lang="EN">
                            <efg:term>soldiers</efg:term>
                        </efg:keywords>
                        <efg:keywords type="Subject" lang="EN">
                            <efg:term>army</efg:term>
                        </efg:keywords>
                        <efg:keywords type="Subject" lang="EN">
                            <efg:term>navy</efg:term>
                        </efg:keywords>
                        <efg:keywords type="Subject" lang="EN">
                            <efg:term>Dobrudzha</efg:term>
                        </efg:keywords>
                        <efg:keywords type="Subject" lang="EN">
                            <efg:term>Balchik</efg:term>
                        </efg:keywords>
                        <efg:keywords type="Form" lang="EN">
                            <efg:term>Documentary film</efg:term>
                        </efg:keywords>
                        <efg:productionYear>1940</efg:productionYear>
                        <efg:relPerson>
                            <efg:identifier scheme="CP_CATEGORY_ID">BNFA_person_8d4920f9acc3673afed6c2fa34495a90</efg:identifier>
                            <efg:name>Bakardzhiev, Vasil</efg:name>
                            <efg:type>Director</efg:type>
                        </efg:relPerson>
                        <efg:relCollection>
                            <efg:identifier scheme="CP_CATEGORY_ID">BNFA_collection_331b3c2fd0aee7d5eaa099234f827848</efg:identifier>
                            <efg:title>Pre-Socialist Cinema</efg:title>
                            <efg:type>is part of</efg:type>
                        </efg:relCollection>
                    </efg:avcreation>
                </efg:efgEntity>
            </oai:metadata>
        </oai:record>
        <oai:record>
            <oai:header>
                <oai:identifier>oai:edm:bnfa::39bc1687683a6d21103b407a4ae78314</oai:identifier>
                <oai:datestamp>2020-10-16T13:23:28Z</oai:datestamp>
                <oai:setSpec>BNFA</oai:setSpec>
                <oai:setSpec>bnfa</oai:setSpec>
            </oai:header>
            <oai:metadata>
                <efg:efgEntity>
                    <efg:avcreation>
                        <efg:avManifestation>
                            <efg:identifier scheme="CP_CATEGORY_ID">BNFA_avManifestation_MFN2753</efg:identifier>
                            <efg:recordSource>
                                <efg:sourceID>MFN2753</efg:sourceID>
                                <efg:provider schemeID="Institution acronym" id="BNFA">Bulgarian National Film Archive</efg:provider>
                            </efg:recordSource>
                            <efg:title lang="bg">
                                <efg:geographicScope/>
                                <efg:text>ИЗБОР НА ЦАРИЦА НА ЛЪЖЕНСКИЯ ПЛАЖ</efg:text>
                                <efg:relation>Original title</efg:relation>
                            </efg:title>
                            <efg:thumbnail>https://i.vimeocdn.com/video/974586012.jpg</efg:thumbnail>
                            <efg:format>
                                <efg:gauge>35 mm</efg:gauge>
                                <efg:colour hasColor="false">Black &amp; White</efg:colour>
                                <efg:sound hasSound="true">With sound</efg:sound>
                            </efg:format>
                            <efg:dimension unit="m">177</efg:dimension>
                            <efg:rightsHolder URL="www.bnf.bg">Bulgarian National Film Archive</efg:rightsHolder>
                            <efg:rightsStatus>In Copyright - Educational Use Permitted</efg:rightsStatus>
                            <efg:duration>00:06:10</efg:duration>
                            <efg:item>
                                <efg:identifier scheme="CP_CATEGORY_ID">BNFA_item_s37ITPuSPwnwsGVkOnjowg</efg:identifier>
                                <efg:provider>Bulgarian National Film Archive</efg:provider>
                                <efg:type>Video</efg:type>
                            </efg:item>
                        </efg:avManifestation>
                        <efg:identifier scheme="CP_CATEGORY_ID">BNFA_avCreation_MFN2753</efg:identifier>
                        <efg:recordSource>
                            <efg:sourceID>MFN2753</efg:sourceID>
                            <efg:provider schemeID="Institution acronym" id="BNFA">Bulgarian National Film Archive</efg:provider>
                        </efg:recordSource>
                        <efg:countryOfReference>BG</efg:countryOfReference>
                        <efg:title lang="bg">
                            <efg:geographicScope/>
                            <efg:text>ИЗБОР НА ЦАРИЦА НА ЛЪЖЕНСКИЯ ПЛАЖ</efg:text>
                            <efg:relation>Original title</efg:relation>
                        </efg:title>
                        <efg:title lang="EN">
                            <efg:geographicScope/>
                            <efg:text>QUEEN OF THE BEACH CONTEST AT THE LUJENE BEACH</efg:text>
                            <efg:relation>Translated title</efg:relation>
                        </efg:title>
                        <efg:description type="Synopsis" lang="bg">
Снимки от Лъдженския плаж (днес Велинград). Музикално оформление с немски шлагери- Преминаване на кандидатките. Скокове от кула. За царица е избрана Мария Русева от Пловдив. За най-красиво дете е избрана Светла Пешева от София.
                        </efg:description>
                        <efg:description type="Synopsis" lang="EN">
The beach at Ladzhene (Velingrad). Reviewing the contestors. High diving. The winner in the ,,Queen of the Beach'' contest is Maria Rousseva from Plovdiv.
                        </efg:description>
                        <efg:keywords type="Subject" lang="bg">
                            <efg:term>България</efg:term>
                        </efg:keywords>
                        <efg:keywords type="Subject" lang="bg">
                            <efg:term>плаж, басейн</efg:term>
                        </efg:keywords>
                        <efg:keywords type="Subject" lang="bg">
                            <efg:term>конкурс за красота</efg:term>
                        </efg:keywords>
                        <efg:keywords type="Place" lang="EN">
                            <efg:term>Bulgaria</efg:term>
                        </efg:keywords>
                        <efg:keywords type="Subject" lang="EN">
                            <efg:term>beach</efg:term>
                        </efg:keywords>
                        <efg:keywords type="Subject" lang="EN">
                            <efg:term>pool</efg:term>
                        </efg:keywords>
                        <efg:keywords type="Subject" lang="EN">
                            <efg:term>beauty padgent</efg:term>
                        </efg:keywords>
                        <efg:keywords type="Form" lang="EN">
                            <efg:term>Documentary film</efg:term>
                        </efg:keywords>
                        <efg:productionYear>1940</efg:productionYear>
                        <efg:relPerson>
                            <efg:identifier scheme="CP_CATEGORY_ID">BNFA_person_16fd19ae42bb22f73eef6129f7537804</efg:identifier>
                            <efg:name>Totev, Spas</efg:name>
                            <efg:type>Director of photography</efg:type>
                        </efg:relPerson>
                        <efg:relCollection>
                            <efg:identifier scheme="CP_CATEGORY_ID">BNFA_collection_331b3c2fd0aee7d5eaa099234f827848</efg:identifier>
                            <efg:title>Pre-Socialist Cinema</efg:title>
                            <efg:type>is part of</efg:type>
                        </efg:relCollection>
                    </efg:avcreation>
                </efg:efgEntity>
            </oai:metadata>
        </oai:record>
</oai:ListRecords>
</oai:OAI-PMH>

Here's the mapping file:

@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix oai: <http://www.openarchives.org/OAI/2.0/>.
@prefix efg: <http://www.europeanfilmgateway.eu/efg/> .
@prefix fiaf: <https://fiafcore.org/ontology/> .
@prefix : <http://example.org/rules/> .

#Work
:works a rr:TriplesMap;
  rml:logicalSource [
    rml:source "xxx/BNFA.xml";
    rml:referenceFormulation ql:XPath;
    rml:iterator "/oai:OAI-PMH/oai:ListRecords/oai:record/oai:metadata/efg:efgEntity"
  ].

:works rr:subjectMap [
  rr:template "http://example.org/BNFA/{efg:avcreation/efg:identifier}"
].

:works rr:predicateObjectMap [
  rr:predicate rdf:type;
  rr:objectMap [ rr:constant fiaf:WorkVariant ]],
  [
  rr:predicate rdfs:label;
  rr:objectMap [ rml:reference "efg:avcreation/efg:title/efg:text" ]],
  [
  rr:predicate fiaf:hasIdentifier;
  rr:objectMap [ rr:parentTriplesMap :WorkIdentifier ]],
  [
  rr:predicate fiaf:hasTitle;
  rr:objectMap [ rr:termType rr:BlankNode; rr:parentTriplesMap :WorkTitle ]],
  [
  rr:predicate fiaf:hasEvent;
  rr:objectMap [ rr:termType rr:BlankNode; rr:parentTriplesMap :WorkEvent ]],
  [
  rr:predicate fiaf:hasCountry;
  rr:objectMap [ rr:template "https://example.org/EFG/Country/{efg:avcreation/efg:countryOfReference}" ]], #conditional Mapping needed to reference to Geonames etc.
  [
  rr:predicate fiaf:hasManifestation;
  rr:objectMap [ rr:template "http://example.org/BNFA/{efg:avcreation/efg:avManifestation/efg:identifier}" ]
].

#WorkTitle
:WorkTitle a rr:TriplesMap;
  rml:logicalSource [
    rml:source "xxx/BNFA.xml";
    rml:referenceFormulation ql:XPath;
    rml:iterator "/oai:OAI-PMH/oai:ListRecords/oai:record/oai:metadata/efg:efgEntity/efg:avcreation/efg:title"
  ].

  :WorkTitle rr:subjectMap [ 
    rr:termType rr:BlankNode; 
    rr:class fiaf:Title; #conditional Mapping needed to make the class dependent on the value in title/type
    ].

  :WorkTitle rr:predicateObjectMap [
    rr:predicate <https://fiafcore.org/ontology/hasTitleValue> ;
    rr:objectMap [ rml:reference "./efg:text"; rml:languageMap [rml:reference "lower-case(./@lang)"] ]
].

#WorkIdentifier
:WorkIdentifier a rr:TriplesMap;
  rml:logicalSource [
    rml:source "xxx/EFG/BNFA.xml";
    rml:referenceFormulation ql:XPath;
    rml:iterator "/oai:OAI-PMH/oai:ListRecords/oai:record/oai:metadata/efg:efgEntity/efg:avcreation"
  ].

  :WorkIdentifier rr:subjectMap [ 
    rr:termType rr:BlankNode; 
    rr:class fiaf:Identifier; 
    ].

  :WorkIdentifier rr:predicateObjectMap [
    rr:predicate <https://fiafcore.org/ontology/hasIdentifierValue> ;
    rr:objectMap [ rml:reference "efg:recordSource/efg:sourceID" ]
].

#WorkProductionEvent
:WorkEvent a rr:TriplesMap;
  rml:logicalSource [
    rml:source "xxx/BNFA.xml";
    rml:referenceFormulation ql:XPath;
    rml:iterator "/oai:OAI-PMH/oai:ListRecords/oai:record/oai:metadata/efg:efgEntity/efg:avcreation"
  ].

  :WorkEvent rr:subjectMap [ 
    rr:termType rr:BlankNode; 
    rr:class fiaf:ProductionEvent; 
    ].

  :WorkEvent rr:predicateObjectMap [
    rr:predicate <https://fiafcore.org/ontology/hasEventDate> ;
    rr:objectMap [ rml:reference "efg:productionYear" ]
],
 #[
   # rr:predicate <https://fiafcore.org/ontology/hasActivity> ;
   # rr:objectMap [ rr:termType rr:BlankNode; rr:parentTriplesMap :WorkActivity ]
#].
[
    rr:predicate <https://fiafcore.org/ontology/hasActivity> ;
    rr:objectMap [ rr:termType rr:BlankNode; ]
].

#WorkActivity
#:WorkActivity a rr:TriplesMap;
  #rml:logicalSource [
    #rml:source "xxx/BNFA.xml";
    #rml:referenceFormulation ql:XPath;
    #rml:iterator "/oai:OAI-PMH/oai:ListRecords/oai:record/oai:metadata/efg:efgEntity"
  #].

 #:WorkActivity rr:subjectMap [ 
  #  rml:template "http://example.org/BNFA/activity/{efg:avcreation/efg:relPerson/efg:type}"; #function to collapse whitspaces needed; actually, this would need to be a blank node
   # rr:class fiaf:Activity; #conditional Mapping needed to make the class dependent on the value in relPerson/type
    #].

 #:WorkActivity rr:predicateObjectMap [
  #  rr:predicate <https://fiafcore.org/ontology/hasAgent> ;
   # rr:objectMap [ rml:template "http://example.org/BNFA/{efg:avcreation/efg:relPerson/efg:identifier}" ]
#].

#Agents
#:Agent a rr:TriplesMap;
 # rml:logicalSource [
  #  rml:source "xxx/BNFA.xml";
   # rml:referenceFormulation ql:XPath;
    #rml:iterator "/oai:OAI-PMH/oai:ListRecords/oai:record/oai:metadata/efg:efgEntity/efg:avcreation/efg:relPerson"
#].

#:Agent rr:subjectMap [ 
 #   rml:template "http://example.org/BNFA/{efg:identifier}"; 
  #  rr:class fiaf:Person
#].

and this is the output it generates:

@prefix : <http://example.org/rules/> .
@prefix efg: <http://www.europeanfilmgateway.eu/efg/> .
@prefix fiaf: <https://fiafcore.org/ontology/> .
@prefix oai: <http://www.openarchives.org/OAI/2.0/> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .

_:0 a fiaf:ProductionEvent;
  fiaf:hasActivity _:8;
  fiaf:hasEventDate "1940" .

_:1 a fiaf:ProductionEvent;
  fiaf:hasActivity _:9;
  fiaf:hasEventDate "1940" .

_:2 a fiaf:Identifier;
  fiaf:hasIdentifierValue "MFN2747" .

_:3 a fiaf:Identifier;
  fiaf:hasIdentifierValue "MFN2753" .

_:4 a fiaf:Title;
  fiaf:hasTitleValue "О, ДОБРУДЖАНСКИ КРАЙ"@bg .

_:5 a fiaf:Title;
  fiaf:hasTitleValue "THE REGION OF DOBRUDZHA"@en .

_:6 a fiaf:Title;
  fiaf:hasTitleValue "ИЗБОР НА ЦАРИЦА НА ЛЪЖЕНСКИЯ ПЛАЖ"@bg .

_:7 a fiaf:Title;
  fiaf:hasTitleValue "QUEEN OF THE BEACH CONTEST AT THE LUJENE BEACH"@en .

<http://example.org/BNFA/BNFA_avCreation_MFN2747> a fiaf:WorkVariant;
  rdfs:label "THE REGION OF DOBRUDZHA", "О, ДОБРУДЖАНСКИ КРАЙ";
  fiaf:hasCountry <https://example.org/EFG/Country/BG>; #why is hasCountry not a property of the Event? 
  fiaf:hasEvent _:0, _:1;
  fiaf:hasIdentifier _:2, _:3;
  fiaf:hasManifestation <http://example.org/BNFA/BNFA_avManifestation_MFN2747>;
  fiaf:hasTitle _:4, _:5, _:6, _:7 .

<http://example.org/BNFA/BNFA_avCreation_MFN2753> a fiaf:WorkVariant;
  rdfs:label "QUEEN OF THE BEACH CONTEST AT THE LUJENE BEACH", "ИЗБОР НА ЦАРИЦА НА ЛЪЖЕНСКИЯ ПЛАЖ";
  fiaf:hasCountry <https://example.org/EFG/Country/BG>;
  fiaf:hasEvent _:0, _:1;
  fiaf:hasIdentifier _:2, _:3;
  fiaf:hasManifestation <http://example.org/BNFA/BNFA_avManifestation_MFN2753>;
  fiaf:hasTitle _:4, _:5, _:6, _:7 .

paulduchesne commented 1 year ago

Wow, @Rose-EFG this is great work! We can probably keep this in one issue as it is likely that we will hit similar RML issues - funnily enough I am doing RML for a completely unrelated project and have just hit some similar problems (blank node attributed to all parents, vocabulary replacement of values) so I will feed back solutions here.
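For the "blank node attributed to all parents" problem, one option worth trying is a join condition on the referencing object map. The sketch below is untested and rests on two assumptions: that RMLMapper resolves the `../` step in the parent reference, and that it hands back the same blank node for a given title record on both sides of the join. It would replace the fiaf:hasTitle entry of :works in the mapping above; :WorkIdentifier and :WorkEvent would need the same treatment with their own paths.

```turtle
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix fiaf: <https://fiafcore.org/ontology/> .
@prefix : <http://example.org/rules/> .

# Sketch only: link each work to the titles found under its own efg:avcreation.
:works rr:predicateObjectMap [
  rr:predicate fiaf:hasTitle ;
  rr:objectMap [
    rr:parentTriplesMap :WorkTitle ;
    rr:joinCondition [
      # evaluated against the :works iterator (efg:efgEntity)
      rr:child "efg:avcreation/efg:identifier" ;
      # evaluated against the :WorkTitle iterator (efg:title), so step up to efg:avcreation
      rr:parent "../efg:identifier"
    ]
  ]
] .
```

Without a join condition the mapper has no way of knowing which title belongs to which work, which would explain why every work in the output points at `_:4` through `_:7`.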

My discovery of the day was that a URL held as a string can be turned into an IRI in the graph by adding rr:termType rr:IRI to the object map, right after the rml:reference statement. When I tried to use rr:template directly I kept ending up with percent-encoded URLs, which caused a "not a proper IRI" error.
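As a rough illustration of the rr:termType rr:IRI approach, here is how it might look against the efg:thumbnail element of the BNFA sample. The :ManifestationThumbnail map name and the fiaf:hasThumbnail predicate are made up for the example and are not taken from the ontology.

```turtle
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix fiaf: <https://fiafcore.org/ontology/> .
@prefix : <http://example.org/rules/> .

:ManifestationThumbnail a rr:TriplesMap ;
  rml:logicalSource [
    rml:source "xxx/BNFA.xml" ;
    rml:referenceFormulation ql:XPath ;
    rml:iterator "/oai:OAI-PMH/oai:ListRecords/oai:record/oai:metadata/efg:efgEntity/efg:avcreation/efg:avManifestation"
  ] ;
  rr:subjectMap [ rr:template "http://example.org/BNFA/{efg:identifier}" ] ;
  rr:predicateObjectMap [
    rr:predicate fiaf:hasThumbnail ;   # hypothetical predicate, for illustration only
    # the referenced string is emitted as an IRI instead of a literal
    rr:objectMap [ rml:reference "efg:thumbnail" ; rr:termType rr:IRI ]
  ] .
```

The value is taken as-is from the source, so this only helps where the source already holds a complete, valid URL.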

Rose-EFG commented 1 year ago

Hi Paul, thank you - and looking forward to your ways of tackling the RML issues! The bit about rr:IRI sounds helpful.

In the meantime I was able to get at least the language tags for the WorkTitle/TitleValue (I updated the ttl above directly, because it's just a minor change). I couldn't add them in the same fashion to the rdfs:label attached directly to the WorkVariant, because there is a problem with iteration. Since I am iterating over efg:avcreation, the repetition of the efg:title element seems to work, but when I add the rml:languageMap [rml:reference "lower-case(./@lang)"] the mapper wants to add all the language tags to every title. Navigating to the parent title node and then choosing @lang again did not work, because it selects the parent node of efg:avcreation, as this is the path I indicated in the rml:iterator. I guess what I would need is a means to adapt the iteration inside the predicateObjectMap...?
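One possible workaround, sketched here and not tested, is to leave the :works iterator alone and add a second triples map that iterates over efg:title but writes to the work's IRI, so each label is produced in a context where @lang is unambiguous. The :WorkLabel name is hypothetical, and the `../efg:identifier` step inside the template assumes the mapper evaluates full XPath expressions there.

```turtle
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix : <http://example.org/rules/> .

:WorkLabel a rr:TriplesMap ;
  rml:logicalSource [
    rml:source "xxx/BNFA.xml" ;
    rml:referenceFormulation ql:XPath ;
    # one iteration per title, so efg:text and @lang always belong together
    rml:iterator "/oai:OAI-PMH/oai:ListRecords/oai:record/oai:metadata/efg:efgEntity/efg:avcreation/efg:title"
  ] ;
  # same IRI pattern as :works, reached by stepping up from efg:title to efg:avcreation
  rr:subjectMap [ rr:template "http://example.org/BNFA/{../efg:identifier}" ] ;
  rr:predicateObjectMap [
    rr:predicate rdfs:label ;
    rr:objectMap [
      rml:reference "efg:text" ;
      rml:languageMap [ rml:reference "lower-case(@lang)" ]
    ]
  ] .
```

Because the subject IRI matches the one generated by :works, the labels should merge into the same resource in the output graph; the untagged rdfs:label entry in :works would then be dropped.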

Rose-EFG commented 1 year ago

The error message I refer to in nr. 4 that I can't figure out reads: Cannot invoke "be.ugent.rml.functions.SingleRecordFunctionExecutor.execute(be.ugent.rml.records.Record)" because "this.functionExecutor" is null

paulduchesne commented 1 year ago

I should hopefully make some headway with the blank-node-to-everything issue, but a quick clarification to the in-code question fiaf:hasCountry <https://example.org/EFG/Country/BG>; #why is hasCountry not a property of the Event?

I think I put forward early in the modelling discussion that I really believe that Film Country and Film Year are inherited from legacy cataloguing methods of pushing data "up the tree" and also a general history of using these points as a method of disambiguation between similarly named works. Add to this some confusion around what the "country" and the "year" attribution even mean in this (or another) context and I think there is clear scope for applying this data as correlating to specific event(s). Would now be a good moment to implement this? If a work has an event with a year you can always collapse that statement to just film work and year, and the ontology extension allows you to contextualize what is meant by the "year" data statement (is it production/release/what, and true of where?).
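To make the proposal concrete, this is roughly what the BNFA work would look like with year and country hanging off the event. It is a sketch only: whether fiaf:hasCountry may be used on an event at all is exactly the open modelling question, so that usage is an assumption here.

```turtle
@prefix fiaf: <https://fiafcore.org/ontology/> .

<http://example.org/BNFA/BNFA_avCreation_MFN2753> a fiaf:WorkVariant ;
  fiaf:hasEvent [
    a fiaf:ProductionEvent ;
    fiaf:hasEventDate "1940" ;
    # assumption: country attached to the event instead of the work
    fiaf:hasCountry <https://example.org/EFG/Country/BG>
  ] .
```

A consumer that only wants "work plus year" can still collapse this by following fiaf:hasEvent and reading the date.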

natashafairbairn commented 1 year ago

Film country and film year do have that aspect of legacy cataloguing and disambiguation, but particularly with country there is a more fundamental factor that makes it an important integral part of the Work rather than simply covered by an Event – that of the geographic origin of the film tied in with the nationality and base of the principal production companies (i.e. Agents) involved.

It also ties in with official government and legal designations and identification of nationality of the film. The act of designation could come under an Event, i.e. an official registration, but that Event data would relate more to it simply being a date of registration by a particular agent in a particular place/country.

Similarly, a Production Event may give dates and places of shooting a film, but the latter are simply locations used during filming not necessarily related to the “nationality”/originating country of the Work. The same applies with Publication events, e.g. premieres, film festival showings, etc. which relate to dates and places associated with an event in the life cycle of the manifestations of the Work.

Film country in a Work is referred to as “country of reference” in EN15907, clarified as:

An element used for describing the geographic origin of a cinematographic work. Wherever known and applicable, this should be the country or countries where the production facilities are located. Multinational productions will typically have more than one country of reference, including those with Agents that were not directly involved in the creation of the cinematographic work (e.g. entities that have contributed financial resources). If production information is missing, this element may refer to countries where the cinematographic work was filmed or distributed, or where copies are known to exist in archives.

I don’t see how an Event would quite cover this.

Film year (aka Year of reference) would theoretically be a bit easier to correlate with an Event, as per the definition of this in EN15907.

A year associated with an event in the life cycle of the cinematographic work, typically associated with its creation, availability or registration (for example for copyright purposes). A typical use of this element is chronological ordering of lists of cinematographic works. The year of reference is expressed as a four digit value, optionally followed by a dash (Unicode value 002Dhex) and another year to denote a span of years.

I’m not quite clear how it would work though if you had no year of reference on the Work itself but pulled it through from an Event, because you would potentially have at least three different Events and dates, e.g. a Production Event with production date(s) 1959-1960; a Copyright Event with date 1960; and a slightly vague “availability” (relating to first release) with date 1961. “Availability” would usually correspond with some sort of Publication Event, but in most cases there is no Publication Event linked to a Work because the date is found in a Release Manifestation and because Publication Events are meant to be associated with Manifestations rather than Works, e.g. Publication Events such as release in particular cinemas in a particular place or distribution by a particular company from a particular date would be associated with a Release Manifestation rather than with the Work (see D.9 and D.10 of FIAF Manual re. Publication Events which makes it clear expectation is that Publication types are not pertinent for Works/Variants, only Manifestations).

Natasha

Rose-EFG commented 1 year ago

Hi @paulduchesne, I tried out what happens, if I transform the BNFA example to RDF/XML first and then convert to TTL.

The XSLT is here: https://github.com/FIAF/modelling-workshops/blob/main/examples/bnfa/BNFA-RDF-XML.xsl

And it outputs this converted ttl. Is this an allowed way to express blank nodes? I am not that familiar with the Turtle syntax yet... (also the fiaf:hasCountry object looks incorrect. I'll have to check this in the RDF/XML.)

@prefix fiaf: <https://fiafcore.org/ontology/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<http://example.org/BNFA/BNFA_avCreation_MFN2747> a fiaf:WorkVariant ;
    rdfs:label "О, ДОБРУДЖАНСКИ КРАЙ"@bg ;
    fiaf:hasCountry <fiaf:Bulgaria> ;
    fiaf:hasIdentifier [ a fiaf:Identifier ;
            fiaf:hasIdentifierValue "MFN2747" ] ;
    fiaf:hasTitle [ a fiaf:PreferredTitle ;
            fiaf:hasTitleValue "О, ДОБРУДЖАНСКИ КРАЙ"@bg ],
        [ a fiaf:TranslatedTitle ;
            fiaf:hasTitleValue "THE REGION OF DOBRUDZHA"@en ] .

<http://example.org/BNFA/BNFA_avCreation_MFN2753> a fiaf:WorkVariant ;
    rdfs:label "ИЗБОР НА ЦАРИЦА НА ЛЪЖЕНСКИЯ ПЛАЖ"@bg ;
    fiaf:hasCountry <fiaf:Bulgaria> ;
    fiaf:hasIdentifier [ a fiaf:Identifier ;
            fiaf:hasIdentifierValue "MFN2753" ] ;
    fiaf:hasTitle [ a fiaf:PreferredTitle ;
            fiaf:hasTitleValue "ИЗБОР НА ЦАРИЦА НА ЛЪЖЕНСКИЯ ПЛАЖ"@bg ],
        [ a fiaf:TranslatedTitle ;
            fiaf:hasTitleValue "QUEEN OF THE BEACH CONTEST AT THE LUJENE BEACH"@en ] .

paulduchesne commented 1 year ago

That is great news @Rose-EFG, and good to know this is a viable alternate transformation pathway. That [ ] alternate blank node syntax I believe is entirely correct, although I do wonder what happens if you have multiple subjects pointing to the same blank node.

You are right, I don't think the < > form for hasCountry is correct, as IRI syntax should be either the full IRI wrapped in angle brackets (eg the work IRIs here) or the prefixed form (eg fiaf:hasTitle).
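On the question of several subjects pointing to the same blank node: as far as the Turtle syntax goes, each [ ] always creates a fresh anonymous node, so it cannot be shared. A node that needs to be referenced from more than one place has to be given a label with the _: form. A small sketch, with made-up work IRIs and titles:

```turtle
@prefix fiaf: <https://fiafcore.org/ontology/> .
@prefix ex: <http://example.org/BNFA/> .

# Anonymous form: this node exists only inside this statement.
ex:workA fiaf:hasTitle [ a fiaf:Title ; fiaf:hasTitleValue "Example title"@en ] .

# Labelled form: the same node can be the object of several subjects.
_:sharedTitle a fiaf:Title ;
  fiaf:hasTitleValue "Shared title"@en .

ex:workA fiaf:hasTitle _:sharedTitle .
ex:workB fiaf:hasTitle _:sharedTitle .
```

Blank node labels are only meaningful within one file, so anything that has to be shared across datasets is probably better off with a proper IRI.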

Rose-EFG commented 1 year ago

Thanks for the quick feedback! I updated the XSLT, now the Country looks better

@prefix fiaf: <https://fiafcore.org/ontology/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<http://example.org/BNFA/BNFA_avCreation_MFN2747> a fiaf:WorkVariant ;
    rdfs:label "О, ДОБРУДЖАНСКИ КРАЙ"@bg ;
    fiaf:hasCountry fiaf:Bulgaria ;
    fiaf:hasIdentifier [ a fiaf:Identifier ;
            fiaf:hasIdentifierValue "MFN2747" ] ;
    fiaf:hasTitle [ a fiaf:TranslatedTitle ;
            fiaf:hasTitleValue "THE REGION OF DOBRUDZHA"@en ],
        [ a fiaf:PreferredTitle ;
            fiaf:hasTitleValue "О, ДОБРУДЖАНСКИ КРАЙ"@bg ] .

<http://example.org/BNFA/BNFA_avCreation_MFN2753> a fiaf:WorkVariant ;
    rdfs:label "ИЗБОР НА ЦАРИЦА НА ЛЪЖЕНСКИЯ ПЛАЖ"@bg ;
    fiaf:hasCountry fiaf:Bulgaria ;
    fiaf:hasIdentifier [ a fiaf:Identifier ;
            fiaf:hasIdentifierValue "MFN2753" ] ;
    fiaf:hasTitle [ a fiaf:PreferredTitle ;
            fiaf:hasTitleValue "ИЗБОР НА ЦАРИЦА НА ЛЪЖЕНСКИЯ ПЛАЖ"@bg ],
        [ a fiaf:TranslatedTitle ;
            fiaf:hasTitleValue "QUEEN OF THE BEACH CONTEST AT THE LUJENE BEACH"@en ] .

You mentioned in the last workshop session that the leading "01. " numbering for last names in the Bundesarchiv data is still present in the resulting ttl. You could strip it during the RML transformation by adding the XPath function replace like this:

# backslashes are doubled because Turtle string literals need them escaped
[ rr:predicate fiaf:LastName;
rr:objectMap [ rml:reference "replace(@Nachname,'^\\d+\\.\\s','')" ] ].

paulduchesne commented 1 year ago

I thought I would quickly test some command-line processors for your XSLT process @Rose-EFG .

xsltproc does not work as it only supports XSLT 1.0; however, I got good results from Saxon-HE via the following command:

java -jar saxon-he-12.0.jar -s:bnfa.xml -xsl:BNFA-RDF-XML.xsl -o:render.xml

I see you also have no issues with language tags using this method!

Given that it seems significantly more intuitive to enact an XML -> RDF/XML transform I am tempted to give up on learning any more RML and instead follow your excellent groundwork using the BA example as a learning opportunity.

What do you think about that direction? Given it all resolves to turtle in the end it shouldn't really matter what transformation pipeline we use, as long as we are happy with the results.

paulduchesne commented 1 year ago

<xsl:variable name="map0"><!-- This could need a better solution, because this mapping would become very long for all the ISO codes -->
<map value="https://fiafcore.org/ontology/Bulgaria">BG</map>       
</xsl:variable>

I was going to suggest that "vocab replacement" could best take place at the harmonisation stage. For instance it would be even more difficult to map agents from source IRIs to a shared identifier, so instead the transformation stage would terminate in imaginary IRIs (eg https://www.bundesarchiv.de/agent/ad399ad0-dcae-4a22-861e-e0bbbe233840) which are then transformed to shared FIAF resources (eg https://fiafcore.org/resource/ffcb9354-7636-4cf9-9e8f-f20bb067b205 ) via essentially a small authority file.

In the example of country, if you did render BG as https://example.org/EFG/Country/BG (from your RML example), the harmonisation stage could then do the conversion to https://fiafcore.org/ontology/Bulgaria, although we would have to work out what to do where there is no equivalent to map to (not sure when exactly this would occur!).
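The "small authority file" could itself just be a few triples that the harmonisation stage loads and uses for the replacement. A minimal sketch using the IRIs mentioned above; the choice of owl:sameAs as the linking property is my assumption, and something like skos:exactMatch might suit better.

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix fiaf: <https://fiafcore.org/ontology/> .

# source-side IRI (left) mapped to the shared FIAF term or resource (right)
<https://example.org/EFG/Country/BG>
  owl:sameAs fiaf:Bulgaria .

<https://www.bundesarchiv.de/agent/ad399ad0-dcae-4a22-861e-e0bbbe233840>
  owl:sameAs <https://fiafcore.org/resource/ffcb9354-7636-4cf9-9e8f-f20bb067b205> .
```

Source IRIs with no entry in the file would simply pass through unchanged, which at least makes the unmapped cases easy to list afterwards.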

Rose-EFG commented 1 year ago

Hi @paulduchesne , thanks for your feedback! I progressed on the RDF-XML and have now - as far as I can see - added all the classes and properties there are to map for the BNFA example. As you suggested, I created some dummy URLs for countries and forms. The fiaf:Format subclasses even worked with concatenating. :)

I added a new xml with some additional elements for testing. If you'd say the Turtle output is fine, I could transform all 97 records.

https://github.com/FIAF/modelling-workshops/blob/main/examples/bnfa/BNFA2.xml
https://github.com/FIAF/modelling-workshops/blob/main/examples/bnfa/20230515_BNFA-RDF-XML.xsl
https://github.com/FIAF/modelling-workshops/blob/main/examples/bnfa/20230515_BNFA_fulltranformation.ttl

There was not much to transform for the fiaf:Item because of the data we retain there in the EFG schema (links to vimeo, provider information etc.), but I commented in the xsl directly.

The mapping to the fiaf:Extent takes a bit of getting used to, i.e. having the unit of extent as a class instead of a part of the value (or an attribute as in the source XML).

Rose-EFG commented 1 year ago

I will try to add some Wikidata links to the agents, if possible, as well.

paulduchesne commented 1 year ago

The mapping to the fiaf:Extent takes a bit of getting used to, i.e. having the unit of extent as a class instead of a part of the value (or an attribute as in the source XML).

Yes, I suppose a more common representation in CMS would have the value and then the measures as a qualifier (eg "2000" "ft") but this seemed the more appropriate RDF way of modelling.

Thank you again for your fantastic groundwork on this, I will be learning a lot from what you have done over the next weeks as I try and build up my own knowledge of XSLT.

Rose-EFG commented 1 year ago

Yes, I suppose a more common representation in CMS would have the value and then the measures as a qualifier (eg "2000" "ft") but this seemed the more appropriate RDF way of modelling.

Exactly, that's how it is in EFG and in our DFF Axiell collections data modelling. Could the unit also be modelled as a property of the Extent class? As in: has Extent --> Extent --> has ExtentValue+hasExtentUnit? But I see your point that Minutes, Metres etc. are a subclass of Extent.

Thank you again for your fantastic groundwork on this, I will be learning a lot from what you have done over the next weeks as I try and build up my own knowledge of XSLT.

Thanks! I already see some possibilities to shorten a few if-conditions in the xsl, so I'll adapt it again in the coming days.

paulduchesne commented 1 year ago

Could the unit also be modelled as a property of the Extent class?

This would be the other way of doing it, and in fact how Wikidata works (properties with a "quantity" datatype, with a qualifying unit of measure). I do wonder if there needs to be a taxonomy to organise the different types of "measures" though, eg physical length, digital file size.
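For comparison, here are the two options side by side, using the 177 m length from the BNFA manifestation. All names are placeholders: fiaf:Metres, fiaf:hasExtentValue and fiaf:hasExtentUnit follow the wording used in this thread and may not match the ontology, and the ex: subjects and the ex:metre unit are invented for the example.

```turtle
@prefix fiaf: <https://fiafcore.org/ontology/> .
@prefix ex: <http://example.org/BNFA/> .

# Option A: the unit is carried by the class of the extent node.
ex:manifestationOptionA fiaf:hasExtent [
  a fiaf:Metres ;
  fiaf:hasExtentValue 177
] .

# Option B: generic Extent node, with the unit as a separate property.
ex:manifestationOptionB fiaf:hasExtent [
  a fiaf:Extent ;
  fiaf:hasExtentValue 177 ;
  fiaf:hasExtentUnit ex:metre
] .
```

With option B the taxonomy of measures would sit on the unit resources (eg grouping ex:metre under physical length), while with option A it would be expressed through the subclass hierarchy under fiaf:Extent.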