FIAF / modelling-workshops

Modelling Workshops

Work/Variant #1

Closed paulduchesne closed 1 year ago

paulduchesne commented 1 year ago

Discussion around the modelling of Work/Variant, using the FIAF Cataloguing Manual as the primary source.

Work/Variant Elements

Introduction

Properties which terminate in blocks of text (notes, content description, histories) have been removed: while interesting data conforming to agent or event information could be derived from them, they are best not retained in their original state. In this iteration, language has been pushed down to a property of manifestation. Year of reference has also been removed as a direct property, in favour of supplying a blank event with an attached date.

1.2.1 Work/Variant Description Type

These are expressed as subclasses of the work/variant class, and are used to apply a type of work/variant, e.g. https://www.filmovyprehled.cz/cs/film/396690/sedmikrasky > rdf:type > fiaf:Monographic.

<https://fiafcore.org/ontology/Analytic> a owl:Class ;
    rdfs:label "Analytic"@en ;
    dc:source "FIAF Cataloguing Manual D.1"^^xsd:string ;
    rdfs:subClassOf fiaf:WorkVariant .

<https://fiafcore.org/ontology/Collection> a owl:Class ;
    rdfs:label "Collection"@en ;
    dc:source "FIAF Cataloguing Manual D.1"^^xsd:string ;
    rdfs:subClassOf fiaf:WorkVariant .

<https://fiafcore.org/ontology/Monographic> a owl:Class ;
    rdfs:label "Monographic"@en ;
    dc:source "FIAF Cataloguing Manual D.1"^^xsd:string ;
    rdfs:subClassOf fiaf:WorkVariant .

<https://fiafcore.org/ontology/Serial> a owl:Class ;
    rdfs:label "Serial"@en ;
    dc:source "FIAF Cataloguing Manual D.1"^^xsd:string ;
    rdfs:subClassOf fiaf:WorkVariant .
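
As a minimal sketch of how one of these subclasses would be applied to instance data, the Sedmikrásky example above could be written as follows. The `fiaf:` prefix binding is an assumption, taken from the namespace of the class IRIs.

```turtle
# Assumption: fiaf: expands to the namespace used by the class IRIs above.
@prefix fiaf: <https://fiafcore.org/ontology/> .

# "a" is Turtle shorthand for rdf:type.
<https://www.filmovyprehled.cz/cs/film/396690/sedmikrasky> a fiaf:Monographic .
```

Because `fiaf:Monographic` is declared as a subclass of `fiaf:WorkVariant`, an RDFS-aware reasoner would also treat this resource as a `fiaf:WorkVariant` without that being stated explicitly.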

1.2.2 Variant Type

If the work/variant is a variant (inferred by being the target of a has variant statement), then the flavour of variant can be attributed from the D.2 controlled vocabulary, e.g. Picnic at Hanging Rock (Director's Cut) > fiaf:hasVariantType > fiaf:Augmented

<https://fiafcore.org/ontology/hasVariantType> a owl:ObjectProperty ;
    rdfs:label "Has Variant Type"@en ;
    dc:source "FIAF Cataloguing Manual 1.2.2"^^xsd:string ;
    rdfs:domain fiaf:WorkVariant ;
    rdfs:range fiaf:VariantType .
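
A sketch of the Picnic at Hanging Rock example as instance data might look like this. The `ex:` IRIs are placeholders, and `fiaf:hasVariant` is assumed as the name of the "has variant" property, which is not declared in this post.

```turtle
@prefix fiaf: <https://fiafcore.org/ontology/> .
@prefix ex:   <https://example.org/> .          # hypothetical namespace

ex:picnic-at-hanging-rock a fiaf:WorkVariant ;
    fiaf:hasVariant ex:picnic-at-hanging-rock-directors-cut .   # assumed property name

# The variant carries its flavour from the D.2 controlled vocabulary.
ex:picnic-at-hanging-rock-directors-cut a fiaf:WorkVariant ;
    fiaf:hasVariantType fiaf:Augmented .
```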

1.3.1 Identifier
1.3.1.1 Identifier Type

Working with RDF/OWL involves slightly different modelling considerations than working with a traditional CMS. A regular implementation of identifier would involve two fields: 'Identifier type':'Wikidata' and 'Identifier code':'Q1421747', or more simply 'Wikidata ID':'Q1421747'.

In RDF, type is an inherent part of declaring the existence of an entity, so you would do something like http://collections.cinematheque.qc.ca/recherche/oeuvres/fiche/2082 > hasIdentifier > blank node to represent the Wikidata identifier for this work (of type "Wikidata Identifier") > hasIdentifierText > Q1421747. A blank node is a placeholder to show that there is a concrete entity which we are talking about (in this case "the wikidata identifier which corresponds with this work"), but we would not give this entity itself its own identity (or url).

A shorter option would be to drop the blank node, so http://collections.cinematheque.qc.ca/recherche/oeuvres/fiche/2082 > hasWikidataIdentifier > Q1421747, but I prefer the extended version as it allows non-destructive ontology editing in the future, especially if you wished to extend the model (e.g. "true as of time").

Below is the connection between work/variant and identifier classes.

<https://fiafcore.org/ontology/hasIdentifier> a owl:ObjectProperty ;
    rdfs:label "Has Identifier"@en ;
    dc:source "FIAF Cataloguing Manual 1.3.1"^^xsd:string ;
    rdfs:domain fiaf:WorkVariant ;
    rdfs:range fiaf:Identifier .
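
The extended pattern described above could be sketched in Turtle as below. `fiaf:WikidataIdentifier` and `fiaf:hasIdentifierText` are assumed names based on the prose; neither is declared in this post.

```turtle
@prefix fiaf: <https://fiafcore.org/ontology/> .

# Extended form: the identifier is its own typed blank node.
<http://collections.cinematheque.qc.ca/recherche/oeuvres/fiche/2082>
    fiaf:hasIdentifier [
        a fiaf:WikidataIdentifier ;           # assumed subclass of fiaf:Identifier
        fiaf:hasIdentifierText "Q1421747"     # assumed datatype property
    ] .

# Shorter form, for comparison (leaves no node for later statements):
# <http://collections.cinematheque.qc.ca/recherche/oeuvres/fiche/2082>
#     fiaf:hasWikidataIdentifier "Q1421747" .
```

The square brackets create the blank node, so further statements about the identifier itself (such as a "true as of" date) could later be added inside them without changing the link from the work.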

1.3.2 Title
1.3.2.1 Title Type

Similar to identifier, title type would be expressed first on a blank node (with the possibility of further specific properties) followed by title-as-text, e.g. https://www.filmovyprehled.cz/cs/film/396690/sedmikrasky > hasTitle > blank node for this title of this work (of type "Title Proper") > hasTitleText > Sedmikrásky. Here we are just expressing the connection between the work/variant and title component of the statement.

<https://fiafcore.org/ontology/hasTitle> a owl:ObjectProperty ;
    rdfs:label "Has Title"@en ;
    dc:source "FIAF Cataloguing Manual 1.3.2"^^xsd:string ;
    rdfs:domain fiaf:WorkVariant ;
    rdfs:range fiaf:Title .
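
Following the same pattern as identifier, a title statement might be sketched like this. `fiaf:TitleProper` and `fiaf:hasTitleText` are assumed names taken from the prose, and the `@cs` language tag is an illustrative addition.

```turtle
@prefix fiaf: <https://fiafcore.org/ontology/> .

<https://www.filmovyprehled.cz/cs/film/396690/sedmikrasky>
    fiaf:hasTitle [
        a fiaf:TitleProper ;                  # assumed subclass of fiaf:Title
        fiaf:hasTitleText "Sedmikrásky"@cs    # assumed property; language tag is illustrative
    ] .
```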

1.4.1 Agent
1.4.1.1 Agent Activity

In a deviation from the manual, which seems to allow direct work/variant to agent relationships, here we require an intermediate activity entity, even if the nature of that activity is unknown.

This activity blank node could also link information explicit to that contribution to that work (e.g. salary, screentime, character name, how credited).

<https://fiafcore.org/ontology/hasActivity> a owl:ObjectProperty ;
    rdfs:label "Has Activity"@en ;
    dc:source "FIAF Cataloguing Manual 1.4.1.1"^^xsd:string ;
    rdfs:domain fiaf:WorkVariant ;
    rdfs:range fiaf:Activity .
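
A rough sketch of the intermediate activity node is given below. Everything hanging off the activity (`fiaf:hasAgent`, `fiaf:hasCharacterName`, the `ex:` agent IRI and the placeholder value) is hypothetical and only meant to show where contribution-specific information could sit.

```turtle
@prefix fiaf: <https://fiafcore.org/ontology/> .
@prefix ex:   <https://example.org/> .          # hypothetical namespace

<https://www.filmovyprehled.cz/cs/film/396690/sedmikrasky>
    fiaf:hasActivity [
        a fiaf:Activity ;                        # the nature of the activity may be unknown
        fiaf:hasAgent ex:some-agent ;            # hypothetical property and agent IRI
        fiaf:hasCharacterName "Example name"     # hypothetical property, placeholder value
    ] .
```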

1.3.3 Country of Reference

Maybe one day country could be inferred based on location data attributed to events or agents (which would also allow for a plurality of interpretations of how to attribute film nationality to coexist), but it is historically a filmographic staple and extremely useful for disambiguation.

<https://fiafcore.org/ontology/hasCountry> a owl:ObjectProperty ;
    rdfs:label "Has Country"@en ;
    dc:source "FIAF Cataloguing Manual 1.3.3"^^xsd:string ;
    rdfs:domain fiaf:WorkVariant ;
    rdfs:range fiaf:Country .

1.4.3 Subject/Genre/Form Terms

Form is a straightforward application from controlled vocabulary, e.g. https://www.filmovyprehled.cz/cs/film/396690/sedmikrasky > hasForm > Feature

<https://fiafcore.org/ontology/hasForm> a owl:ObjectProperty ;
    rdfs:label "Has Form"@en ;
    dc:source "FIAF Cataloguing Manual 1.4.3"^^xsd:string ;
    rdfs:domain fiaf:WorkVariant ;
    rdfs:range fiaf:Form .

Same for genre, e.g. https://www.filmovyprehled.cz/cs/film/396690/sedmikrasky > hasGenre > Allegory

<https://fiafcore.org/ontology/hasGenre> a owl:ObjectProperty ;
    rdfs:label "Has Genre"@en ;
    dc:source "FIAF Cataloguing Manual 1.4.3"^^xsd:string ;
    rdfs:domain fiaf:WorkVariant ;
    rdfs:range fiaf:Genre .

Subject is more complex. Following the pattern of Form/Genre (treated as equivalent in the cataloguing manual), we would expect work/variant > hasSubject > some subject. But is the subject an entity or text? An entity would be preferable to manage polysemy, but what resource contains entities for all possible subjects (e.g. the Catalogue examples "Street-railroads", "Horse-drawn vehicles", "Automobiles")? Maybe here we could freely link out to external concepts, e.g. https://www.filmovyprehled.cz/cs/film/396690/sedmikrasky > hasSubject > https://www.wikidata.org/wiki/Q870.

<https://fiafcore.org/ontology/hasSubject> a owl:ObjectProperty ;
    rdfs:label "Has Subject"@en ;
    dc:source "FIAF Cataloguing Manual 1.4.3"^^xsd:string ;
    rdfs:domain fiaf:WorkVariant ;
    rdfs:range fiaf:Subject .
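
Taken together, the three examples in this section could be sketched as one set of statements. The IRIs `fiaf:Feature` and `fiaf:Allegory` are assumptions about how the controlled vocabulary terms would be named.

```turtle
@prefix fiaf: <https://fiafcore.org/ontology/> .

<https://www.filmovyprehled.cz/cs/film/396690/sedmikrasky>
    fiaf:hasForm    fiaf:Feature ;     # assumed IRI for the "Feature" term
    fiaf:hasGenre   fiaf:Allegory ;    # assumed IRI for the "Allegory" term
    fiaf:hasSubject <https://www.wikidata.org/wiki/Q870> .   # external concept, as in the prose
```

One consequence worth noting: because `hasSubject` declares `rdfs:range fiaf:Subject`, an RDFS reasoner would infer that the external resource is itself a `fiaf:Subject`. Whether that is desirable when linking out freely is a modelling choice.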

1.4.2 Events

The modelling for events will be quite involved; here we are just declaring that a work/variant has an event.

<https://fiafcore.org/ontology/hasEvent> a owl:ObjectProperty ;
    rdfs:label "Has Event"@en ;
    dc:source "FIAF Cataloguing Manual 1.4.2"^^xsd:string ;
    rdfs:domain fiaf:WorkVariant ;
    rdfs:range fiaf:Event .
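
As a sketch of the "blank event with an attached date" mentioned in the introduction, a year of reference could be carried like this. `fiaf:hasDate` is a hypothetical property name, the `ex:` IRI is a placeholder, and the year is purely illustrative.

```turtle
@prefix fiaf: <https://fiafcore.org/ontology/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <https://example.org/> .          # hypothetical namespace

ex:picnic-at-hanging-rock
    fiaf:hasEvent [
        a fiaf:Event ;                        # event of unspecified type
        fiaf:hasDate "1975"^^xsd:gYear        # hypothetical property, illustrative value
    ] .
```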

Other Properties

hasManifestation and hasWorkVariant link the work/variant to child entities.

stephenmcconnachie commented 1 year ago
  1. Are there extant vocabularies for this entity

One of the most discussed vocabularies I think is the types of roles of contributors to production of a work - cast and crew activities. The Filmographic Terms Glossary includes a lot of those, with translations into various languages: https://www.fiafnet.org/pages/E-Resources/Glossary.html#:~:text=The%20FIAF%20Glossary%20of%20Filmographic,screen%20and%20in%20documentation%20sources.

We also have many more, in our collections management system thesaurus, grouped together into depts and also into Pre-production, Production and Post-production roles. It might be a rich area to model.

We also have a well structured genre and subject thesaurus, with hierarchical relationships (broader and narrower terms), as well as some preferred term (use instead). We could share those, if useful for modelling.

stephenmcconnachie commented 1 year ago
  1. Questions about how this entity is defined in EN 15907 or the FIAF Cataloguing Manual

Inevitably the issue of Variant will come up - the manual carefully allows both Variant - Work approach and no Variant approach. So it may be worth defining whether we need to address Variant complexities here or delay / scope out

paulduchesne commented 1 year ago

> Inevitably the issue of Variant will come up - the manual carefully allows both Variant - Work approach and no Variant approach. So it may be worth defining whether we need to address Variant complexities here or delay / scope out

I was thinking that next week we could tackle Variants - but one of the complexities of splitting up these meetings by "entity" is that of course they are all interconnected. Maybe we would draw the line that for this first discussion we are discussing "variants" only so far as how they connect to "works", but not the attributes of variants themselves?

From this perspective we are also possibly lucky that it sounds like there is parity between 15907 and the manual: work -> hasVariant (one or more) -> variant (which will link to manifestations) OR Work -> hasManifestation (one or more) -> manifestation.

Two points strike me about this. In the above definition you are clearly restricted to either 1..n Variants OR 1..n Manifestations, which excludes the following two possibilities:

  1. A work which has neither manifestations nor variants - is this possible? Now I think of it maybe not, as a work cannot exist which has never been manifested in some form.
  2. A work cannot link directly to manifestations AND variants. This surprises me, I would have thought it possible to have both (eg work has links to both one-or-more variants AND one-or-more direct manifestations) as so:

Picnic at Hanging Rock (work) -> hasVariant -> Director's Cut (variant)

Picnic at Hanging Rock (work) -> hasManifestation -> original theatrical release 1975 (manifestation)

Director's Cut (variant) -> hasManifestation -> director's cut theatrical release 1999 (manifestation)
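
The same three statements, sketched in Turtle. The `ex:` IRIs are placeholders, and `fiaf:hasVariant` and `fiaf:hasManifestation` are assumed property names following the arrows above.

```turtle
@prefix fiaf: <https://fiafcore.org/ontology/> .
@prefix ex:   <https://example.org/> .          # hypothetical namespace

# The work links to both a variant and a direct manifestation.
ex:picnic-work
    fiaf:hasVariant       ex:picnic-directors-cut ;
    fiaf:hasManifestation ex:picnic-theatrical-release-1975 .

# The variant has its own manifestation.
ex:picnic-directors-cut
    fiaf:hasManifestation ex:picnic-directors-cut-release-1999 .
```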

paulduchesne commented 1 year ago

Actually I think I have misread this, looking back at 15907 the restriction is that IF there is no Variant, there must be 1..n Manifestations and the other way around, so not a restriction on having 1..n of each at the same time.
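
One possible way to state that reading in OWL is sketched below: every work has at least one variant or at least one manifestation, and having both is allowed. The class names `fiaf:Work`, `fiaf:Variant` and `fiaf:Manifestation` and both property names are assumptions for illustration.

```turtle
@prefix fiaf: <https://fiafcore.org/ontology/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Work is a subclass of (hasVariant some Variant) OR (hasManifestation some Manifestation).
fiaf:Work rdfs:subClassOf [
    a owl:Class ;
    owl:unionOf (
        [ a owl:Restriction ;
          owl:onProperty fiaf:hasVariant ;
          owl:someValuesFrom fiaf:Variant ]
        [ a owl:Restriction ;
          owl:onProperty fiaf:hasManifestation ;
          owl:someValuesFrom fiaf:Manifestation ]
    )
] .
```

Since OWL reasons under the open-world assumption, an axiom like this would not reject a work record with neither link; it would only imply that some variant or manifestation exists. Checking records for missing links would need a separate validation step.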

stephenmcconnachie commented 1 year ago

> Actually I think I have misread this, looking back at 15907 the restriction is that IF there is no Variant, there must be 1..n Manifestations and the other way around, so not a restriction on having 1..n of each at the same time.

You beat me to it...

natashafairbairn commented 1 year ago

"1. A work which has neither manifestations nor variants - is this possible? Now I think of it maybe not, as a work cannot exist which has never been manifested in some form"

Actually this may theoretically be feasible - we have Work records for some Unrealised film projects in our database. Being unrealised there can be no moving image Manifestation, so these are single Works with no Manifestations or Items, but sometimes with horizontal associated relationships with credits (i.e. proposed cast/director/scriptwriter), or with relevant unpublished scripts, paper special collections, periodical articles/books held in our other collections. Similarly, we may create a Series Work for a TV series - with some general credits, transmission dates information, synopsis, etc. but do not have the time and resources to add all the episodes and their Manifestations where we haven't acquired them. However, we do need a Series Work record in order to link to relevant indexed periodical references or book records in our Library collections.

circesanchez commented 1 year ago

> "1. A work which has neither manifestations nor variants - is this possible? Now I think of it maybe not, as a work cannot exist which has never been manifested in some form"
>
> Actually this may theoretically be feasible - we have Work records for some Unrealised film projects in our database. Being unrealised there can be no moving image Manifestation, so these are single Works with no Manifestations or Items, but sometimes with horizontal associated relationships with credits (i.e. proposed cast/director/scriptwriter), or with relevant unpublished scripts, paper special collections, periodical articles/books held in our other collections. Similarly, we may create a Series Work for a TV series - with some general credits, transmission dates information, synopsis, etc. but do not have the time and resources to add all the episodes and their Manifestations where we haven't acquired them. However, we do need a Series Work record in order to link to relevant indexed periodical references or book records in our Library collections.

--- I think in the case of a "Work" which has no "Variants", we could count many old unpublished Works, among which are the first known Short Films and Newsreels without subtitles or dubbing; these are just a couple of examples.

torbjornbp commented 1 year ago

> "1. A work which has neither manifestations nor variants - is this possible? Now I think of it maybe not, as a work cannot exist which has never been manifested in some form"
>
> Actually this may theoretically be feasible - we have Work records for some Unrealised film projects in our database. Being unrealised there can be no moving image Manifestation, so these are single Works with no Manifestations or Items, but sometimes with horizontal associated relationships with credits (i.e. proposed cast/director/scriptwriter), or with relevant unpublished scripts, paper special collections, periodical articles/books held in our other collections. Similarly, we may create a Series Work for a TV series - with some general credits, transmission dates information, synopsis, etc. but do not have the time and resources to add all the episodes and their Manifestations where we haven't acquired them. However, we do need a Series Work record in order to link to relevant indexed periodical references or book records in our Library collections.

Seconding this. I can think of a lot of examples where Serial works would have no manifestations. I guess a question then is what kind of relationship is between the serial and monographic works?

paulduchesne commented 1 year ago

> Seconding this. I can think of a lot of examples where Serial works would have no manifestations.

This could be the first item for the "deviations from EN15907" list. There seems to be consensus that works should legitimately exist with neither variants nor manifestations, something which the standard does not allow for.

> I guess a question then is what kind of relationship is between the serial and monographic works?

Glad you brought this up, as this is a question I have long had myself - the famous EN15907 diagram does have a closed loop for work, meaning a work can have a work (can have a work...), but what is the relationship? I had naively thought there could be a "has work" to connect two linked works, but this is not present in the standard.

Cataloguing Manual 1.4.4 (p.46) explicitly uses "other relationship" in this context (using "Fantômas contre Fantômas" as an example of a "part of" "Fantômas"), however I think it is interesting to note that "other relationship" is non-hierarchical.

ladislav-nfa commented 1 year ago

As to Language.

In the section 6.9. of the standard, there is cardinality: zero or more for Variant and Manifestation. However, in the section 4.1 (Cinematographic Work), there is Language listed as one of the elements for Work, with cardinality zero or more.

So there is an inconsistency in the standard. Is or is not language the element of the Work according to the standard?

stephenmcconnachie commented 1 year ago

Good discussion! I think it’s fairly obvious and straightforward to every user of EN 15907 that a Serial Work by definition will have actual or implied Work children that are hierarchical not horizontal relationships. So if the standard as written is lacking clarity in codifying that parent-child relationship, I think we can improve on it…. :D

torbjornbp commented 1 year ago

> Good discussion! I think it’s fairly obvious and straightforward to every user of EN 15907 that a Serial Work by definition will have actual or implied Work children that are hierarchical not horizontal relationships. So if the standard as written is lacking clarity in codifying that parent-child relationship, I think we can improve on it…. :D

I'm unsure if it is that clear cut. It depends on whether the serial variant can have its own "has/is part" relationships to works/variants.

It came up in a modelling discussion here not so long ago, and modeling it as a hierarchical relation had some odd consequences. I'll see if I can find the example and see if it is relevant...

paulduchesne commented 1 year ago

If I understood correctly, there was interest last week in moving to using the Cataloguing Manual as primary source owing to the fact that it is a fully open resource, it is designed for pragmatic archival use and deviation from the EN standard is presumed intentional.

This is not to discourage any discussion of 15907 here, and how it should be properly applied, but I will move to using the Appendix K from the manual as the basis for transfer into OWL. In some ways this is more complex as it is more open to interpretation and does not have some of the same explicit restrictions as the standard (eg cardinality). Maybe this provides us also another goal for these workshops - a formal implementation of the cataloguing manual in OWL, which should hopefully inspire discussion around ambiguities and interpretations along the way.

Modelling in RDF/OWL does factor in some different considerations from modelling for a traditional CMS. For instance, to implement identifier in a CMS you would normally want a section with two fields: 'Identifier type':'Wikidata' and 'identifier code/id':'Q1421747', or more simply 'wikidata id':'Q1421747'.

In RDF, your "type" is an inherent part of declaring the existence of an identifier, so you would be more likely to do something like http://collections.cinematheque.qc.ca/recherche/oeuvres/fiche/2082 > hasIdentifier > blank node to represent the Wikidata identifier for this work (type: WikidataIdentifier) > hasIdentifierText > Q1421747.

A shorter option would be to drop the blank node, so http://collections.cinematheque.qc.ca/recherche/oeuvres/fiche/2082 > hasWikidataIdentifier > Q1421747, but I personally much prefer the extended model as it allows you to non-destructively allow further data statements against the "entity which is Cinémathèque-québécoise-wikidata-id-for-picnic-at-hanging-rock" (eg "true as of time") in the future if you wish to extend the ontology.

paulduchesne commented 1 year ago

> In the section 6.9. of the standard, there is cardinality: zero or more for Variant and Manifestation. However, in the section 4.1 (Cinematographic Work), there is Language listed as one of the elements for Work, with cardinality zero or more. So there is an inconsistency in the standard. Is or is not language the element of the Work according to the standard?

I had assumed this to be an error - I would have considered language crucial at Work level to indicate a primary language (please correct me if I have missed another place where this is declared), especially if there are "Dubbed" Variants which would be distinguished by a different spoken language to the work.

An interesting edge case occurred to me in this space: I understand that a lot of the films of Dino De Laurentiis from the 50s and 60s were shot, dubbed and released in separate English and Italian prints simultaneously (Ulysses [1954], Barabbas [1961]). So if both languages are "primary" for release, how do you indicate this, and distinguish that case from a work which features a single soundtrack containing both English and Italian, without having to dig into item-level language data?

torbjornbp commented 1 year ago

I’ve tried to rephrase my initial question in the workshop regarding works/variants and the “representative expression” concept:

In essence, it’s a question of whether the work and variant are entities with the same attributes, or whether some attributes would only ever be found on either the work or the variant.

The reason I’m asking is the current hierarchical modelling of the work-variant relationship. What you say about the parent in a parent/child relationship is supposed to also be true of its child records. In a model with an optional variant modelled as a child record to a work, you quickly run into problems.

Unless you somehow define which attributes at the work level are work-entity metadata describing both the original expression and its variant, and which are expression-entity metadata describing only the original expression, everything you say at the work level is paradoxically implied to also be true of its variant.

If, for example, I were to add an attribute at the parent work level but remove this same attribute at the child variant level, and then search for every manifestation-entity in the collection that belongs to a work or variant with this attribute, I could also get hits on the variant manifestations...

This might be heresy, but in a model with an optional variant I’m tempted to argue that the variant could benefit from being modelled as more of a parallel entity to the work (using a non-hierarchical relationship).

stephenmcconnachie commented 1 year ago

For what it’s worth, at the BFI National Archive we have never used the Variant and don’t really see a huge benefit in it as an entity. We typically treat the major version difference as a separate Work, related to the Work horizontally with a carefully described relationship.

I understand some archives do use the Variant, including CNC and possibly Swedish Film Institute, but I believe it creates as many problems as it solves, and doesn’t add a huge benefit that can’t be achieved via a well structured relationship.

But we’re probably not typical in this.

@Paul I do agree that modelling on the manual is a better approach than modelling on the standard…

paulduchesne commented 1 year ago

This actually plays really well into something I wanted to raise, seeing as how Work/Variant seems to be a hot topic. A crucial component of RDF modelling is where entities are attributed a type, ie that they belong to some class of "thing".

Simple example of this would be that Picnic At Hanging Rock > is a (aka rdf:type) > work. We also have a controlled vocabulary of "subclasses" of work under manual D.1 (monographic, analytic, etc).

When we come to variant we have a problem though, there are two different vocabularies for "variant type", 1.2.1 Work/Variant Description Type AND 1.2.2 Variant Type. A further problem, if we have monographic as a subclass of work, and monographic as a subclass of variant, how do we know whether we are talking about a 'monographic-work' or a 'monographic-variant'?

I see three possible pathways to resolve this, which also should play into the conversation above:

option 1 Allow the overlap, both "Picnic At Hanging Rock" (the work) and "Picnic At Hanging Rock [Director's Cut]" (the variant) are monographic, which feels similar to @torbjornbp's mention that "everything is an expression". At this point you would possibly also follow Appendix K of the Manual literally, and merge work and variant into work/variant and infer variation by the dedicated variant type or has variant data.

`Picnic at Hanging Rock` > `rdf:type` > `monographic (subclass of work/variant)` AND `has variant` > `Picnic at Hanging Rock Directors Cut`
`Picnic at Hanging Rock Directors Cut` > `rdf:type` > `monographic (subclass of work/variant)`  AND `has variant type` > `augmented`   

option 2 Keep work and variant vocabularies distinct by adding the class name to each subclass, eg monographicwork or monographicvariant. This looks ugly to me and implies that these are different attributes - I think they are clearly actually the same thing. Worth noting this is what we were doing in the early drafts of FIAFcore.

option 3 Ignore the "description type" for variant, and only use the dedicated "variant type", maybe assuming that a variant would inherit the "description type" of the work (can anyone think of instances where this is not true, eg the work is monographic but the variant is analytic?).

`Picnic at Hanging Rock` > `rdf:type` > `monographic (subclass of work)` 
`Picnic at Hanging Rock Directors Cut` > `rdf:type` > `augmented (subclass of variant)` 

Some of this could be informed by what kinds of queries you wish to run, eg "show me all variants". This is easy for options 2 and 3, but not for option 1 (where you need to infer via has variant or variant type). However, I am leaning towards option 1 as a possible pathway forward, especially seeing as how:

  1. as @stephenmcconnachie has pointed out, variant is rarely currently used.
  2. work and variant are identical regarding properties (aside from "Variant Type").
  3. by removing this hierarchy this also removes the assumed inheritance of values which @torbjornbp is describing.
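
A minimal sketch of option 1, including one way the "show me all variants" inference could be supported. `fiaf:Variant` here is a hypothetical inferred class, `fiaf:hasVariant` is an assumed property name, and the `ex:` IRIs are placeholders.

```turtle
@prefix fiaf: <https://fiafcore.org/ontology/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <https://example.org/> .          # hypothetical namespace

# Hypothetical axiom: anything that is the target of hasVariant is a Variant.
fiaf:hasVariant rdfs:range fiaf:Variant .

# Both entities share the same description type (option 1).
ex:picnic a fiaf:Monographic ;
    fiaf:hasVariant ex:picnic-directors-cut .

ex:picnic-directors-cut a fiaf:Monographic ;
    fiaf:hasVariantType fiaf:Augmented .
```

With RDFS inference applied, `ex:picnic-directors-cut` would be entailed to be a `fiaf:Variant`, so listing all variants becomes a lookup on that type, while work and variant keep a single shared set of properties.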

ladislav-nfa commented 1 year ago

> > In the section 6.9. of the standard, there is cardinality: zero or more for Variant and Manifestation. However, in the section 4.1 (Cinematographic Work), there is Language listed as one of the elements for Work, with cardinality zero or more. So there is an inconsistency in the standard. Is or is not language the element of the Work according to the standard?
>
> I had assumed this to be an error - I would have considered language crucial at Work level to indicate a primary language (please correct me if I have missed another place where this is declared), especially if there are "Dubbed" Variants which would be distinguished by a different spoken language to the work.
>
> An interesting edge case occurred to me in this space: I understand that a lot of the films of Dino De Laurentiis from the 50s and 60s were shot, dubbed and released in separate English and Italian prints simultaneously (Ulysses [1954], Barabbas [1961]). So if both languages are "primary" for release, how do you indicate this, and distinguish that case from a work which features a single soundtrack containing both English and Italian, without having to dig into item-level language data?

According to the manual, "different language versions shot at the same time, released simultaneously, sometimes with different casts" (see 1.1.1 Boundaries between Works) would constitute new Works.

Another thing, which I would prefer to discuss in the Variant discussion topic, is co-production between countries, for instance France - Italy or Czechoslovakia - West Germany, where there are two separate original language versions, usually made by post-synchronization during the original production (French, Italian or Czech, German). It would be wrong to say that the language of the former is French and Italian, and of the latter Czech and German.

So there are two original "works" (with different languages) at the same time. One solution, although not a very nice one, could be to pick one as the Work (language of the Work) and the other as the Variant, with a type something like "co-production" variant (to distinguish it from language variants made later by companies other than the production company of the Work, for instance by a distributor for another country). Another way is to deal with it at the Manifestation level (a solution I think is problematic).

stephenmcconnachie commented 1 year ago

Could I suggest something for the group to consider carefully – though the exceptions are very interesting, they are few. I would much prefer to spend energy on the common cases, than the few exceptions.

We have over 1 million moving image works in our database where there is categorically no Variant scope, and probably 200 where a Variant is a useful entity potentially. I love the 1 million more than the 200…

ladislav-nfa commented 1 year ago

In Národní filmový archiv (Czech Republic) we have about 55 000 works of which at least 20% are or have variants (it could be much more, up to 30-40% I would guess - we will find out during migration). We do record variant properties on the level of Item (dialogue language, language of subtitles or intertitles if any, language of opening credits, language of subtitles for opening credits, language of voice-over for opening credits, sonorized version etc.) so there are already enough data in our old system to establish different variants which will help us to separate original works (and their items) from variants (and their items) which is extremely useful for our purposes.

torbjornbp commented 1 year ago

@stephenmcconnachie CNC uses the variant, but the SFI does not. CNC is the only moving image archive I’ve talked to so far that has implemented it. Here at the National Library we were quite keen to implement it in our migration to Axiell Collections. However, we will currently not implement it in the first test migration due to issues related to what I mentioned earlier and an expected increase in complexity for system integrations...

When it comes to spending energy on the most common issues I wholeheartedly agree. However, I can’t see how we can avoid discussing the variant in general, such as the issues @paulduchesne highlighted earlier for example.

With that in mind, I’m quite fond of Paul’s option 1! By defining what is strictly “work” attributes and what is shared “work/variant” attributes, you formalise inheritance rules to some degree. Eg. “a work attribute is also true for a work's variant, while a work's work/variant attribute is only true for the original expression, while a variant's work/variant attribute is only true for the variant”.

Looking at appendix K, it seems you could actually say that all of it is “work/variant” attributes, which leaves the work and variant entities very parallel.

ladislav-nfa commented 1 year ago

I guess the relationship between work and variant (in the standard and cataloging manual) is complicated as it actually involves two types of relationships, depending on if you see the work as work itself (content) or as both the work and the process of realization (production activities, circumstances and "contents").

The first relationship is the relationship between the original work (represented by the original manifestation, the manifestation released in the country of origin) and the “variant work” (a derivative of the original work). In this case, modeling the variant as a parallel entity, as suggested here, works well for the types of variant like subtitled, dubbed or abbreviated variant. All of these variant types can be seen as derivative of the original work (that is, adding subtitles or replacing the original soundtrack with the soundtrack containing dialogues dubbed in different languages or cutting off some scenes). This type of relationship can be verified by simply comparing the items that represent the original work and variant work. The work and the variant are to some extent equivalent in this case, although the variant is chronologically more recent.

However, the augmented variant, for example, is a different case. The added scenes cannot be seen as derivative of the original work (they are not contained in the original manifestation). Here, the second type of relationship between work and variant comes into play, that is, the work including the process of realization. The augmented variant is not based on the original work (in the sense of the original manifestation), but on production materials created during the realization of the work that were not included in the final cut of the original manifestation of the original work. In the latter case, the work and the variant are not equivalent entities, since the work is a broader ('richer') concept than the variant (the work includes both the original work and the process of realization of the work, i.e. everything that was created during the shooting, post-production etc.).

There may be another solution - work always having a “variant”, in the case of an original work this “variant” would be called something like "original version". The term “variant” would, of course, be somewhat misleading in this context. This partly takes us back to the original FRBR model (EN 15907 cuts off the expression level). Also, from this perspective, works that are not variants but new works (e.g., dailies) would perhaps also fall under the umbrella of a work, as its “variants”. But the problem with FRBR is that it works well for literary texts, but not so much for films, as FRBR somewhat blurs the fundamental distinction between the original work and its later derivatives, the aspect of provenance that is crucial not only to film.

However, I realize that this reasoning is somewhat complicated and perhaps it is just my misinterpretation of the standards. I have no ultimate solution in mind.

ladislav-nfa commented 1 year ago

This actually plays really well into something I wanted to raise, seeing as how Work/Variant seems to be a hot topic. A crucial component of RDF modelling is that entities are attributed a type, i.e. that they belong to some class of "thing".

Simple example of this would be that Picnic At Hanging Rock > is a (aka rdf:type) > work. We also have a controlled vocabulary of "subclasses" of work under manual D.1 (monographic, analytic, etc).

When we come to variant we have a problem though, there are two different vocabularies for "variant type", 1.2.1 Work/Variant Description Type AND 1.2.2 Variant Type. A further problem, if we have monographic as a subclass of work, and monographic as a subclass of variant, how do we know whether we are talking about a 'monographic-work' or a 'monographic-variant'?

I see three possible pathways to resolve this, which also should play into the conversation above:

option 1 Allow the overlap, both "Picnic At Hanging Rock" (the work) and "Picnic At Hanging Rock [Director's Cut]" (the variant) are monographic, which feels similar to @torbjornbp's mention that "everything is an expression". At this point you would possibly also follow Appendix K of the Manual literally, and merge work and variant into work/variant and infer variation by the dedicated variant type or has variant data.

`Picnic at Hanging Rock` > `rdf:type` > `monographic (subclass of work/variant)` AND `has variant` > `Picnic at Hanging Rock Director's Cut`
`Picnic at Hanging Rock Director's Cut` > `rdf:type` > `monographic (subclass of work/variant)` AND `has variant type` > `augmented`

option 2 Keep work and variant vocabularies distinct by adding the class name to each subclass, e.g. monographicwork or monographicvariant. This looks ugly to me and implies that these are different attributes - I think they are clearly actually the same thing. Worth noting this is what we were doing in the early drafts of FIAFcore.

option 3 Ignore the "description type" for variant, and only use the dedicated "variant type", maybe assuming that a variant would inherit the "description type" of the work (can anyone think of instances where this is not true, e.g. the work is monographic but the variant is analytic?).

`Picnic at Hanging Rock` > `rdf:type` > `monographic (subclass of work)`
`Picnic at Hanging Rock Director's Cut` > `rdf:type` > `augmented (subclass of variant)`

Some of this could be informed by what kinds of queries you wish to run, e.g. "show me all variants" - this is easy for options 2 and 3, but not for option 1 (where you need to infer via has variant or variant type). However, I am kind of leaning towards option 1 as a possible pathway forward, especially seeing as how:

  1. as @stephenmcconnachie has pointed out, variant is rarely currently used.
  2. work and variant are identical regarding properties (aside from "Variant Type").
  3. by removing this hierarchy this also removes the assumed inheritance of values which @torbjornbp is describing.
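Option 1 above could be sketched in Turtle roughly as follows. This is only an illustration: the `fiaf:` prefix is assumed to resolve to the ontology namespace used in the class definitions at the top of this ticket, `fiaf:hasVariant` is an assumed spelling of the "has variant" property, and the `ex:` identifiers are hypothetical.

```turtle
@prefix fiaf: <https://fiafcore.org/ontology/> .
@prefix ex:   <https://example.org/> .   # hypothetical namespace for the example records

# The work: typed only by its description type, and pointing at its variant.
ex:picnic_at_hanging_rock a fiaf:Monographic ;
    fiaf:hasVariant ex:picnic_at_hanging_rock_directors_cut .

# The variant: the same description type, plus a variant type.
ex:picnic_at_hanging_rock_directors_cut a fiaf:Monographic ;
    fiaf:hasVariantType fiaf:Augmented .
```

Under this sketch there is no `Variant` class to select on, so a "show me all variants" query would have to look for anything that is the object of `fiaf:hasVariant` or the subject of `fiaf:hasVariantType`.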

In EN 15907 there is a term "description level" (which is more accurate than "description type" in my opinion), and it is a property of Work only, not Variant.

natashafairbairn commented 1 year ago

Back to the issue on Language as an element of a Work, @paulduchesne, I would have said it shouldn't be and more properly belongs at Manifestation/Item level.

Re. Variants, there is a potential overlap between Work/Variants in terms of fields - which is why in the Manual Chapter 1 is for both Works/Variants rather than in 2 distinct chapters. This is why Language is listed under Elements of a Work/Variant, but if you look at the wording about that element it is clear that its use is envisaged at Variant/Manifestation level and not Work. Personally I wouldn't have it at Work or Variant level - but presume we were following EN15907 here. Fields such as title, synopsis/shotlist, credits would be needed on the Variant because those are where there may be variations from the original Work. But there is also overlap and potential for replication of data e.g. the cast on screen remains the same as the Work in a dubbed Variant, but the latter may have additional different credits for the dubbing artists for each role. Whether the whole cast is also replicated in the Variant, or just the different credits would have to be considered.

The Variant Type covers aspects of variation from the original Work relating to its primary Manifestation such as language, colour, sound, etc., eg. Dubbed, Colourised, Sonorised types, but the actual descriptive data for these would most logically be in the linked Manifestations, i.e. a Variant has a Type of Dubbed (with a different title to the Work, and some extra dubbing credits), but the language fields and data relating to the dubbed language sit in the Manifestation.
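As a rough sketch of the split described here, the variant might carry only its type while the language statement sits on the linked manifestation. The property names `ex:hasManifestation` and `ex:hasLanguage`, the `fiaf:Dubbed` IRI, the `fiaf:hasVariant` spelling and all record identifiers are assumptions made for illustration, not terms confirmed by the ontology.

```turtle
@prefix fiaf: <https://fiafcore.org/ontology/> .
@prefix ex:   <https://example.org/> .   # hypothetical namespace

ex:work_1 fiaf:hasVariant ex:variant_1 .

# The variant records only that it is a dubbed version ...
ex:variant_1 fiaf:hasVariantType fiaf:Dubbed ;
    ex:hasManifestation ex:manifestation_1 .

# ... while the language data sits on the linked manifestation.
ex:manifestation_1 ex:hasLanguage "cs" .
```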

This makes most sense with Augmented Type of Variant, e.g. a Director's Cut. It would be linked as a child of the original Work, and it could then have multiple Manifestations reflecting different dubbed or subtitled language releases in various countries. You wouldn't want to create a new Variant record for each of those instances of the Director's Cut as well because then you would have to have a Variant record that was the child of a Variant record which I don't think is allowed in EN15907 (they would be variants of the Variant not the Work technically) and anyway would lead to a potentially horrendous complex structure with odd bits of data sitting in different areas or mass replication of other data. But then if you have multiple dubbed versions of the Variant you may have different sets of dubbing credits to accommodate. Rather theoretical since in most cases you don't have that amount of detail anyway, or may only want to restrict adding actual holdings rather than details about every version. You can go round in circles with Variants.

Having language field at Manifestation level resolves the issue raised earlier:

"Another thing - which I prefer to discuss in Variant discussion topic - is when there is co-production of countries, for instance, France - Italy or Czechoslovakia - Western Germany and there are two separate original language versions usually made in Post-Synchronization during original production (French, Italian or Czech, German). It would be wrong to say that language of the former is French and Italian, and of the latter Czech and German."

There would be separate French, Italian, Czech and German Manifestations with neither one taking precedence but release dates and release country on the Manifestation would reflect whether one was released earlier or simultaneously.

Variants are the one area of EN15907 that everyone has most difficulty with, and it may be because its rules are still too reflective of or tied to the original bibliographic FRBR standard where they work for books but not quite for moving image. This is also the case with some of the Work Description Types which we tried to make clearer within a moving image context within the Manual - though the strict definition of Analytic is actually still tied into a bibliographic context rather than earlier definitions of the same term in historical moving image cataloguing manuals.

paulduchesne commented 1 year ago

Before I forget about it, another question which I wanted to bring up is whether we should be using the D.1 "work description type" in our modelling at this stage at all. Following @stephenmcconnachie's recommendation for ruthless pragmatism, I was wondering whether, despite it being presented as a "complete" vocabulary in both the standard and manual, it is very useful for us in this context. From what I have seen of the archives which are using it, there seemed to be 98% Monographic Works and 2% Serial, while Collection and Analytic were extremely rare (if present at all). What was your breakdown in this regard @ladislav-nfa ?

paulduchesne commented 1 year ago

Another thing - which I prefer to discuss in Variant discussion topic

I was going to ask whether, following option 1 expressed above, we wanted to actually combine work and variant into a single work/variant entity, where variant is then very much optional and identified by the has variant or variant type properties?

I guess the relationship between work and variant (in the standard and cataloging manual) is complicated as it actually involves two types of relationships, depending on if you see the work as work itself (content) or as both the work and the process of realization (production activities, circumstances and "contents").

I think this is a really insightful observation, and makes me wonder if we are describing in different ways that we need a top-most level "work aka content" node to represent "artistic work", with child "work/variant" nodes. If this is the case then it is just a question of naming, either "work" -> mandatory "variant" or "content" -> mandatory "work/variant".

However I'm not sure I understand your distinction between "dubbed" variants and "augmented" variants, as they both involve adding/subtracting material not available in the original instance - is the difference in maintaining the image integrity of the work, or that augmentation is often the work of the original author using original materials, whereas dubbing is often created completely independently by a third party?

This is why Language is listed under Elements of a Work/Variant, but if you look at the wording about that element it is clear that its use is envisaged at Variant/Manifestation level and not Work. Personally I wouldn't have it at Work or Variant level - but presume we were following EN15907 here.

@natashafairbairn Thank you for clarifying this, this makes a lot of sense, getting back to @torbjornbp's questions about inheritance, it seems much safer that a language is declared at manifestation level with the expectation that all items conform to that, rather than possible confusion around whether a variant inherits language from a work if it is not explicitly expressed.

paulduchesne commented 1 year ago

question from email:

What do you mean by Work has Activity? I thought Activity (eg. director) is property of the Agent, not the Work.

This is a really interesting question, and depends on how you use the activity, whether it is a stable property of the agent (for instance Věra Chytilová was a "Director"), or the relationship between agent and work ("Věra Chytilová" -> "was the director of" -> "Sedmikrásky"). I much prefer the latter, as if you have someone who is variously a director and actor, you want to express explicitly what was their contribution for which work.

In RDF you could express the "activity" as a property, which I feel is similar to how most archives would model this, i.e. "Sedmikrásky" -> "has director" -> "Věra Chytilová". I personally prefer an extended syntax (similar to what I am proposing for identifiers and titles), which is the addition of another node in between the two ends of the relationship, affording the ability to add data specific to the "relationship between work and agent" later on. For example, "Sedmikrásky" -> "has activity" -> "blank node for relationship (type Director)" -> "has agent" -> "Věra Chytilová". With this structure you could add ontology "plugins" in the future to add data to the blank node which is specific to Věra Chytilová's contribution on Sedmikrásky, e.g. how much she was paid, how she was credited - or for an actor, how much screentime, character name, etc.
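The two shapes described above might look like this in Turtle. All `ex:` names (`ex:hasDirector`, `ex:hasActivity`, `ex:hasAgent`, `ex:Director` and the record identifiers) are hypothetical placeholders rather than agreed ontology terms.

```turtle
@prefix ex: <https://example.org/> .   # hypothetical namespace

# Shorthand: the activity is the property itself.
ex:sedmikrasky ex:hasDirector ex:vera_chytilova .

# Extended: an intermediate blank node stands for the contribution,
# typed by the activity and pointing on to the agent.
ex:sedmikrasky ex:hasActivity [
    a ex:Director ;
    ex:hasAgent ex:vera_chytilova
] .
```

Any later detail about the contribution (credit wording, character name and so on) would then be attached inside the square brackets, without touching the work or the agent record.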

Or I may have misunderstood your question, were you instead asking not about the modelling itself, but the direction? ie does the "agent" "have the activity" which is "the work" rather than "work" -> "activity" -> "agent" - I can certainly see this making more linguistic sense, and ideally all of these properties should be bi-directional in any case.

stephenmcconnachie commented 1 year ago

For what it’s worth, we use Serial probably more than a film archive without TV responsibility - because we are the national television archive, and we automate data creation for 17 channels of UK TV capture, and have decades of TV on videotape too.

So we have 40,337 Serial Works, and in the last year we have created 900 Serial Works.

In stats terms, we have 1,327,289 Works, so 40,337 = 3% of total.

stephenmcconnachie commented 1 year ago

We model it in both ways, in our agent record and in our work record:

paulduchesne commented 1 year ago

We model it in both ways, in our agent record and in our work record:

Would it be fair to say that the "set of activities" could always be inferred from relationships, or do you indeed have instances where you know of someone who was a "director" even though they have no actual director credit against a work?

If this is the case then it is just a question of naming, either "work" -> mandatory "variant" or "content" -> mandatory "work/variant".

In terms of moving forward, I wonder if we could proceed by having work/variant as parallel entities, with the use of has variant expressing the content link between the two, and avoiding invoking an extra tier.

Whether the whole cast is also replicated in the Variant, or just the different credits would have to be considered.

I am always wary of absent data implying something (explicit rather than implicit), especially as I'm sure there are cases where a Variant does not inherit a contribution (e.g. a voice actor redubbed, an actor edited out), so I would lean towards the former, i.e. replicating the whole cast in the Variant.

natashafairbairn commented 1 year ago

Oops, I think I hit wrong button and seem to have lost the reply I was trying to do re. the above. Will add here as it seems to have disappeared:

Actually the Work Description Type is core to both structuring and categorisation so it is important to use it, even if in majority of cases it will be Monographic. We have over 8,000 Analytic work records and c.230 Collection Work records - and again this may be due to the nature of our collections and activities at the BFI, e.g. most of our Analytic records pertain to individual elements from newsreel Topical Budget. Individual records for these (linked as children of a Monographic Work for the whole newsreel episode) are needed because the BFI streams them as individual films on BFIplayer (i.e. each of 5 individual story elements in a newsreel issue are titled and streamed rather than the newsreel issue as a whole) and each has its own individual Internet Manifestation. Collections are used with home movie collections and production materials collections (which aren't Series/Serials in the same sense as intended film or TV serials). My only issue with Work Description type is with its EN15907 definitions - which have been lifted straight from a bibliographic standard unchanged and are therefore difficult to get your head round to apply to a moving image context - and in the manual we tried to tighten up and clarify the definitions and fit these better into a moving image context.

natashafairbairn commented 1 year ago

Tied into the above and a comment made early on in this string regarding HasWork being missing from Work attributes/descriptions in EN15907 - I always thought it was an oversight that it was missing or else it was again tied into original bibliographic contexts and envisaged that Serial, Monographic, Analytic Work records would be in a horizontal associated relationship rather than a child/parent one. There is nothing wrong with the latter particularly, although you would need to ensure your system could display both hierarchical and related records easily and it could get a bit messy from a visual user point of view if you have a mixture of separated Monographic and Analytic Works along with other types of related records as well, e.g. a Work record for a documentary about the making of the TV series, or related Stills, Books, etc. all mixed in with individual Work episodes - ordering would become an issue potentially. Again, this horizontal associated relationship can make sense within a bibliographic context but within a moving image context a child/parent part/part of relationship is much more logical or instinctive and also useful in terms of clarity and access, e.g. in bringing everything clearly together and their precise relationship. Thus a Serial or Collection Work can have child Monographic Works and grandchild Analytic ones. This is where the FIAF Manual has been pragmatic rather than purist.
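A minimal sketch of the child/parent structure described here, using the description-type classes defined at the top of this ticket. The `ex:isPartOf` property and the record identifiers are hypothetical, chosen only to show the shape.

```turtle
@prefix fiaf: <https://fiafcore.org/ontology/> .
@prefix ex:   <https://example.org/> .   # hypothetical namespace

# Parent: the series as a whole.
ex:newsreel_series a fiaf:Serial .

# Child: a single issue or episode.
ex:newsreel_issue a fiaf:Monographic ;
    ex:isPartOf ex:newsreel_series .

# Grandchild: an individual story within the issue.
ex:newsreel_story a fiaf:Analytic ;
    ex:isPartOf ex:newsreel_issue .
```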

stephenmcconnachie commented 1 year ago

This is where the FIAF Manual has been pragmatic rather than purist.

Hallelujah! Amen!

torbjornbp commented 1 year ago

I guess the relationship between work and variant (in the standard and cataloging manual) is complicated as it actually involves two types of relationships, depending on if you see the work as work itself (content) or as both the work and the process of realization (production activities, circumstances and "contents").

@ladislav-nfa This is how I read it: the work in the film standard is both, i.e. a merged FRBR/LRM work and expression entity. Similarly to what we find in FRBR/LRM, the mandatory expression attributes are also mandatory in the film model; it’s just that they are located at the work entity. By defining which attributes are work attributes and which are expression attributes, you stay compatible with FRBR/LRM, as the expression entity can theoretically be inferred from the work entity's expression attributes. At the same time you don’t have to actually use the expression entity.

I think there is an inherent value in keeping the film model somewhat compatible with FRBR/LRM. Our institution, at least, has large non-moving image collections that are supposed to be cataloged according to LRM, and keeping cross-compatibility in metadata structures is very beneficial to us.

But the problem with FRBR is that it works well for literary texts, but not so much for films, as FRBR somewhat blurs the fundamental distinction between the original work and its later derivatives, the aspect of provenance that is crucial not only to film.

This is actually solved in LRM with the introduction of the representative expression attribute. Expressions can be original/derivative or equal.

Back to the issue on Language as an element of a Work, @paulduchesne, I would have said it shouldn't be and more properly belongs at Manifestation/Item level. […]The Variant Type covers aspects of variation from the original Work relating to its primary Manifestation such as language, colour, sound, etc., eg. Dubbed, Colourised, Sonorised types, but the actual descriptive data for these would most logically be in the linked Manifestations, i.e. a Variant has a Type of Dubbed (with a different title to the Work, and some extra dubbing credits), but the language fields and data relating to the dubbed language sit in the Manifestation.

@natashafairbairn I sort of like this suggestion (even though I reckon our catalogers will not). I still inherently feel the language should sit at the work/expression level (EDIT for clarification: I'm not saying it should only be there!). Isn’t the language an important part of the content the work entity in the film model is supposed to represent?

Is this how it is intended in the manual at present? In that case it would be another quirk of the film standard. In RDA/LRM/FRBR the language attribute is situated entirely at the expression level.

stephenmcconnachie commented 1 year ago

From our perspective the Work is language-agnostic, with the Manifestation carrying all that load across the lifecycle of the work – and in our Manifestation records we capture language(s) with structured vocabularies for the properties – eg

Tokyo Story
Manifestation representing UK cinema release
Japanese | Dialogue (original)
English | Subtitles
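Expressed as triples, a manifestation record like the one above could pair each language with its usage through an intermediate node. This is only a sketch: every `ex:` class, property and identifier is hypothetical, and the language codes are written as plain literals for brevity.

```turtle
@prefix ex: <https://example.org/> .   # hypothetical namespace

# The work itself carries no language statement.
ex:tokyo_story ex:hasManifestation ex:tokyo_story_uk_cinema .

# The manifestation carries one node per language, each with its usage.
ex:tokyo_story_uk_cinema a ex:Manifestation ;
    ex:hasLanguage [ ex:language "ja" ; ex:languageUsage ex:DialogueOriginal ] ,
                   [ ex:language "en" ; ex:languageUsage ex:Subtitles ] .
```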

ladislav-nfa commented 1 year ago

Natasha wrote: “There would be separate French, Italian, Czech and German Manifestations with neither one taking precedence but release dates and release country on the Manifestation would reflect whether one was released earlier or simultaneously.”

My standpoint on having language on the Work or Variant level rather than Manifestation level is as follows. There could be many Manifestations with, for example, Czech language (language use: dialogues), so the language information would be duplicated on each Manifestation if language is recorded as part of the Manifestation metadata. If the language information is on the Work level (and on the Variant level for a “secondary” country of origin, such as a West German co-production, language: German) we could avoid such duplication. Moreover, there are films produced in Czechoslovakia that have more than one language, with Czech usually being primary and other languages secondary (usually this is the case when one character has a different nationality). I have proposed a new element for this in our metadata specification for the new system (boolean attribute “minor language occurrence”).

There is another issue - Czech dubbed versions of foreign films. These would be Variants, with a specific cast (dubbing speakers) which we do record (especially in the case of animated films). I would expect language information on the Variant level, as the language is a property of the dubbed version (and is closely connected to the language of the dubbing speakers, for instance Czech actors), and this Variant could be represented in more than one Manifestation (original release of the foreign film, TV release, DVD release etc.). So language on the Variant level would avoid this replication. Moreover, there is often more than one Czech dubbed variant (i.e. with a different cast, one for the original theatrical release, another for a later DVD release or for TV manifestations - different Czech TV stations often even make their own dubbed version for financial reasons), which is rather common in the Czech film publication context. (I have an idea to connect the Work and Variant cast - actors and dubbing speakers - through the character name /each will have a unique identifier/.)
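For comparison, the Variant-level approach argued for here might be sketched as below, with the language stated once per dubbed variant and shared by all of its manifestations. The `ex:` properties, the `fiaf:Dubbed` IRI, the `fiaf:hasVariant` spelling and the record identifiers are all assumptions for illustration.

```turtle
@prefix fiaf: <https://fiafcore.org/ontology/> .
@prefix ex:   <https://example.org/> .   # hypothetical namespace

# One foreign work with two separate Czech dubs.
ex:foreign_work fiaf:hasVariant ex:czech_dub_theatrical , ex:czech_dub_tv .

# Language is stated once per dubbed variant ...
ex:czech_dub_theatrical fiaf:hasVariantType fiaf:Dubbed ;
    ex:hasLanguage "cs" ;
    ex:hasManifestation ex:theatrical_release , ex:dvd_release .

ex:czech_dub_tv fiaf:hasVariantType fiaf:Dubbed ;
    ex:hasLanguage "cs" ;
    ex:hasManifestation ex:tv_broadcast .
# ... and the manifestations carry no language statement of their own.
```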

Anyway, if I understand it correctly, both approaches (language on Work/Variant level vs. Manifestation level) are possible, and it depends on the local interpretation / implementation of the standard?

At least for our purposes, I would like to keep the possibility of recording language, at least the language of dialogues, on the Work and Variant level.

ladislav-nfa commented 1 year ago

breakdown

We have about 30 000 newsreel segments catalogued individually (analytic description level) which make up several thousand newsreel Works (catalogued on the level of newsreel issue - we do know have records for newsreel title).

ladislav-nfa commented 1 year ago

Paul wrote:

However I'm not sure I understand your distinction between "dubbed" variants and "augmented" variants, as they both involve adding/subtracting material not available in the original instance - is the difference in maintaining the image integrity of the work, or that augmentation is often the work of the original author using original materials, whereas dubbing is often created completely independently by a third party?

The distinction I meant is as follows. If you replace soundtrack with another one (difference is in dialogues), you can compare the two films by observing. You need to understand the languages, but it is doable.

On the other hand, an augmented (extended) variant contains something which is not in the original manifestation, so you cannot compare by observing. So without research you cannot ascertain whether the added scene comes from the original production (shooting) or was shot much later (which would, in my interpretation of the standard, establish a new Work instead of a Variant).

This distinction I have made is just to illustrate the concept of derived variant work from original manifestation and derived variant work from production material unused in the final cut of the former in order to show, in my opinion, crucial difference between the two concepts of work.

ladislav-nfa commented 1 year ago

question from email:

What do you mean by Work has Activity? I thought Activity (eg. director) is property of the Agent, not the Work.

This is a really interesting question, and depends on how you use the activity, whether it is a stable property of the agent (for instance Věra Chytilová was a "Director"), or the relationship between agent and work ("Věra Chytilová" -> "was the director of" -> "Sedmikrásky"). I much prefer the latter, as if you have someone who is variously a director and actor, you want to express explicitly what was their contribution for which work.

In RDF you could express the "activity" as a property, which I feel is similar to how most archives would model this, i.e. "Sedmikrásky" -> "has director" -> "Věra Chytilová". I personally prefer an extended syntax (similar to what I am proposing for identifiers and titles), which is the addition of another node in between the two ends of the relationship, affording the ability to add data specific to the "relationship between work and agent" later on. For example, "Sedmikrásky" -> "has activity" -> "blank node for relationship (type Director)" -> "has agent" -> "Věra Chytilová". With this structure you could add ontology "plugins" in the future to add data to the blank node which is specific to Věra Chytilová's contribution on Sedmikrásky, e.g. how much she was paid, how she was credited - or for an actor, how much screentime, character name, etc.

Or I may have misunderstood your question, were you instead asking not about the modelling itself, but the direction? ie does the "agent" "have the activity" which is "the work" rather than "work" -> "activity" -> "agent" - I can certainly see this making more linguistic sense, and ideally all of these properties should be bi-directional in any case.

Yes, I see it in the same way as you. HasAgent is the Work-Agent relationship, and Activity is tied to this relationship. On the level of the Agent entity, you will have her or his main profession, for instance playwright (and possibly president) for Václav Havel, who happens to be an actor in Vorel's film Kamenný most. Here the activity type = actor is a property of hasAgent, not of the Agent. (Interestingly, he is playing the character "president", which is not a case of "cast as himself" /he was the actual president at that time/, but what is newly called "fictionalized himself".)

ladislav-nfa commented 1 year ago

One more comment on the language element on Manifestation level. I do see some advantage to that in the case of DVDs and Blu-rays, where you have multiple subtitles or dubbed versions you can select for playback; the same goes for VOD platforms. Anyway, even here, the language which is the original language of the Work is usually accompanied by something like "original version" text (so this is a case of inheritance of the original dialogue language of the Work).

paulduchesne commented 1 year ago

The distinction I meant is as follows. If you replace soundtrack with another one (difference is in dialogues), you can compare the two films by observing. You need to understand the languages, but it is doable.

I am appreciating the different perspective presented here, and this now makes sense to me: also at an anecdotal level asking someone what they thought of a film, would be quite different whether they had seen a dubbed version (in which the integrity should be retained) as opposed to an augmented or censored version.

paulduchesne commented 1 year ago

I'm going to try to synthesize the comments above into some changes in the turtle/markdown files, and look forward to discussing this more with you tomorrow.

paulduchesne commented 1 year ago

This discussion is now closed, the new ticket around Manifestations can be found at #3.