Closed paulduchesne closed 1 year ago
Welcome to the item discussion. A key question for me is which attribute defines the "type" of the item: the format (eg 35mm film), the element (eg original camera negative), or neither, with both expressed as properties of the item.
Interestingly there is no "Item Type" listed in the manual (at least not in the Appendix K table at the end).
I think previously I had shortened 3.1.4 "Has Element Type" to get away from using "Type" outside of entity classification, but item "Has Element" is clearly wrong - the item is the element, eg my film is an 'original camera neg'. This makes me wonder whether item should have subclasses taken from D.7.8, and a reuse of the "has format" property shared with Manifestation.
Just adding some general comments on the item and carrier divide (or is item part better?)
Our use of the item and carrier level might be very carrier focused, but as we see it, very little information belongs at the item level. It is beneficial to have carrier information “summarised” on the item level, but we prefer entering it at the carrier.
For us the item level is merely what describes a unit of carriers. Typical attributes would be extent in carriers (eg. “this unit should consist of 5 carriers”), an item title (eg. “Film x, Print Y”), “function” (equivalent of element), as well as every bit of information describing the “whole”. For example, Norwegian censorship logs do not refer to works, but rather to particular prints, and not to particular carriers. The relationship to the log and the metadata found in these logs (eg. a summarised length given in metres) belong at the item level.
For us the carrier is the first entity in the system that actually describes something in the physical reality. The carriers we use say what the material actually is: base, gauge, film stock, aperture, measured length, conditions/treatments, carrier title. A lot of this information is often true for all carriers in an item, but it doesn’t have to be. Putting it at the item level causes a range of issues the moment it doesn’t.
For first time cataloging it is often easy to describe the carrier in great detail, without knowing which item, manifestation or work it belongs to. We have 100k-200k film carriers in our collection with fairly detailed descriptions that are currently cataloged as orphaned film carriers. Having to deal with these as items at this stage of cataloging is not beneficial.
Depending on your CMS it might be difficult to work in a system where this information has to be entered multiple times for all carriers, or where it is not visible at the item level. These issues are CMS issues, not model issues, in my opinion. If a CMS is very rigid and you don’t want to go down this route, you should remember that a CMS is not the model. You can put a carrier attribute from the model at the CMS item entity. Similarly I guess you could go down the other route and put item attributes at the CMS carrier level, but I reckon that could cause some issues in a data exchange.
I don’t agree with most of this approach, unfortunately Torbjørn, and not just for CMS rigidity reasons.
We have many millions of Items and carriers, and the model you’re proposing was ruled out in early implementation stages. We are still developing our carrier record, but we aim to store carrier-specific data in the Carrier record (reel condition, reel extent, container barcode association, etc), while all data that is shared by all carriers would be stored in the Item: gauge eg 35mm; description eg Internegative; sound status and properties; colour status and properties; acquisition source, date, method; base, stock, etc.
I don’t think it makes any sense to describe shared properties of all carriers in each carrier – that’s the Item’s job…
But, of course, we may be on our own path with that, others may completely disagree about that! Good debate!
I am extremely sympathetic to pushing data down the tree, as it both gets closer to being verifiable as related directly to primary sources and allows tolerances for exceptions (eg mixed-base, or mixed-gauge film items), but I would concede that both those exceptions are extremely rare, and a model designed with data exchange in mind should possibly follow the received wisdom of the day.
As mentioned in previous conversations I think one of the issues is with CMSs which do not allow higher levels to reach down and summarize data which is explicitly connected to lower tiers - would be interested if anyone has seen a system which does this?
@torbjornbp, I like your suggestion of the term "function", but I wonder how this works outside of the film context. One of the interesting things about D.7.8 is that it treats "internegative" and "DCP" as equivalent kinds of term. Seen from the perspective of a "function", "DCP" should maybe just be a (technology agnostic) "release print"?
I might be a carrier extremist, but I can add that our current CMS is working similarly to what @stephenmcconnachie is describing. However, doing that for the last 25 years is what is pointing us in a new direction. Acquisition source is an excellent example! It sits at the item level in our current database, but is very troublesome due to mixed acquisition sources for carriers within items.
We have slowly come to the realisation that a lot of attributes we currently put at the item level are actually not so uniform across carriers. A symptom is that a lot of vocabulary choices for “mixed/see notes” have appeared in our CMS over the years, making our controlled vocabularies less useful and searches worse.
You can solve some of these issues by allowing for multiple cardinality of these troublesome attributes at the item level, but it's not a very good workaround as it lacks precision (without typing out more extensive notes): “Which carrier does an instance of an item attribute refer to?” We might put some information at both the item and the carrier.
I’m all for pragmatism though, so I reckon the model should allow for such attributes being available at both levels. At the moment not having the carrier level is more in accordance with the standard than having it!
@paulduchesne, I think we would use something akin to D.7.8. Even though “DCP” might be more specific and technically precise than “release print”, this is how we currently would do it.
I have a question regarding this property:
3.1.3 Holding Institution
Manual indicates text, but institution should be an entity. Also generalise range as institution so that these entities can be reused in another context.
<https://fiafcore.org/ontology/hasHoldingInstitution> a owl:ObjectProperty ;
    rdfs:label "Has Holding Institution"@en ;
    dc:source "FIAF Cataloguing Manual 3.1.3"^^xsd:string ;
    rdfs:domain fiaf:Item ;
    rdfs:range fiaf:Institution .
Why is the holding institution not considered an Agent (if Agent means both person or corporate body)? And having asked this, wasn't there an Agent class in the ontology before that is no longer included in the draft, or am I mistaken?
Why is the holding institution not considered an Agent (if Agent means both person or corporate body)?
Good question, especially as institutions can also hold other production credits (eg producing archival releases) and you would want these to be linked. I suppose you could maybe consider there being a distinction between institution as an organisational unit and a geographical location, but I would be in favour of what you are implying (BFI -> type -> organisation) and maybe reversing the direction? BFI -> has holding -> some film, or some film -> held at -> BFI? Generalizing to agent also allows for the hypothetical inclusion of films which are held by individuals in private collections.
And having asked this, wasn't there an Agent class in the ontology before that is no longer included in the draft, or am I mistaken?
I am pushing that we would want a node in between "work" and "agent". For example, we could express Hal Hartley directing Simple Men as a direct relationship like this:
flowchart LR
SimpleMen --hasDirector--> HalHartley
But I am a real advocate of adding an extra entity in between these, which we can call an "activity" using terminology from the manual.
flowchart LR
SimpleMen --hasContribution--> BlankNode_TypeDirector --hasAgent--> HalHartley
The purpose of this would be to allow the addition of data points which relate specifically to the intersection of the entities: for instance how much Hal Hartley was paid for this film, how he was credited, or if an actor: character name, screentime, credit ranking.
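As a rough illustration of the intermediate node, here is a minimal Python sketch. The class and field names (`Contribution`, `credited_as`, `character_name`, `agents_for`) are assumptions made for this example and are not taken from the ontology draft or the manual.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Agent:
    name: str


@dataclass
class Contribution:
    """Hypothetical node sitting between a work and an agent."""
    activity: str                          # eg "Director"
    agent: Agent
    credited_as: Optional[str] = None      # data about the intersection
    character_name: Optional[str] = None   # only meaningful for cast


@dataclass
class Work:
    title: str
    contributions: List[Contribution] = field(default_factory=list)

    def agents_for(self, activity: str) -> List[Agent]:
        """Recover the direct 'hasDirector'-style view from the nodes."""
        return [c.agent for c in self.contributions if c.activity == activity]


simple_men = Work("Simple Men")
hal_hartley = Agent("Hal Hartley")
simple_men.contributions.append(
    Contribution(activity="Director", agent=hal_hartley)
)
```

The direct relationship is still recoverable (`simple_men.agents_for("Director")`), while the node leaves room for credit wording or character names without touching either the work or the agent.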
I am still trying to get my head around how to best express item type, format and other related tech attributes. Using this example from the Bundesarchiv XML:
<Exemplar uuid="56f74877-c131-4baa-8f22-4ee9802ddf42">
  <Medienart>FILM</Medienart>
  <Signatur>B-133150</Signatur>
  <ExemplarStatus>Unbekannt</ExemplarStatus>
  <Filmbreite>35 mm</Filmbreite>
  <Traeger>Triazetatzellulose</Traeger>
Signatur, ExemplarStatus, Traeger can all be expressed as Identifier, Status and Base respectively, but I am interested in Medienart and Filmbreite.
Different options:
1)
56f74877-c131-4baa-8f22-4ee9802ddf42 -> has type (rdf:type) -> Item
56f74877-c131-4baa-8f22-4ee9802ddf42 -> has carrier type -> Film
56f74877-c131-4baa-8f22-4ee9802ddf42 -> has gauge -> 35mm
2)
56f74877-c131-4baa-8f22-4ee9802ddf42 -> has type (rdf:type) -> Film (subclass of item)
56f74877-c131-4baa-8f22-4ee9802ddf42 -> has gauge -> 35mm
3)
56f74877-c131-4baa-8f22-4ee9802ddf42 -> has type (rdf:type) -> 35mm Film (subclass of film, subclass of item)
I personally lean towards the third option as I feel it would aid querying (you can ask simply "show me all film" or "show me all 35mm film" as opposed to chaining requests: "show me all film" AND "has a gauge of 35mm"). This though does require a full taxonomy of item types, which taking table D.7.2 literally could look something like this:
graph TD;
Item-->Film;
Film-->35mmFilm;
Film-->16mmFilm;
Film-->Super16mmFilm;
Film-->8mmFilm;
Film-->Super8mmFilm;
Film-->9.5mmFilm;
Film-->17.5mmFilm;
Film-->70mmFilm;
Item-->Video;
Video-->1InchVideo;
Video-->Digibeta;
Video-->BetacamSP;
Video-->2InchVideo;
Video-->HDCAMSR;
Video-->D1;
Video-->D5;
Video-->DVCPROHD;
etc
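To make the querying argument for option 3 concrete, here is a small Python sketch of a subclass lookup over a fragment of the taxonomy above. The item identifiers (`item-a` etc) are hypothetical, and a real implementation would more likely rely on subclass inference in a triple store than on a hand-rolled dictionary.

```python
# Child -> parent, a fragment of the taxonomy above.
SUBCLASS_OF = {
    "Film": "Item",
    "35mmFilm": "Film",
    "16mmFilm": "Film",
    "Video": "Item",
    "Digibeta": "Video",
}

# Hypothetical items, each typed with its most specific class (option 3).
ITEM_TYPES = {
    "item-a": "35mmFilm",
    "item-b": "16mmFilm",
    "item-c": "Digibeta",
}


def is_a(cls, ancestor):
    """Walk up the subclass chain until the ancestor is found or the chain ends."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def items_of_type(ancestor):
    """One query shape covers both 'all film' and 'all 35mm film'."""
    return sorted(i for i, t in ITEM_TYPES.items() if is_a(t, ancestor))
```

Here `items_of_type("Film")` and `items_of_type("35mmFilm")` are the same kind of request at different depths, which is the convenience option 3 is after.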
The other thing to throw in the mix, we agreed to add carrier - would not the formats above be best expressed at this level given they conform most immediately to the physical nature of the carrier? And if so, could you have your carrier type expressed as the item format, and then free up the item type to be taken from element type (or instantiation from 15907)?
I also wanted to highlight that the Bundesarchiv data has some interesting attributes at carrier level (or Aufbewahrungseinheit): colour, base and gauge. My pragmatic question would be: as these values can be expressed at either level and should be interchangeable in 99% of cases, could we not pick a single level ourselves and transform? To unpack this a bit - Archive A only has gauge information at Item level, Archive B only at carrier - I think we should be picking one of these options and either dragging that data up or down, unless I have this wrong and the declaration at a different level actually is significant?
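A minimal Python sketch of what "dragging up or down" could look like for a single attribute such as gauge. The function names `pull_up` and `push_down` are invented for this example; returning `None` for disagreeing carriers is one possible policy, not something the manual prescribes.

```python
def pull_up(carrier_values):
    """Summarise a carrier-level attribute at item level.

    Returns the shared value when every carrier agrees, otherwise None
    (no carriers, or mixed values such as a mixed-gauge item).
    """
    distinct = set(carrier_values)
    return distinct.pop() if len(distinct) == 1 else None


def push_down(item_value, carrier_count):
    """Copy an item-level attribute onto each of its carriers."""
    return [item_value] * carrier_count
```

The round trip is lossless whenever carriers agree (`pull_up(push_down("35mm", 2))` gives back `"35mm"`); the mixed case is exactly where the choice of level becomes significant.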
Just returning to item type, I can see a bit of a problem with having an overlapping vocabulary shared between item type and the format of manifestation (hasFormat). This is a problem because if the values are shared (eg manifestation > hasFormat > video, and item > has item type > video), it becomes ambiguous whether video is a subclass of format or item.
Assuming we wish to keep both item and manifestation statements of format (which was generally supported in a previous discussion), I see two solutions.
1) create two distinct but overlapping vocabularies for the two classes format and item. I think this is messy and difficult to maintain for limited gain. It also seems incorrect given "video" is a (mostly) discrete concept.
2) allow for hasFormat as a property of both manifestation and item with the same vocabulary of formats (implied as ideal and actual, which should ultimately be a more explicit distinction).
Pathway 2 means that item again has no subclasses (ie there are no item types), which could allow for using element type in that capacity? This is interesting as I feel the element type says more about the conceptual purpose of the item, similar to how manifestation type primarily communicates function.
This does not strike me as untenable as the carrier tier is then the direct representation of the physical item, although unable to use the format vocabulary (as type) without striking exactly the same issues expressed above.
~
Applying this to the BA example is interesting because the element type field ("Materialart") is present at carrier level, so what I am proposing for this mapping would be to not only pull it up to item level, but in fact define the item type with an additional statement for the format:
item 56f74877-c131-4baa-8f22-4ee9802ddf42 -> has type (rdf:type) -> Bildduplikatpositiv (aka Image Dupe Pos)
item 56f74877-c131-4baa-8f22-4ee9802ddf42 -> hasFormat -> 35mm Film (subclass of film > format)
As to Item type - I wouldn't mix Item Element Type with Item Type. An Item can consist of more than one Item Element (something not grasped by EN, where instantiation type has cardinality zero or one). Original picture negative and original sound negative are 2 elements forming one Item. Perhaps there is no need to have Item type at all. (Manifestation Type is publication context type, not technical property.)
In our new system modelling, we propose Item consisting of one or more Subitems and Subitem consisting of one or more Item Elements. Subitem type for analogue film is image, soundtrack or composite, and Item element under Subitem = image is - for instance - original picture negative, and under Subitem = composite it is combined print. So there could be - for example - 1 Item with 1 Subitem = image having two Item elements (original picture negative and duplicate negative) and 1 Subitem = soundtrack having one Item element (original sound negative). These 3 elements form 1 Item.
So Item Type in our system will be something like "Item model". The example above will be standard not-for-screening sound film model, whereas standard screening sound film model will have 1 subitem = composite with 1 element (composite print).
It may seem to be complicated for other archives but it actually could help us to prescribe allowed combinations of elements. Original picture negative and sound print cannot be combined in a standard model, for instance.
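As a sketch of how such a model could prescribe allowed combinations, the Python below checks each element against the subitem it sits under. The table only lists terms mentioned in this comment and is not a complete vocabulary; the function name and the data shape (a list of subitem type and element list pairs) are assumptions for this example.

```python
# Allowed element types per subitem type (illustrative, incomplete).
ALLOWED_ELEMENTS = {
    "image": {"original picture negative", "duplicate negative"},
    "soundtrack": {"original sound negative"},
    "composite": {"combined print"},
}


def invalid_elements(subitems):
    """Return (subitem type, element) pairs that the model does not allow.

    `subitems` is a list of (subitem_type, [element, ...]) tuples, so an
    item may hold several subitems of the same type.
    """
    problems = []
    for subitem_type, elements in subitems:
        allowed = ALLOWED_ELEMENTS.get(subitem_type, set())
        problems.extend(
            (subitem_type, element)
            for element in elements
            if element not in allowed
        )
    return problems


# The not-for-screening sound film example from above: 3 elements, 1 item.
not_for_screening = [
    ("image", ["original picture negative", "duplicate negative"]),
    ("soundtrack", ["original sound negative"]),
]
```

An empty result means the item fits the model; placing a sound print under an image subitem would be reported as a problem.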
Thank you @ladislav-nfa, this is such an interesting perspective. I can't say I have heard of anyone grouping corresponding production components together, but I can see advantages (eg copying/digitising history).
My first pass did not have an item type, nor do we have a carrier type, which maybe we could just wear for now. In the vocabularies ticket I was coming around to the idea of the item type being the general carrier type (eg Film, Digital File, etc) which I think @stephenmcconnachie was alluding to last talk, but I don't know if there is much point if it can be unambiguously inferred from more granular format info. Or if more granular format data is missing: if we treat format specifics (eg 35mm film) as subclasses, then we can still retain information that an item is film even if we have no further info re gauge, base, etc.
If I understand correctly, your "subitem" concept is almost like a further tier sitting between item and carrier?
Just to revisit the example from further up the page, this would result in:
item 56f74877-c131-4baa-8f22-4ee9802ddf42 -> has type (rdf:type) -> fiaf:Item (no subclasses)
item 56f74877-c131-4baa-8f22-4ee9802ddf42 -> has format -> 35mm Film (subclass of film, subclass of format)
Need to implement the above proposal and then this issue can be closed.
Discussion around the modelling of item, using the FIAF Cataloguing Manual as primary source.
Item Elements
Define the item class.
3.1.1 Identifier 3.1.1.1 Identifier Type
At this level an identifier is less likely to correspond to an external resource, and more likely to be an internal archival id.
3.1.2 Title 3.1.2.1 Title Type
3.3.1 Agent(s) 3.3.1.1 Agent Activity
3.1.7 Notes
As with work/variant and manifestation, all elements terminating in text blocks have been removed.
3.3 Relationships
Relationships are explicitly expressed elsewhere.
3.3.2 Events
3.3.3 Other Relationships
Horizontal item relationships not currently supported.
3.1.3 Holding Institution
Manual indicates text, but institution should be an entity. Also generalise range as institution so that these entities can be reused in another context.
3.1.4 Element Type
Possibly remove element type in favour of making the item subclasses?
3.1.5 Item Physical-Digital Description 3.1.5.1 Carrier Type 3.1.5.1.1 Carrier Type: General 3.1.5.1.2 Carrier Type: Specific
General and specific carrier types should be converted into a taxonomy of formats which are used at both this and manifestation level.
3.1.5.3 Sound 3.1.5.5 Sound System 3.1.5.4 Sound Channel Configuration
3.1.5.6 Colour
3.1.5.7 Unit Number
Expressed as extent.
3.1.5.8 Extent
This should encompass both unit counts (with type Reels, Rolls, etc) and durations (with type Minutes, Hours).
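A minimal Python sketch of a single extent structure covering both cases. The unit names come from the examples above; the class itself and the `is_duration` property are assumptions for illustration.

```python
from dataclasses import dataclass

# Unit types taken from the examples above; not an exhaustive vocabulary.
COUNT_UNITS = {"Reels", "Rolls"}
DURATION_UNITS = {"Minutes", "Hours"}


@dataclass(frozen=True)
class Extent:
    value: float
    unit: str

    def __post_init__(self):
        # Reject units outside the (illustrative) controlled vocabulary.
        if self.unit not in COUNT_UNITS | DURATION_UNITS:
            raise ValueError(f"unknown extent unit: {self.unit!r}")

    @property
    def is_duration(self) -> bool:
        return self.unit in DURATION_UNITS
```

With this shape, `Extent(5, "Reels")` and `Extent(90, "Minutes")` share one property, and the unit type tells a consumer how to read the value.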
3.1.5.9 Projection Characteristics
Currently renamed as image characteristic, to allow for recording characteristics unrelated to projection.
3.1.5.10 Broadcast Standard
Vocabulary can be found under 3.1.5.10.
3.1.5.11 Duration 3.1.5.11.1 Duration Precision
Expressed as extent.
3.1.5.12 Frame Rate
Manual recommends drawing from controlled vocabulary rather than allow for integer/float data.
3.1.5.13 Base
Vocabulary can be found under D.7.7.
3.1.5.14 Stock 3.1.5.15 Stock batch
A vocabulary of stocks can be found under D.7.16, noted as being extendable. Stock batch/code should be a datatype property terminating in strings.
3.1.5.16 Video Codec
A vocabulary can be found at D.7.10. Both this and Audio Codec should be subclasses of codec. Also worth considering: a single file can have multiple streams of different codecs, so a better model would possibly be item -> hasStream -> stream (type AudioStream) -> hasCodec -> WAV.
3.1.5.17 Audio Codec
See above.
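The stream idea raised under Video Codec could be sketched in Python as below. All class and field names are assumptions for illustration; WAV comes from the example above, while the video codec name and the bit depth values are placeholders rather than real catalogue data. The `bit_depth` slot is included because bit depth can also differ per stream.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Stream:
    kind: str                        # "AudioStream" or "VideoStream"
    codec: str
    bit_depth: Optional[int] = None  # may differ between streams


@dataclass
class Item:
    identifier: str
    streams: List[Stream] = field(default_factory=list)

    def codecs(self, kind: str) -> List[str]:
        """Codecs of all streams of the given kind within this item."""
        return [s.codec for s in self.streams if s.kind == kind]


# Hypothetical digital item with one video and one audio stream.
digital_item = Item(
    identifier="example-item",
    streams=[
        Stream(kind="VideoStream", codec="ExampleVideoCodec", bit_depth=10),
        Stream(kind="AudioStream", codec="WAV", bit_depth=24),
    ],
)
```

Codec and bit depth then hang off the stream rather than the item, so a file with several streams needs no "mixed" value at item level.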
3.1.5.18 Resolution
A vocabulary can be found at D.7.19. As some of these terms ("2k") can be contentious, a possible replacement could be literal pixel dimensions (eg 1920 by 1080).
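A short Python sketch of storing resolution as literal pixel dimensions. The accepted input spellings ("1920 by 1080", "1920x1080") and the names used are assumptions for this example; vocabulary labels are deliberately rejected rather than guessed at.

```python
import re
from typing import NamedTuple


class PixelDimensions(NamedTuple):
    width: int
    height: int


# Accepts "1920 by 1080" or "1920x1080"; labels like "2k" do not match.
_DIMENSIONS = re.compile(r"\s*(\d+)\s*(?:by|x)\s*(\d+)\s*", re.IGNORECASE)


def parse_dimensions(text: str) -> PixelDimensions:
    """Parse a width/height string into integers, or raise ValueError."""
    match = _DIMENSIONS.fullmatch(text)
    if match is None:
        raise ValueError(f"not a pixel dimension: {text!r}")
    return PixelDimensions(int(match.group(1)), int(match.group(2)))
```

Storing two integers avoids the argument over what a label such as "2k" actually means, at the cost of losing the shorthand.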
3.1.5.19 Line Standard
Possible overlap with resolution for digital instances.
3.1.5.20 Bit Depth
As with codec, there can be multiple bit depths under a single file (eg even just separate audio and video tracks). This follows the proposal of introducing a stream entity, which could also have a hasBitDepth property.
3.1.5.2 Item Status
Controlled vocabulary under D.7.3. I feel "status" is possibly an ambiguous term.
3.1.6.1 Item Condition
Would this property be better placed at carrier level as it pertains explicitly to the physical object?
3.1.6.2 Item Location
Would this property be better placed at carrier level as it pertains explicitly to the physical object?
3.1.5.21 Source Device
Small vocabulary at D.7.20. I would question whether this should be a device to play back the material (as indicated in the vocabulary) or the device used to create the material (ie the source of the item).
3.1.5.22 Source Software
As above, I would claim that it is more interesting that a file was created with FFmpeg than that it can be played back with VLC.
3.1.5.23 Transfer Speed
Would question how often this data is retained by archives. If implemented, it could use the same FrameRate vocabulary as hasFrameRate.
Other Properties
Support for an additional tier for carrier to represent physical item-part.