carevealed / cavppers


pbcorethat - technical - pbcorepart structure #18

Open klipska opened 7 years ago

klipska commented 7 years ago

Group all files for an object (like con_000080_t01) with pbcorePart. Within pbcorePart, include a pbcoreInstantiation for each file.

<pbcoreInstantiation><!--Original Asset--> con_000080_t01 info</pbcoreInstantiation>
<pbcoreInstantiation><!--Preservation Master--> con_000080_t01_prsv.wav info</pbcoreInstantiation>
<pbcoreInstantiation><!--Access Copy--> con_000080_t01_access.mp3 info</pbcoreInstantiation>

dericed commented 7 years ago

this seems a little backwards to me, should it be like?

<part>
  <instantiation>con_000080_t01.wav</instantiation>
  <instantiation>con_000080_t02.wav</instantiation>
  <instantiation>con_000080_t03.wav</instantiation>
</part>
<part>
  <instantiation>con_000080_t01.mp3</instantiation>
  <instantiation>con_000080_t02.mp3</instantiation>
  <instantiation>con_000080_t03.mp3</instantiation>
</part>
klipska commented 7 years ago

No, the instantiations are versions of the Part. So there is part of a title called tape 01 with a physical instantiation, a preservation file version, and an access file version, and then there is a second part, tape 02, with all those versions also.

There is added complexity for audio cassettes that have multiple files (sides a and b) for each instantiation. I tried to attach sample PBCore records for audio that we just got from the vendor so you can see what that is like, but it won't let me attach an .xml file here. I'll email it to you.

klipska commented 6 years ago

This is still an issue in the update.

Use the pbcorePart tag to wrap around all instantiations for a tape or reel, like:

<pbcorePart>
  <pbcoreIdentifier annotation="Object Identifier">sample001_t01</pbcoreIdentifier>
  <pbcoreInstantiation><!--Original Asset-->Tape 01 info</pbcoreInstantiation>
  <pbcoreInstantiation><!--Preservation Master-->Tape 01 prsv info</pbcoreInstantiation>
  <pbcoreInstantiation><!--Access Copy-->Tape 01 access info</pbcoreInstantiation>
</pbcorePart>
<pbcorePart>
  <pbcoreIdentifier annotation="Object Identifier">sample001_t02</pbcoreIdentifier>
  <pbcoreInstantiation><!--Original Asset-->Tape 02 info</pbcoreInstantiation>
  <pbcoreInstantiation><!--Preservation Master-->Tape 02 prsv info</pbcoreInstantiation>
  <pbcoreInstantiation><!--Access Copy-->Tape 02 access info</pbcoreInstantiation>
</pbcorePart>
klipska commented 6 years ago

Audio is extra tricky with the side a and b of a tape using the tag. I converted these vendor samples to .txt so that I could attach them here as examples.

sample_video_PBCore_revised.txt sample_audio_PBCore_revised.txt

dericed commented 6 years ago

Some questions:

klipska commented 6 years ago

No, there won't be a side c. If a tape or side has to be broken into multiple files due to speed changes or something, there could be t01_p01, t01_p02, or t01_a_p01, etc., which would be treated as instantiationParts. This is all pretty unlikely because the files that come to us don't usually have a strong relationship to the physical object, so it is unlikely we would know enough about the original object to name them as t01_p01 or whatever. If these possibilities create too much complication for the script, I think we could update our procedures to make sure we aren't naming files with this level of complication, and just use parts. If there are only parts because we don't have a sense of the original, or it is born digital, we would treat each part (objectID_p01, objectID_p02) as a pbcorePart.

Yes, but as above there is the potential of having t01_a_p01, t01_a_p02, though unlikely.

Hmm, it looks like this is something Media Preserve is doing to indicate that the following info is for the Preservation Instantiation and is distinct from the physical object. This ID "sample001_t01_prsv" doesn't come from anything in our system. Since we already added the comment that labels each section, we could reduce this ID to just "sample001_t01".

Ah, I think that is the date the files were created. I would select the Created Date of the last of the instantiationParts to go here, since that is when the instantiation was completed. Can you pull created date info from the files?

Please use pbcorePart to wrap the instantiations even if there is only one part.
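
To make the naming cases above concrete, here is a minimal Python sketch of how a script might map a file name to the pbcorePart it belongs to. The pattern, the function names `parse_name` and `part_identifier`, and the choice of Python are all assumptions for illustration; they are inferred from the file names mentioned in this thread and are not taken from the actual script.

```python
import re

# Hypothetical pattern, inferred from names in this thread such as
# con_000080_t01_prsv.wav, t01_a_p01, and objectID_p01.
NAME_RE = re.compile(
    r"^(?P<object>[A-Za-z]+_?\d+)"       # object ID, e.g. con_000080
    r"(?:_t(?P<tape>\d+))?"              # tape or reel, e.g. _t01
    r"(?:_(?P<side>[ab])(?=[_.]|$))?"    # side a or b (never c)
    r"(?:_p(?P<part>\d+))?"              # part of a tape, side, or object
    r"(?:_(?P<role>prsv|access))?"       # preservation or access copy
    r"(?:\.[A-Za-z0-9.]+)?$"             # file extension(s), e.g. .HD.mov
)


def parse_name(filename):
    """Split a file name into object, tape, side, part, and role fields."""
    match = NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognized file name: {filename}")
    return match.groupdict()


def part_identifier(filename):
    """Return the identifier of the pbcorePart this file belongs to."""
    fields = parse_name(filename)
    if fields["tape"] is not None:
        # Sides and parts below a tape are instantiationParts, so they
        # stay under the tape-level pbcorePart.
        return f"{fields['object']}_t{fields['tape']}"
    if fields["part"] is not None:
        # No tape level known (or born digital): each part is a pbcorePart.
        return f"{fields['object']}_p{fields['part']}"
    return fields["object"]
```

With this sketch, `part_identifier("con_000080_t01_a_p02_prsv.wav")` gives `con_000080_t01`, while an object with no tape or part level falls back to the bare object ID, which would still get its own pbcorePart.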

klipska commented 6 years ago

I think these are the remaining part issues:

Can we use pbcorePart to wrap the instantiations even if there is only one part?

Can we order the parts so they appear sequentially (t01, t02, t03?). Currently it is random, like t02, t03, t01.
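
As a rough illustration of both requests, the sketch below groups instantiations by part identifier, emits the parts in sorted order, and always writes a pbcorePart wrapper, even when only one part exists. It uses Python's standard `xml.etree.ElementTree`; the function name `build_parts` and the `(part_id, label, info)` input shape are hypothetical, and a real pbcorePart would carry more elements than shown here.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_parts(parent, files):
    """Append one pbcorePart per part ID to parent.

    files is an iterable of (part_id, label, info) tuples; this input
    shape is an assumption made for the sketch.
    """
    grouped = defaultdict(list)
    for part_id, label, info in files:
        grouped[part_id].append((label, info))

    # Plain string sorting yields t01, t02, t03 because the numbers in
    # the identifiers are zero-padded to the same width.
    for part_id in sorted(grouped):
        part = ET.SubElement(parent, "pbcorePart")
        identifier = ET.SubElement(
            part, "pbcoreIdentifier", annotation="Object Identifier"
        )
        identifier.text = part_id
        for label, info in grouped[part_id]:
            # Label comment (Original Asset, Preservation Master, ...).
            part.append(ET.Comment(label))
            instantiation = ET.SubElement(part, "pbcoreInstantiation")
            instantiation.text = info
    return parent


root = ET.Element("pbcoreDescriptionDocument")
build_parts(root, [
    ("sample001_t02", "Original Asset", "Tape 02 info"),
    ("sample001_t03", "Original Asset", "Tape 03 info"),
    ("sample001_t01", "Original Asset", "Tape 01 info"),
    ("sample001_t01", "Preservation Master", "Tape 01 prsv info"),
])
print(ET.tostring(root, encoding="unicode"))
```

Because the wrapper is written inside the loop over part IDs, a simple object with a single part goes through the same path as a complex one and still gets its pbcorePart.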

klipska commented 6 years ago

This is working for very simple things with one preservation file (corcl_000016), but I'm not getting the part tag on simple objects that have preservation files for sides a and b yet still just one part (like con_000068 and con_000069).

klipska commented 5 years ago

From 10/1/2018 email - still outstanding.

I'm still not getting the part tag on simple objects that have preservation files for sides a and b but still just one part (like con_000068 and con_000069 - samples attached). pbcorePart is not in the sample you attached either. pbcorePart is showing up in complex objects and in simple objects with just one preservation file (no sides a and b).

klipska commented 5 years ago

Replying to your question in #57: I would like the Original Asset info inside the pbcorePart please.

dericed commented 5 years ago

status?

klipska commented 5 years ago

The Original Asset info is being included both outside and inside the Part tag. We don't want the original asset info outside the part. I'll email an example.

Apparently the Original Asset info does not repeat in simple objects.

klipska commented 5 years ago

ALSO the Part and Original Asset identifier info is using specific file names. I sent the PBCore for corcl_000016. See lines 53 and 59 where it is the access file name instead of just the object ID.

<pbcorePart>
  <pbcoreIdentifier source="California Revealed" annotation="Object Identifier">corcl_000016_access.HD.mov</pbcoreIdentifier>
  <!--Original Asset-->
  <pbcoreInstantiation>
    <instantiationIdentifier source="California Revealed" annotation="Object Identifier">corcl_000016_access.HD.mov</instantiationIdentifier>