PresConsUIUC / PSAP

Home of the Preservation Self-Assessment Program application.
https://psap.library.illinois.edu/

Improve DC XML "format" values #225

Closed areisemann closed 9 years ago

areisemann commented 9 years ago

Using the same "normal" account login/sample document set-up, I decided to export the record as DC-XML. This time the information for "extents" is under the dc:format element. While this isn't itself a bad cross-walking between EAD elements ("extents" is an ead-unique element that's supposed to give a basic idea of the physical construction of the described object--just like the DC element format is supposed to do), there is a misalignment of information.

For a DC record, the format for an item that is known to be paper-based (such as how the assessment report/record on the PSAP website identifies this item) should have the entry <dc:format>Unidentified paper</dc:format> in addition to any other entries that may describe the square footage and/or other physical descriptors of the item (which for this record currently are <dc:format>Sample extent</dc:format> and <dc:format>Another sample extent</dc:format>). Since the website-viewable item record also specifies that the item has been made with color pigment-based ink, that should be indicated in the record as well under another dc:format entry.

Additionally, the value "item", which is given as the dc:type, is used incorrectly. The values for dc:type should be things like "physical object", "image", or "sound." In this record's case, it seems like "text" or "image" would be the most appropriate.

For quick reference:
1) https://maryewatson.wordpress.com/2014/03/07/dublin-core-format-element/ (type vs. format)
2) http://www.loc.gov/ead/tglib/elements/extent.html
3) http://dublincore.org/documents/usageguide/elements.shtml

adolski commented 9 years ago

Thanks for explaining this. Can we then make a conditional statement like:

If the format is paper-based --> then it should have a dc:format element with a value of "Unidentified paper"
If the format has an ink/media type --> then it should have a dc:format element containing the name of that type

And would the above apply to support types as well?

(I'm going to split off your last paragraph into its own issue: #227)
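A minimal sketch of the two conditional rules proposed above, written in Python purely for illustration. The function name and its parameters are hypothetical and are not taken from the PSAP codebase; it only shows how the rules would combine for one resource.

```python
from typing import List, Optional


def conditional_dc_formats(paper_based: bool,
                           ink_media_type: Optional[str]) -> List[str]:
    """Return the dc:format values implied by the two proposed rules.

    Both parameters are hypothetical stand-ins for whatever the
    application actually stores about a resource.
    """
    values: List[str] = []
    # Rule 1: a paper-based format gets a fixed "Unidentified paper" entry.
    if paper_based:
        values.append("Unidentified paper")
    # Rule 2: an ink/media type, when present, gets its own entry by name.
    if ink_media_type:
        values.append(ink_media_type)
    return values
```

Under these assumptions, a paper-based item with a recorded ink/media type yields two dc:format values, and an item with neither yields none.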

areisemann commented 9 years ago

Well, I'd say that if it's paper-based, then we either need to specify for dc:type what kind of paper (i.e. the whole range of paper choices we allow for in the assessment--"newsprint", "acid-free paper", "unspecified paper", etc) or we just have all items with paper-based formats be described as "paper" under the dc:type element.

the dc:format element should indicate to someone (e.g. a researcher who's using the record as a means to see if they should try to get access to that item) the most basic aspect of the item--is it an image? is it a sound file?

to put it another way: an example of why DC is structured the way it is. in the case of items that use ink, it would be highly relevant for a researcher to know if that ink was primarily used to make text letters and numbers on the paper or if that ink was primarily used to make an image on the paper. (think about a literature researcher wanting to text-mine or a researcher looking for statistical data vs an art historian that is analyzing an illustration as visual data) thus, a dc:format entry of "ink" would be useless for any researcher b/c it'd basically tell them nothing.

adolski commented 9 years ago

Can we map the PSAP format directly to a DC format? Likewise for ink/media type and support type?

So for the "Sample Assessed Original Document Resource", it would look like:

dc:format Original Document
dc:format Color Pigment-Based Ink (colored printing ink)
dc:format Unidentified Paper
dc:format Sample extent
dc:format Another sample extent

?

adolski commented 9 years ago

Update based on my new understanding from #227:

dc:type Original Document
dc:format Text (value from new DC Format spreadsheet column corresponding to Original Document)
dc:format Color Pigment-Based Ink (colored printing ink)
dc:format Unidentified Paper
dc:format Sample extent
dc:format Another sample extent

areisemann commented 9 years ago

yep, dc:type is going to have just one entry. however, it's gonna have a value that is one of four possible answers: text or image or sound or video, which corresponds to PSAP's organization into sections based on Audiovisual materials (sound, video), Photo/Image materials (image), and Paper/Book (text).

you are right that dc:format will have a number of potential values/entries, such as what you've detailed.
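The section-to-type correspondence described above could be sketched roughly as follows. The section labels are taken from the comment; the `is_video` flag is an assumption added here, because the Audiovisual section covers both "sound" and "video" and the thread does not say how the two would be told apart.

```python
# Candidate dc:type values per PSAP section, as described in the comment above.
SECTION_TO_DC_TYPES = {
    "Audiovisual": ("sound", "video"),
    "Photo/Image": ("image",),
    "Paper/Book": ("text",),
}


def dc_type_for(section: str, is_video: bool = False) -> str:
    """Pick the single dc:type value for a PSAP section.

    `is_video` is a hypothetical flag used only to disambiguate
    the Audiovisual section; it is ignored for the other sections.
    """
    candidates = SECTION_TO_DC_TYPES[section]
    if len(candidates) == 1:
        return candidates[0]
    return "video" if is_video else "sound"
```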

adolski commented 9 years ago

Modified to the following:

dc:type "A/V" or "Photo/Image" or "Paper-Unbound" or "Paper-Bound/Book" (one of the existing format categories)
dc:format Format name
dc:format Ink/media type (if available)
dc:format Support type (if available)
dc:format Extent (one per dc:format)
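As a rough illustration of the final mapping above, the following Python sketch serializes one resource with the standard library's `xml.etree.ElementTree`. The function name, its parameters, and the `record` wrapper element are hypothetical; only the Dublin Core element names and namespace URI are real, and PSAP's actual export code may look quite different.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize with the usual "dc:" prefix


def build_dc_record(category: str,
                    format_name: str,
                    ink_media_type: Optional[str] = None,
                    support_type: Optional[str] = None,
                    extents: Iterable[str] = ()) -> ET.Element:
    """Build a DC record following the mapping in the comment above."""
    root = ET.Element("record")  # hypothetical wrapper element

    def add(tag: str, text: str) -> None:
        ET.SubElement(root, f"{{{DC_NS}}}{tag}").text = text

    add("type", category)        # one of the existing format categories
    add("format", format_name)   # the PSAP format name
    # Ink/media type and support type are emitted only when available.
    for optional_value in (ink_media_type, support_type):
        if optional_value:
            add("format", optional_value)
    # One dc:format element per extent.
    for extent in extents:
        add("format", extent)
    return root
```

A usage example with the sample resource from this thread (the "Paper-Unbound" category is a guess made for the example, not something the thread confirms):

```python
record = build_dc_record(
    "Paper-Unbound",
    "Original Document",
    ink_media_type="Color Pigment-Based Ink (colored printing ink)",
    support_type="Unidentified Paper",
    extents=["Sample extent", "Another sample extent"],
)
print(ET.tostring(record, encoding="unicode"))
```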