OregonDigital / oregondigital

OregonDigital Hydra Application
https://oregondigital.org/catalog/

Ingest Gifford Collection #551

Closed tpendragon closed 9 years ago

tpendragon commented 10 years ago

@sseymore @kestlund @wickr Going to need help on the metadata questions above.

sseymore commented 10 years ago

@terrellt We have a method for photographer for both local vocab and LC. See our mappings_uo.yml.

I'll look into the other ones.

sseymore commented 10 years ago

@terrellt

There is dct:isReferencedBy for refere if that works

For origin, we could create a formerID predicate or use:

vra:idFormerAccession rdf:type rdf:Property ;
    rdfs:label "former accession ID"@en ;
    rdfs:domain vra:Record ;
    rdfs:range rdfs:Literal ;

For compou, this looks like the value is in the wrong field? Can this field be renamed to album? We used dct:isPartOf for album in the Doris Ulmann photo collection. Or, it can be merged with relate because those are similar isPartOf values except for the "glass negatives" one. Can that be cleaned up and merged?

tpendragon commented 10 years ago

> We have a method for photographer for both local vocab and LC. See our mappings_uo.yml.

I don't see the actual ruby code for these, just reference to the method.

tpendragon commented 10 years ago

> There is dct:isReferencedBy for refere if that works

@wickr @mlv611 Is this good?

Could you both look into @sseymore's comments about origin too?

tpendragon commented 10 years ago

> For compou, this looks like the value is in the wrong field? Can this field be renamed to album? We used dct:isPartOf for album in the Doris Ulmann photo collection. Or, it can be merged with relate because those are similar isPartOf values except for the "glass negatives" one. Can that be cleaned up and merged?

I removed the bad "glass negatives" value in desc.all; I'll use dct:isPartOf for both fields.

sseymore commented 10 years ago

@terrellt Linda is sending you the ruby code.

wickr commented 10 years ago

@terrellt Looks like <origin> is Original Photographic Number. Sometimes it's part of the full ID, sometimes it's not. So another identifier. I'm not sure it's entirely 'former' so I don't think 'formerID' would be good. I think it's more that a photo came in already having that identifier.

wickr commented 10 years ago

@terrellt @sseymore's notes about <origin> work for me

<refere> I think is more accurately something like 'Appears In' as in a printed title, but dct:isReferencedBy is close enough for me.

<compou> was set to hidden from public view, as was <relate>

tpendragon commented 10 years ago

@wickr Should I just not ingest the metadata in compou and relate?

tpendragon commented 10 years ago

@sseymore We have some "work types" that are more specific than what's in Getty - like "5x7 glass negatives" instead of "glass negatives". What do you guys usually do with those?

wickr commented 10 years ago

@terrellt I'm hesitant to say don't include something, but the compou and relate values can go with dct:isPartOf

sseymore commented 10 years ago

@terrellt what field is that? We have cleaned up the data for things like that to make it conform to a CV.

tpendragon commented 10 years ago

@sseymore <format>, <source>, and <other>

tpendragon commented 10 years ago

Right now

<title>Portrait of three women -- Myrtle Gifford, sister, and mother?</title>
<digita>Gifford Photographic Collection</digita>
<creato>Gifford, Benjamin A.;</creato>
<date>circa 1885-1919</date>
<covera></covera>
<descri>Unidentified images that are likely of Gifford family members.</descri>
<subjec>Portrait photographs;</subjec>
<publis></publis>
<contri></contri>
<relati>Gifford Photographic Collection</relati>
<refere></refere>
<identi>P218 SG1 30 12</identi>
<origin></origin>
<type>Image</type>
<format>5x7 glass negatives;</format>
<source>Glass negatives;</source>
<other></other>
<rights>Permission to use must be obtained from OSU Special Collections and Archives Research Center.</rights>
<transm>Master scanned with Epson 10000XL scanner with Silver Fast 8.0.1 r18 (Dec. 5 2012) e75cb1f05.12 scanning software at 750ppi. No image manipulated.</transm>
<file>P_218_SG_1_30_12.tif</file>
<status>Cataloged</status>
<compou></compou>
<relate></relate>
<fullrs>Gifford1/P_218_SG_1_30_12.tif</fullrs>
<find>15.jp2</find>
<dmaccess></dmaccess>
<dmimage></dmimage>
<dmad1></dmad1>
<dmad2></dmad2>
<dmoclcno></dmoclcno>
<dmcreated>2013-03-12</dmcreated>
<dmmodified>2013-03-20</dmmodified>
<dmrecord>6</dmrecord>

becomes

<http://example.org/ns/6> <http://purl.org/dc/terms/title> "Portrait of three women -- Myrtle Gifford, sister, and mother?" .
<http://example.org/ns/6> <http://id.loc.gov/vocabulary/relators/pht> <http://id.loc.gov/authorities/names/n92004880> .
<http://example.org/ns/6> <http://purl.org/dc/terms/date> "circa 1885-1919" .
<http://example.org/ns/6> <http://purl.org/dc/terms/description> "Unidentified images that are likely of Gifford family members." .
<http://example.org/ns/6> <http://purl.org/dc/elements/1.1/subject> "Portrait photographs" .
<http://example.org/ns/6> <http://purl.org/dc/terms/identifier> "P218 SG1 30 12" .
<http://example.org/ns/6> <http://purl.org/dc/terms/type> <http://purl.org/dc/dcmitype/Image> .
<http://example.org/ns/6> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "5x7 glass negatives" .
<http://example.org/ns/6> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "Glass negatives" .
<http://example.org/ns/6> <http://purl.org/dc/terms/rights> <http://www.europeana.eu/rights/rr-r/> .
<http://example.org/ns/6> <http://opaquenamespace.org/rights/rightsHolder> "OSU Special Collections & Archives Research Center" .
<http://example.org/ns/6> <http://opaquenamespace.org/ns/conversionSpecifications> "Master scanned with Epson 10000XL scanner with Silver Fast 8.0.1 r18 (Dec. 5 2012) e75cb1f05.12 scanning software at 750ppi. No image manipulated." .
<http://example.org/ns/6> <http://www.loc.gov/standards/mods/modsrdf/v1/note> "Cataloged" .
<http://example.org/ns/6> <http://opaquenamespace.org/ns/full> "Gifford1/P_218_SG_1_30_12.tif" .
<http://example.org/ns/6> <http://www.loc.gov/premis/rdf/v1#hasOriginalName> "15.jp2" .
<http://example.org/ns/6> <http://purl.org/dc/terms/created> "2013-03-12" .
<http://example.org/ns/6> <http://purl.org/dc/terms/modified> "2013-03-20" .
<http://example.org/ns/6> <http://purl.org/dc/terms/replaces> <http://oregondigital.org/u?/gifford,6> .
<http://example.org/ns/6> <http://opaquenamespace.org/ns/set> <http://oregondigital.org/resource/oregondigital:gifford> .
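
For anyone spot checking the mapping, here is a rough Python sketch of the literal-valued part of the conversion above. It is not the actual ingest code: the `FIELD_TO_PREDICATE` table, the `MULTI_VALUED` set, and the function names are made up for illustration, the predicates are copied from the triples shown, and fields that need a URI lookup (creato, type, rights, and so on) are deliberately left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical field-to-predicate table, copied from the triples shown above.
# Fields that resolve to URIs (creato, type, rights, ...) are not handled here.
FIELD_TO_PREDICATE = {
    "title": "http://purl.org/dc/terms/title",
    "date": "http://purl.org/dc/terms/date",
    "descri": "http://purl.org/dc/terms/description",
    "subjec": "http://purl.org/dc/elements/1.1/subject",
    "identi": "http://purl.org/dc/terms/identifier",
    "dmcreated": "http://purl.org/dc/terms/created",
    "dmmodified": "http://purl.org/dc/terms/modified",
}

# Assumption: only these fields hold semicolon-delimited lists.
MULTI_VALUED = {"subjec"}


def escape_literal(value):
    """Escape backslashes, double quotes, and newlines for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def record_to_ntriples(xml_fragment, base="http://example.org/ns/"):
    """Turn one flat CONTENTdm-style record into a list of N-Triples lines."""
    # The export has no single root element, so wrap it in one before parsing.
    record = ET.fromstring("<record>" + xml_fragment + "</record>")
    subject = "<" + base + record.findtext("dmrecord", "").strip() + ">"
    lines = []
    for field in record:
        predicate = FIELD_TO_PREDICATE.get(field.tag)
        if predicate is None or not field.text:
            continue  # unmapped or empty field, nothing to emit
        if field.tag in MULTI_VALUED:
            values = field.text.split(";")
        else:
            values = [field.text]
        for value in values:
            value = value.strip()
            if value:
                lines.append(f'{subject} <{predicate}> "{escape_literal(value)}" .')
    return lines
```

Empty fields such as covera or publis produce no triple, which matches the output above.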

sseymore commented 10 years ago

@terrellt

Apologies for the delay. We have used hasFormat in our collections for fields that list the source format, such as "black and white negative". So I think that's a good fit for Dimension Format (format).

For the source and other fields, which are filled with other CVs, could they be made into subject fields? It's a lot of random terms and I'm not sure what you guys can do with them.

We also have used hasVersion for other digital file formats, but I'm not sure if this works for you since the values are mixed.

tpendragon commented 10 years ago

@sseymore source/other seem pretty clearly to be something that should come from Getty, and thus RDF.type, no? I'll use hasFormat for <format>.

sseymore commented 10 years ago

@terrellt can you send me the desc.all file please?

sseymore commented 10 years ago

@terrellt found the desc.all file. I see the values now. Most of them will be in AAT, so Julia suggests we use vra:workType.

tpendragon commented 10 years ago

@sseymore Every other collection we've imported has used DC.type - was that wrong?

sseymore commented 10 years ago

@terrellt Hmm, well for dct:type, it should be the DCMI Type Vocabulary: http://dublincore.org/documents/2000/07/11/dcmi-type-vocabulary/

DC says: "To describe the file format, physical medium, or dimensions of the resource, use the Format element." These 3 fields describe the source format, so I would go with format elements.

@kestlund @jsimic please correct me if I'm wrong.

tpendragon commented 10 years ago

@sseymore http://dublincore.org/documents/dcmi-terms/#terms-type Says to use -A- controlled vocab, and the range seems to be "any rdf thing"

kestlund commented 10 years ago

@terrellt, @sseymore, @jsimic: we had specified that dcmi-type be the vocab used with dc.type, and that a more appropriate type category such as vra.worktype be used if one is available.

See metadata dictionary for OD: https://docs.google.com/document/d/1pudn5bDMikQ0xlNv6cEnscDkakCyeRjPwVkFNNVp0JM/edit#heading=h.4xjto5aqul0

We typically split things like 5X7 glass plate negative and put the 5X7 in a dimension field or description.

I would like to keep dc.type with dcmi type if possible, but let me know if not.
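
To illustrate the split described above (size into a dimension or description field, the rest as the work type), here is a minimal Python sketch. The function name and the regular expression are hypothetical and are not part of any existing cleanup script; it only handles a leading "N x M" size.

```python
import re

# Matches a leading size such as "5x7", "5X7", or "8 x 10", followed by the rest.
DIMENSION_PREFIX = re.compile(
    r"^\s*(\d+(?:\.\d+)?\s*[xX]\s*\d+(?:\.\d+)?)\s+(.+?)\s*$"
)


def split_dimension(value):
    """Return (dimension, work_type); dimension is None when no size prefix is found."""
    match = DIMENSION_PREFIX.match(value)
    if match is None:
        return None, value.strip()
    # Normalize "5 X 7" or "5X7" to "5x7".
    dimension = re.sub(r"\s+", "", match.group(1)).lower()
    return dimension, match.group(2)
```

For example, `split_dimension("5x7 glass negatives")` would give a dimension of `5x7` and a work type of `glass negatives`, and the latter can then be matched against the controlled vocabulary.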

tpendragon commented 10 years ago

@kestlund Data dictionary says RDF.type for work type, which is what we've been using. DC.type for DCMI type, so I'll follow that.

kestlund commented 10 years ago

Great. Yes, do that. Sorry for the confusion.

On Mon, Sep 8, 2014 at 8:39 AM, Trey Terrell notifications@github.com wrote:

> @kestlund Data dictionary says RDF.type for work type, which is what we've been using. DC.type for DCMI type, so I'll follow that.

tpendragon commented 10 years ago

Just need to get this reviewed.

tpendragon commented 10 years ago

Sent to Larry to get this spot checked.

tvc15brian commented 9 years ago

Is 'printouts' supposed to be showing up as a Work Type? Just curious.

jsimic commented 9 years ago

It can be. See http://www.getty.edu/vow/AATFullDisplay?find=printouts&logic=AND&note=&english=N&prev_page=1&subjectid=300028467

tvc15brian commented 9 years ago

thanks @jsimic

mlv611 commented 9 years ago

@jsimic @terrellt I think the second work type (printouts) should be removed.

jsimic commented 9 years ago

@mlv611 @tvc15brian: Completely up to you guys.

tvc15brian commented 9 years ago

@wickr @terrellt @mickeroo we're about to start a metadata cleanup process with this collection. We have about 35 items waiting for the local workType term of "Glass positives", which I added to worktype.jsonld a couple months ago. How long does it take for new Opaque Namespace Vocabulary Terms to make it into Oregon Digital?

tpendragon commented 9 years ago

Until I run the code to re-fetch things from github. I'll start it.

tpendragon commented 9 years ago

@tvc15brian It's in there and autocompleting now.

tvc15brian commented 9 years ago

Thanks! @terrellt

wickr commented 9 years ago

Update from SCARC meeting 6/18/15:

Before Reviewing this collection:

That will be enough to Review, and then we can continue Topic and Work Type cleanup.

wickr commented 9 years ago

I will likely fix the work type URIs sooner rather than later, because you can't edit an item until they are corrected (Term not in Controlled Vocabularies error).

wickr commented 9 years ago

WorkTypes are mostly cleaned up:

Unsure about:

wickr commented 9 years ago

Photographers URIs cleanup should be finished. Any that aren't listed or faceting at this point I don't believe were in CONTENTdm to begin with, but let me know if I missed anything. Summary:

wickr commented 9 years ago

I also started on the Location cleanup, mapping text to Geonames URIs. There are 274 fields that are not URIs.

Clarkeri commented 9 years ago

@wickr I tried fixing some of the fields today before I left. I will finish fixing the rest on Monday.

wickr commented 9 years ago

@Clarkeri which fields do you mean? Location? I was going to bulk change most of those.

Clarkeri commented 9 years ago

@wickr I changed a few of the location fields. I'm glad you can just do a bulk change. Thanks!

wickr commented 9 years ago

Thanks Erin, apparently you changed almost all of them. I fixed the few remaining ones. Then I ran a script to fix older Geonames URIs that didn't have a slash, and then I ran another script to basically reindex the Geonames labels, so all of the Location/Region facets should be correct and clean.

I also fixed some of the missing images today, but there are still many more.
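
For reference, the trailing-slash fix mentioned above could look roughly like the Python sketch below. This is not the script that was actually run; the function name is hypothetical, and it assumes the expected form is `http://sws.geonames.org/<numeric id>/`.

```python
import re

# Assumed URI shape: http(s)://sws.geonames.org/<numeric id>, with or without a slash.
GEONAMES_URI = re.compile(r"^(https?://sws\.geonames\.org/\d+)/?$")


def normalize_geonames_uri(value):
    """Add the trailing slash to a Geonames URI; return any other value unchanged."""
    match = GEONAMES_URI.match(value.strip())
    if match is None:
        return value  # plain text or a non-Geonames URI, leave it alone
    return match.group(1) + "/"
```

Values that are still plain text pass through untouched, so they can be picked up separately in the text-to-URI mapping.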

wickr commented 9 years ago

I fixed about 50 missing images, and I'm pretty sure I got them all.

I want to do some quick Subject cleanup but after that this should be good enough to Review.

wickr commented 9 years ago

Bulk changes for Subjects are done. There were 1500+ text strings, with lots of repeats. Most were in TGM; a handful were in LCSH/LCNAF. There are about 100 left.

wickr commented 9 years ago

Down to 10 unique text strings for Subject:

Extension (2x)
Face Rock
Gifford, house of (20x)
Kueny, Mary (5x)
Nygren, Gene
Poling Hall
Reiling, Norman (4x)
Runnion, Kenneth
Steiner, John
Weatherford Hall
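
A tally like the one above can be produced with something along these lines. This is only a sketch in Python: the function name is hypothetical, and it assumes the subject values have already been pulled out of the records as a flat list of strings.

```python
from collections import Counter


def remaining_text_subjects(subject_values):
    """Count subject values that are still plain text rather than URIs."""
    counts = Counter(
        value.strip()
        for value in subject_values
        if value.strip() and not value.strip().startswith(("http://", "https://"))
    )
    # Sort alphabetically (case-insensitive) and format repeats as "label (Nx)".
    return [
        label if n == 1 else f"{label} ({n}x)"
        for label, n in sorted(counts.items(), key=lambda item: item[0].lower())
    ]
```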

wickr commented 9 years ago

Everything is reviewed and live. The remaining subject cleanup was moved to the new content repo: osulp/oregondigital-content#1