Open mlhale7 opened 6 years ago
I think the Data Providers from UTC are clean, however, the Chattanooga History Center (CHC) no longer exists! I'm not sure if it's worth cleaning up, but the appropriate Data Provider should be "University of Tennessee at Chattanooga; Chattanooga Public Library" as the CHC material is now jointly owned by UTC and CPL.
Yes, I almost gave UTC a shout out in the initial message! Nice work. I wasn't certain if some of the less prevalent institutions in the list might be associated with UTC though. I don't think it's critical to clean this provider name up, but it couldn't hurt. Feel free to let us know if you'd like us to re-ingest if you make this edit.
I thought I'd share the values that get mapped to dataProvider from UTK's sets here. As you can see, this is where some of our problems originate. Perhaps we should start our normalization with these. For instance, "University of Tennessee, Knoxville Libraries" is a problem in just one collection: bcpl. We can clean that up quickly and reharvest.
Also, I know @kmiddlet and I have talked about this for MTSU, since they host a lot of objects originating from other places.
I think the best way to work on this is:
Thanks!
cheers! cricket!
In case it might be helpful for starting work on standardizing these names, I went ahead and created a spreadsheet on the DLTN drive (https://docs.google.com/spreadsheets/d/1H3Unq7AQmCTiQ6iy_na_du9Oe-Yx7r5c08LqB-NjmTE/edit?usp=sharing) that lists the controlled term and URI for each provider we currently have in DPLA. I used http://refine.codefork.com/ to do most of this work. Please do double-check the names - it wasn't always possible to determine the correct established name. I think focusing on provider names that have a lot of associated records and currently appear in several different forms would be the best way to spend our time (UTK has a few...). Still, people could also consider updating names that currently don't have conflicts to the standardized form to help keep our provider names consistent across institutions.
If we need additional columns to list associated sets or anything else, people should feel free to go ahead and add that information. Use the spreadsheet in whatever way is useful to you.
If anyone is NACO trained and sees a name associated with their institution/collections that would benefit from establishing a name authority file, do consider doing that work. UTK could help out with this if no one is NACO trained at your institution and the number of names is kept low. You can submit a request by emailing me at mhale16@utk.edu or replying here.
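For anyone scripting the cleanup, here is a rough sketch of what a variant-to-preferred lookup could look like. The function and table names are hypothetical. The single mapping shown is the CHC correction suggested earlier in this thread; a real table would presumably be built from the spreadsheet rather than typed in by hand.

```python
import re

# Hypothetical lookup table: variant form (lowercased) -> preferred form.
# The one entry here is the CHC correction proposed in this thread;
# everything else would come from the shared spreadsheet.
PREFERRED = {
    "chattanooga history center":
        "University of Tennessee at Chattanooga; Chattanooga Public Library",
}


def tidy(name):
    """Collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", name).strip()


def normalize_provider(name):
    """Return the preferred form for a known variant, else the tidied name."""
    cleaned = tidy(name)
    return PREFERRED.get(cleaned.casefold(), cleaned)
```

Names that are not in the table pass through with only their whitespace tidied, so nothing gets rewritten unless someone has explicitly listed it.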
@mlhale7 This is great. Can we just use this sheet and create additional sheets in it to show sets and values, institution by institution?
@markpbaggett - definitely. It would be great to keep everything in one place.
Yes, having it in one place would be great. I think we'll be able to fix ours before the harvest, once I know in what collections the non-preferred forms are.
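One possible way to find which collections carry a non-preferred form is to tally dataProvider values per set. This is only a sketch: it assumes you already have (set name, provider value) pairs from a harvest, and both function names are made up for illustration.

```python
from collections import Counter, defaultdict


def providers_by_set(records):
    """Count each dataProvider value within each set.

    `records` is any iterable of (set_name, provider) pairs.
    """
    tally = defaultdict(Counter)
    for set_name, provider in records:
        tally[set_name][provider] += 1
    return tally


def sets_with_variant(tally, variant):
    """List the sets in which a given provider form appears at least once."""
    return sorted(name for name, counts in tally.items() if variant in counts)
```

Calling `sets_with_variant(tally, "University of Tennessee, Knoxville Libraries")` on UTK's records would then list the sets that need attention, which per the comment above should be just bcpl.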
Hi all - Mark and I looked deeper into this and discovered that most of the issues are coming from UTK and MTSU. In some of our older XSLT transforms, we often serialize the provider ourselves by hard-coding it into the data instead of taking it from the provided XML. The long and short of this is that we at UTK need to revisit our XSLT so we include the provider names you've given us and correct a lot of our own data. I'll update you when we have accomplished some improvements.
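To illustrate the "take it from the provided XML" idea outside of XSLT, here is a minimal Python sketch. It is not our actual transform. The element name `recordContentSource` and the un-namespaced record shape are assumptions made for the example; real records will likely be namespaced and need a matching path.

```python
import xml.etree.ElementTree as ET


def data_provider(record_xml, fallback):
    """Prefer the provider named in the record; use the fallback only if absent.

    Assumes (hypothetically) that the provider lives in an un-namespaced
    <recordContentSource> element somewhere below the record root.
    """
    root = ET.fromstring(record_xml)
    node = root.find(".//recordContentSource")
    if node is not None and node.text and node.text.strip():
        return node.text.strip()
    return fallback
```

The point of the design is that the hard-coded name becomes a last resort rather than the default, so objects hosted on behalf of another institution keep the name supplied in their own metadata.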
Why are You Reporting an Issue
Discuss Issue or Ask Question
DPLA recently shared a spreadsheet with us noting variations among our provider names. I've attached it to this message. Where appropriate, we need to work on standardizing the names we use for the provider. If a name authority exists for your institution, an easy way of decreasing name variations is to use that (though UTK has inconsistencies as well). If all of us could make an effort to streamline our name usage and edit old provider names before the March ingest, that would be really helpful. Cleaning this up will make your records more findable on the Contributing Institution facet within DPLA, so it's worth our time. Please report back with the sets that are updated so we can ingest them again.
DLTNProviderList_NameVariations.xlsx