UCSCLibrary / ucsc-library-digital-collections

A rails app based on Hyrax to be used as a repository for UCSC library digital collections.

(3) Add subject terms to collection-level metadata records #378

Closed rmjaffe closed 1 year ago

rmjaffe commented 3 years ago

Descriptive Summary

In order to enable collections to be matched when searching, the collection-level records need to contain subject terms. Collections will need to be subject analyzed, and/or the subject terms assigned to the collection's MARC finding aids will need to be copied into the DAMS records.

Background

This approach is much simpler and more straightforward than pulling subject terms assigned to works within the collection up to the collection level.

Acceptance Criteria

This is what done looks like:

Related Work

Enhance collection metadata editing form UCSCLibrary/dams_project_mgmt#435 needs to be completed before this can be done.

rmjaffe commented 3 years ago

@tmariamora @katedundon As I go about assigning collection-level subject terms, I plan to refer to the finding aids on OAC. Are LCSH subject terms generally found in the collection details, or might they appear in other sections of the finding aids?

katedundon commented 3 years ago

@rmjaffe you are correct- subject terms in OAC finding aids are in Collection Details under Indexing Terms (or sometimes the heading Subjects and Indexing Terms). They can also be found in the collection level catalog record.

rmjaffe commented 3 years ago

@NedHenry I just went into the Harry Mayo collection to add subject terms to the collection-level metadata, but it doesn't look like there's the option. When clicking Edit Collection, the Title, Abstract/Summary and Thumbnail properties are exposed. And when I clicked on Additional Fields, a few more options revealed themselves, including a Keyword property, but none of our controlled subject properties: subjectName, subjectPlace, subjectTopic, subjectTitle.

Can these properties be added to the collection editing form? Would that be an easy change? If so, I can make a ticket for the work. If it isn't, I'm thinking we and the rest of the team should talk about what it would mean to use the keyword property instead.

rmjaffe commented 1 year ago

@snehagunduraoUL @rschwab Question for you: I just opened the collection metadata editing form for the Steven Rees collection and noticed that the form looks different than the editing form for works (thanks, N8). Specifically, it does not look like one can search and select to add controlled values for subject and other metadata properties. If I want to add controlled terms (e.g. names from LC or locations from Geonames), can I enter the exact string in the collection editing form, or do I add the URI? Or, if I were to add a string in this form, would it be added to the local vocabulary instead of being recognized as belonging to a controlled vocabulary? In cases where I would want to add controlled values, would it be better to round trip than to add them manually?

Editing form for Steve Rees collection metadata: Image

Editing form for random work in the DAMS: Image

rschwab commented 1 year ago

Ok I was finally able to test this out. Here's what I found:

Creator field
  - creator is not on collection edit form
  - creator exports as URI (this is how it should work)
  - creator displays on collection dashboard page (example)

Using subject person as controlled vocab example
  - subject person exports as whatever is entered into the collection edit form, ex: https://id.loc.gov/authorities/names/n79056359 | #<ActiveTriples::Resource:0x0000000009c8e8b8> | Kofksy, Frank
  - subject person is not visible anywhere in the front-end AFAIK
  - Round tripping subject person did not alter the values at all, ie they remained as: https://id.loc.gov/authorities/names/n79056359 | #<ActiveTriples::Resource:0x0000000009c8e8b8> | Kofksy, Frank
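
To make the three kinds of value in that export easier to tell apart, here is a minimal Python sketch that splits one exported cell and labels each value. It assumes that `|` separates multiple values in a cell (as the example above suggests) and that the broken values always look like Ruby's default inspect string for an `ActiveTriples::Resource`. The function name `classify_cell` is hypothetical and not part of the application.

```python
import re

# Matches Ruby's default inspect output for an ActiveTriples resource,
# e.g. "#<ActiveTriples::Resource:0x0000000009c8e8b8>".
ACTIVE_TRIPLES_RE = re.compile(r"^#<ActiveTriples::Resource:0x[0-9a-fA-F]+>$")
URI_RE = re.compile(r"^https?://\S+$")


def classify_cell(cell, delimiter="|"):
    """Split a multi-valued export cell and label each value.

    Returns a list of (value, kind) pairs, where kind is one of
    "uri", "active_triples", or "label".
    Assumption: values in a cell are separated by `delimiter`.
    """
    results = []
    for raw in cell.split(delimiter):
        value = raw.strip()
        if not value:
            continue  # skip empty fragments
        if ACTIVE_TRIPLES_RE.match(value):
            kind = "active_triples"
        elif URI_RE.match(value):
            kind = "uri"
        else:
            kind = "label"
        results.append((value, kind))
    return results


# The exported value quoted in this comment.
cell = ("https://id.loc.gov/authorities/names/n79056359 | "
        "#<ActiveTriples::Resource:0x0000000009c8e8b8> | Kofksy, Frank")
for value, kind in classify_cell(cell):
    print(f"{kind}: {value}")
```

Run against an export sheet, a check like this could list which cells still hold an inspect string rather than a URI or a label.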

I think what all this shows is that controlled vocabulary fields are broken on the collection edit form. To figure out the extent of what this means I think we need to create a place on the front end to display these values - this can be a code change that only lives on sandbox and is then reverted, as we want to test this but not actually alter our front end.

rmjaffe commented 1 year ago

@rschwab Disappointing but not surprising that this is the case. Insofar as how the controlled terms are indexed, what is being indexed? The URIs or the labels? Let's say it indexes URIs, if I added URIs to subjectPlace, subjectName, etc. using the collection metadata editing interface, then those would be indexed along with the URIs that had been entered on the work level. Or vice versa, if it's indexing the strings, if I entered strings in the form, would it index them along with the string values on the works?

To affirm what already may be known/assumed: Most of the metadata values entered at the collection level are entered there for two reasons: faceting and/or to be inherited by works. Only the values that currently display on the front end need to display.

And apologies for my lack of understanding here -- the indexing has long been a black box. Before continuing to investigate this, would it make sense to put together a new ticket for 1) getting the values to display in sandbox and 2) doing more testing?

rschwab commented 1 year ago

@snehagunduraoUL What do you think of the recent comments here? I haven't figured this out yet, but here are the relevant commits that were made to enable this feature on the collection edit form: https://github.com/UCSCLibrary/ucsc-library-digital-collections/commits/master/app/forms/hyrax/forms/collection_form.rb

I think there are several things we need to do to resolve these issues:

  1. Test to make sure that bulkrax created collections have appropriately indexed controlled vocabulary terms. This could be done by either directly querying SOLR, or creating a test on the front-end as I suggested above.
  2. Fix the collection edit form to behave like the work edit form for controlled vocabulary terms.
  3. Actually add the subject terms to collections - using either bulkrax or the edit form depending on the results and timing of 1 and 2 above.
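
For the "directly querying SOLR" option in step 1, a rough sketch of the kind of query that could be used is below. It only builds the request URL. The Solr base URL, the collection ID, and the field name `subjectName_tesim` are placeholders assumed for illustration, and `has_model_ssim:Collection` assumes the usual Hyrax model field. None of these have been checked against this application's Solr schema.

```python
from urllib.parse import urlencode


def build_collection_query(solr_base, collection_id, field):
    """Return a Solr select URL that fetches one collection document,
    limited to its id and the given field. If the returned document
    lacks the field, no value was stored for it on that collection.
    All argument values are supplied by the caller (assumptions here).
    """
    params = {
        "q": f'id:"{collection_id}"',
        "fq": "has_model_ssim:Collection",  # assumed Hyrax model field
        "fl": f"id,{field}",
        "wt": "json",
    }
    return f"{solr_base.rstrip('/')}/select?{urlencode(params)}"


# Hypothetical values for illustration only.
url = build_collection_query(
    "http://localhost:8983/solr/hypothetical-core",
    "COLLECTION_ID",
    "subjectName_tesim",
)
print(url)
```

Comparing the response for a Bulkrax-created collection with one edited through the form would show whether the two paths index controlled terms differently.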

rschwab commented 1 year ago

This code looks relevant as well: https://github.com/UCSCLibrary/ucsc-library-digital-collections/commits/master/app/assets/javascripts/hyrax

rmjaffe commented 1 year ago

@rschwab In terms of supplying the collection-level subject terms, I was going to source those from the corresponding catalog records in UC Library Search. As far as testing to see if round tripping works as a solution, I can easily plug them (strings or URIs) into a spreadsheet for round tripping.

Would it make best sense to try to export the collection metadata records? Or should I create a spreadsheet with just the Hyrax IDs, the subject properties, and the values therein?

rschwab commented 1 year ago

Whichever is easiest for you. Just note that the required fields are:

Plus whatever data you're trying to change. I think for controlled vocab terms the URI would be more consistent than using a label, but I haven't done a lot of testing on that.
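
As a rough illustration of a from-scratch sheet that uses URIs, here is a minimal Python sketch. The column names (`id`, `subjectName`, `subjectPlace`), the ` | ` separator for multiple values, and the helper name `build_round_trip_csv` are all assumptions made for the example. The required fields mentioned above are not listed in this thread, so they are not included and would have to be added.

```python
import csv
import io


def build_round_trip_csv(rows, subject_columns, delimiter=" | "):
    """Write one CSV row per collection, joining multiple URIs per cell.

    `rows` is a list of (collection_id, {column: [uri, ...]}) pairs.
    Column names and the delimiter are assumptions, not confirmed
    import settings.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", *subject_columns])
    for collection_id, subjects in rows:
        cells = [delimiter.join(subjects.get(col, [])) for col in subject_columns]
        writer.writerow([collection_id, *cells])
    return buffer.getvalue()


# Placeholder ID; the URI is the one quoted earlier in this thread.
rows = [
    ("COLLECTION_ID",
     {"subjectName": ["https://id.loc.gov/authorities/names/n79056359"]}),
]
sheet = build_round_trip_csv(rows, ["subjectName", "subjectPlace"])
print(sheet)
```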

rmjaffe commented 1 year ago

As curious as I am about exporting, I'll just create a spreadsheet from scratch. I blocked time to do that on Monday afternoon.

rmjaffe commented 1 year ago

@rschwab Round tripping spreadsheet is in the DAMS shared drive: https://docs.google.com/spreadsheets/d/1rMoYpcoAi6v_Vqd9IdLVR74jtlU4BFGA/edit?usp=sharing&ouid=114117346070027385497&rtpof=true&sd=true

rschwab commented 1 year ago

SubjectName and likely all the controlled vocabulary terms added to the collection edit form are not being indexed properly. Here is an example search demonstrating that the controlled term is not indexed for the Steve Rees collection.

Creator does appear to be indexed correctly.

rmjaffe commented 1 year ago

@rschwab Do you recommend creating another ticket for getting the subject properties to index correctly? Could whatever mechanism is enabling creator to work properly be easily applied to the subjects?

rschwab commented 1 year ago

Yes we'll need a ticket but I'm still exploring the nuance here. I just ran an import on a new collection and those were indexed correctly.

Test findings: Imported new collection with controlled terms for creator, person, and place

Exported collection

Roundtripped collection

It appears the troubles are limited to those records with the #<ActiveTriples> format for controlled terms. Perhaps the roundtrip would fix the issue. So far I'm unable to reproduce the steps to get a term in the #<ActiveTriples> format; it appears to be the format these terms took when they were created during a BulkOps import.

rmjaffe commented 1 year ago

Can we manually delete those ActiveTriples from the couple of collections that have subject values using the editing form in the UI? Would it work to do that and then round trip to add the correct versions of those values (and all the other values) back in?

rschwab commented 1 year ago

This is broken for works according to #512 but I tested for the collection edit form on sandbox and could successfully remove an ActiveTriples value.

So yes, you can probably do that, or round tripping without using enumerated columns should also overwrite any values currently in there.

rmjaffe commented 1 year ago

@rschwab I was just playing with the collection metadata records in sandbox, but it's difficult to know the result of what I'm doing, as there is no full display of the collection metadata apart from the editing form itself. Also noting now the piece about the subjectPlace values not being indexed or faceted. We definitely want them indexed and faceted. Is this a quick edit to the configuration file, or should I create a ticket for that issue?

This ticket is still making my head spin; I booked a time for us on Thursday to talk about it -- unless we decide we don't need to!

rschwab commented 1 year ago

Here's a summary of my current understanding of the status of all issues in this thread:

  1. Adding controlled terms to collections using the edit form doesn't currently work. Ticket for this is #524
  2. Removing controlled terms from collections using the edit form does work (you just can't remove every single one of them; ticket for this is Metadata editing form: Need to be able to remove all values from non-required properties #550).
  3. Adding and removing controlled terms from collections with Bulkrax does work.
  4. There are a number of controlled terms that appear as #<ActiveTriples> in the edit form and export sheets, these are legacy values from BulkOps and should be removed.
  5. This spreadsheet, also on dams_ingest, can be used to add the subjects to collections.
  6. subjectPlace is either not being indexed or not appearing as a facet. Ticket for this is #565

Edit: Crossed out 4, these values could have been coming from some of the recent code surrounding controlled vocabularies, and may already be fixed. At minimum, they are not necessarily leftovers from BulkOps.

rmjaffe commented 1 year ago

Terms have been added in production; but if for any reason we edit and save the collection metadata records, the values will republish as active triples. Until the collection editing form is fixed, any updates to collection records must be done via round tripping.