UMNLibraries / cdm-blacklightify

Blacklightify a CONTENTdm collection
MIT License

Subjects not displaying properly #100

Open theberg75 opened 3 months ago

theberg75 commented 3 months ago

We've been doing a lot of subject heading clean-up in CDM but we're noticing some display issues after re-indexing, specifically with punctuation. Example:

Subject heading in CDM: Moorland, Jesse Edward, 1863-1940

Display on UMedia: Moorland, Jesse Edward, 1863 1940

Link: [https://umedia.lib.umn.edu/item/p16022coll261:557?facets%5Bsubject_ss%5D%5B%5D=Moorland%2C+Jesse+Edward%2C+1863+1940]

We have other headings that use hyphens in certain cases too, and those hyphens are not showing either.

This display issue is found in both the "subjects" section of the item-level metadata view and in the "browse" by subjects section. I know the "browse" section probably needs to wait for the entire set to be re-indexed, but usually when I update subjects in CDM, I see the item-level view change via the regular overnight indexing schedule.

theberg75 commented 3 months ago

Also, probably a stupid question but want to make sure that this won't affect the date search logic we discussed setting up in a previous ticket?

Date ranges in the Date Created field will be formatted with a space on each side of the hyphen separator, e.g., 1948 - 1952. But date ranges within subject headings do NOT have those spaces, e.g., Harry Truman, 1948-1952.
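To illustrate the distinction between the two formats, here is a minimal Python sketch. It assumes the Date Created convention is exactly "YYYY - YYYY" with whitespace on both sides of the hyphen; the function name `parse_date_created` is hypothetical and is not part of the indexing code.

```python
import re

# Assumption: a Date Created range is two four-digit years separated by
# a hyphen with whitespace on both sides, and nothing else in the value.
SPACED_RANGE = re.compile(r"^\s*(\d{4})\s+-\s+(\d{4})\s*$")


def parse_date_created(value):
    """Return (start_year, end_year) for a spaced range, else None."""
    match = SPACED_RANGE.match(value)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Under that assumption, `parse_date_created("1948 - 1952")` yields a pair of years, while a subject heading such as `"Harry Truman, 1948-1952"` does not match at all.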

mberkowski commented 3 months ago

More investigation is needed into why this display happens; reindexing the collection didn't help. The altered value also appears in the JSON view and in the raw Solr record, which suggests the change happens during CDMDEXER record transformation. The full item detail API call from CONTENTdm returns the correct value. There are other fields (date fields) where we have code that is intolerant of some hyphenated date ranges in metadata and tries to normalize them, but that code isn't applied to subject fields, so it's not clear yet why these records come out the way they do.
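One way to narrow this down is to compare the value at each stage and flag cases where the only difference is a hyphen turned into a space. The Python sketch below is only an illustration of that check; the function names are hypothetical, and it assumes the source and indexed values have already been fetched as plain strings (it does not call CONTENTdm or Solr).

```python
def hyphen_lost(source, indexed):
    """True if indexed equals source with every hyphen replaced by a space."""
    return source != indexed and source.replace("-", " ") == indexed


def find_hyphen_losses(pairs):
    """Return the (source, indexed) pairs where a hyphen became a space."""
    return [(src, idx) for src, idx in pairs if hyphen_lost(src, idx)]


# Values taken from the example in this issue.
cdm_value = "Moorland, Jesse Edward, 1863-1940"
solr_value = "Moorland, Jesse Edward, 1863 1940"
affected = find_hyphen_losses([(cdm_value, solr_value)])
```

Running the same comparison on the output of each transformation step, if the intermediate values can be captured, would show which step first drops the hyphen.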

mberkowski commented 3 months ago

> Also, probably a stupid question but want to make sure that this won't affect the date search logic we discussed setting up in a previous ticket?

Not a stupid question. Subject field ranges should not affect date searches, because those are conducted against only the true date fields.