emory-libraries / blacklight-catalog


Spike: how can we match electronic-only holdings to Libraries for the Library facet? #856

Closed eporter23 closed 3 years ago

eporter23 commented 3 years ago

Question: how can we capture the holding library of an electronic-only offering as part of our SOLR indexing process?

Scenario: The main requirement for the Library facet is to allow users to focus their search on materials offered by a specific Library. WHSCL identified an issue when testing to see how ejournals provided by the WHSCL library were identified when using the following facet combination (sample search):

The resulting list was much smaller than expected. We suspect that is because we get the Library affiliation(s) for a given record from the HOL852 field in the OAI. HOL852, however, does not seem to include electronic holdings info (AVE from the bib API, which is what we use to show online access options). While we can get that info from the bib API, it needs to be part of our SOLR index for the facet to work as intended.

Examples: In the example search linked above, the results seem to include only print journals that also have an online option.

https://blackcat-test.library.emory.edu/catalog/990011631240302486
https://blackcat-test.library.emory.edu/catalog/990027461130302486
https://blackcat-test.library.emory.edu/catalog/9936776082402486

Spike goals:

eporter23 commented 3 years ago

@lovinscari do you have any specific titles that Bonnie mentioned for this issue?

bwatson78 commented 3 years ago

@eporter23 Just a thought off the top of my head: since we control the 998 fields that describe a record's online details, couldn't we create a subfield that contained the holding library's code, pulled from the same batch of info we use to feed that field?

bwatson78 commented 3 years ago

Awaiting publishing of updated blacklightemily to production.

bwatson78 commented 3 years ago

I've exhausted all of Application Development's options to reliably retrieve Online-only Library information. The field denoting online attributes (998) does pass along Library information (subfield c), but only when that subfield is populated, and records like 9936778405002486, 9936495904002486, 9936493899002486, 9936494877202486, and 9936493955102486 lack it. As mentioned above, the HOL852 field may pass this along, but it is not present in either the Availability API calls (https://api-na.hosted.exlibrisgroup.com/almaws/v1/bibs?mms_id=9936778405002486,9936495904002486,9936493899002486,9936494877202486,9936493955102486&view=full&expand=p_avail,e_avail,d_avail&apikey=l7xx0ee57af8e73842c695c32fe1f9a4ca59) or the oai_set calls. If there is any significance to the fact that, in the Availability API calls only, the AVEc field is populated with an MMS ID-style number (e.g. 61271664270002486) rather than a library code (e.g. UNIV), then we could work with that. Bear in mind, though, that this seems to be provided only in the Availability API response and would have to be queried at the same time we ingest, which would hurt us by (1) making a big dent in our API call limits and (2) making our ingests take much, much longer to complete. @AGCooper has asked me to hold off on further progress on this until Lisa returns from vacation, so these findings are just preliminary.
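To make the two checks above concrete, here is a minimal Python sketch. It assumes a simplified in-memory record shape (a list of `(tag, subfields)` pairs) rather than any real MARC library, and the function names and the library-code pattern are hypothetical, chosen only for illustration. It shows how 998 subfield c could be read when populated, and how an AVE subfield c value could be classified as an ID-style number versus a library code.

```python
import re

# Assumption: library codes look like short uppercase strings such as "UNIV".
# This pattern is a guess for illustration, not a rule taken from Alma.
LIBRARY_CODE = re.compile(r"^[A-Z]{2,10}$")


def classify_ave_c(value):
    """Classify an AVE subfield c value as 'id', 'library_code', or 'unknown'."""
    value = value.strip()
    if value.isdigit():
        return "id"  # e.g. 61271664270002486
    if LIBRARY_CODE.match(value):
        return "library_code"  # e.g. UNIV
    return "unknown"


def online_library_codes(record):
    """Collect distinct, non-empty 998 subfield c values from a record.

    `record` is a hypothetical list of (tag, subfields) pairs, where
    `subfields` is a dict mapping a subfield code to its value.
    """
    codes = []
    for tag, subfields in record:
        if tag != "998":
            continue
        code = subfields.get("c", "").strip()
        if code and code not in codes:
            codes.append(code)
    return codes
```

A record whose 998 fields carry no subfield c yields an empty list, which matches the problem described for the sample records: there is nothing to feed the Library facet.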

AGCooper commented 3 years ago

@eporter23 @bwatson78

A quick test in discovere suggests it's not doing what Bonnie from Health seems to think it's doing.

When I facet by online AND Health Sciences, I only see results that are both physical and online.

So my first thought is the online facet is finding portfolios and the library facet is looking for holding records for physical items

Also it doesn't look like that Library field in the portfolio is indexed in discovere / primo

And it looks like a lot (if not all) of the results in primo for online AND Health are deduped results (i.e. Alma has 2 records that primo has merged into one, one electronic and one physical).

And 1 more thing: the one example I just dug into was a dedup that shows up when you facet by online and hlth in primo. In alma it is actually 2 records: the physical is held by HLTH and the electronic has the portfolio designated as UNIV ...

http://discovere.emory.edu/discovere:default_scope:01EMORY_ALMA51462013500002486

BTW, I looked at journals too, and it looks like the same thing is happening.
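The facet behavior described above can be illustrated with a small Python sketch. The record shape, field names, and IDs here are hypothetical: it assumes the online facet keys off the presence of a portfolio and the library facet keys off physical holding records, which is the working theory in this comment rather than confirmed system behavior.

```python
# Hypothetical records: "has_portfolio" stands in for the online facet,
# "holding_libraries" for libraries taken from physical holding records.
records = [
    {"id": "print-and-online", "has_portfolio": True, "holding_libraries": ["HLTH"]},
    {"id": "online-only", "has_portfolio": True, "holding_libraries": []},
    {"id": "print-only", "has_portfolio": False, "holding_libraries": ["HLTH"]},
]


def online_and_library(records, library):
    """Return IDs of records matching both the online and the library facet."""
    return [
        r["id"]
        for r in records
        if r["has_portfolio"] and library in r["holding_libraries"]
    ]


print(online_and_library(records, "HLTH"))
```

Under these assumptions, only the record with both a portfolio and a physical holding matches; the online-only record drops out because it has no holding record to supply a library.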

AGCooper commented 3 years ago

Also, Lisa checked, and straight-up e-records with portfolios don't (usually) have holding records, just portfolios, and a lot of those don't have the Library field filled out. (In Alma, that is.)

lovinscari commented 3 years ago

Thanks to you all - @eporter23, @bwatson78, and @AGCooper. I am taking this information and will craft an email response to Bonnie updating her on your findings.