AndyElliottCRL closed this issue 2 years ago.
From Folio Inventory, I can find "Борковский" with a Keyword or Contributor search.
Folio instance HRID: in00001873924 (the second of 2 records). In View Source, the 880 fields with Russian data are present in the Folio source record.
Styling of non-Roman fields, when they are present, can be handled in #33 or in new, more general issues. This issue can be considered closed once non-Roman data is found in search and shows in the record display.
Thanks for opening this, @AndyElliottCRL. I've been trying to wrap my head around the author indexing process as well. It's possible that native VuFind is not looking at those 880 values at index time. I think you can see all the author mapping variations used at harvest time (mapped to specific Solr fields) here:
https://github.com/Center-for-Research-Libraries/vufind/blob/crl-dev/import/marc.properties#L47
Or:
author = custom, getAuthorsFilteredByRelator(100abcqd:700abcqd,100,firstAuthorRoles)
author_variant = custom, getAuthorInitialsFilteredByRelator(100a:700a,100,firstAuthorRoles)
author_role = custom, getRelatorsFilteredByRelator(100abcd:700abcd,100,firstAuthorRoles)
author2 = custom, getAuthorsFilteredByRelator(100abcqd:700abcqd,700,secondAuthorRoles)
author2_variant = custom, getAuthorInitialsFilteredByRelator(100a:700a,700,secondAuthorRoles)
author2_role = custom, getRelatorsFilteredByRelator(100abcd:700abcd,700,secondAuthorRoles)
author_corporate = custom, getAuthorsFilteredByRelator(110ab:111abc:710ab:711ab,110:111:710:711,firstAuthorRoles|secondAuthorRoles)
author_corporate_role = custom, getRelatorsFilteredByRelator(110ab:111abc:710ab:711ab,110:111:710:711,firstAuthorRoles|secondAuthorRoles)
author_sort = custom, getFirstAuthorFilteredByRelator(100abcd:110ab:111abc:700abcd,100:110:111:700,firstAuthorRoles)
author_additional = 505r
As you can see, there are lots of variations captured/indexed for use, each referencing different MARC fields and/or different pre-processing methods. This page spells it out a bit better, but only the raw definitions above show the actual MARC values used. Of those variations, what I've been able to decipher elsewhere in the code is that Author searches look up only this subset of those Solr fields:
So while I can't be totally sure just yet, it seems like the 880 is not factoring in anywhere here. I'm also seeing that the 880 is a special "linked" field. We need to unpack how VuFind and SolrMarc deal with those.
Adding a project link so that we can track this.
My comment above captures why I think those 880 field strings are not being searched. The fact that the 880 values are not showing up in the "Staff View" is a whole other matter... I'm not sure about that.
Someone else tried something along these lines:
https://wiki.folio.org/display/MM/2022-01-27+Metadata+Management+Meeting+notes
The actual solution code is not presented on that page; the snippet is about all that's useful there. @ryan-jacobs, is this enough to test anything?
To expand this to cover any/all of the fields where non-Roman data might be stored, it sounds like we would have to look at the field contents when ingest and mapping take place. We can't say every 880 field should map to title, as the snippet might imply, because the 880 holds all kinds of data. Subfield 6 of the 880 tells us which MARC field, and which occurrence of it in the record, the 880 matches up with. The matched field (100, 245, 250, etc. in the example) uses the same scheme in its own subfield 6 to tell us which 880 field it goes with.
Title searches also find nothing: "Историческая грамматика русского языка" ("Historical grammar of the Russian language").
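To make the subfield 6 scheme concrete, here is a minimal Python sketch (not VuFind or SolrMarc code). It assumes the MARC 21 linkage pattern tag-occurrence[/script[/r]], e.g. `245-01/(N` in an 880 and `880-01` in the matching regular field, and a simplified, hypothetical record shape of `(tag, [(code, value), ...])`. The helper names `parse_linkage` and `pair_880s` are illustrative only.

```python
from typing import List, NamedTuple, Optional, Tuple

# Simplified field shape for this sketch: (tag, [(subfield_code, value), ...]).
Field = Tuple[str, List[Tuple[str, str]]]


class Linkage(NamedTuple):
    """Parsed contents of a MARC 21 $6 (Linkage) subfield."""
    tag: str                # linked tag: "245" inside an 880, "880" inside a regular field
    occurrence: str         # two-digit occurrence number shared by the paired fields
    script: Optional[str]   # script identification code, e.g. "(N" for Cyrillic, if present
    right_to_left: bool     # True when the field orientation code is "r"


def parse_linkage(value: str) -> Linkage:
    """Split a $6 value such as '245-01/(N' or '880-01' into its parts."""
    link, *rest = value.split("/")
    tag, occurrence = link.split("-", 1)
    script = rest[0] if rest and rest[0] else None
    right_to_left = len(rest) > 1 and rest[1] == "r"
    return Linkage(tag, occurrence, script, right_to_left)


def pair_880s(record: List[Field]) -> List[Tuple[Field, Optional[Field]]]:
    """Pair each 880 with its regular field (None if there is no match)."""
    def linkage_of(subfields):
        return next((parse_linkage(v) for c, v in subfields if c == "6"), None)

    # Regular fields carry $6 '880-NN'; index them by (their own tag, NN).
    regular = {}
    for tag, subfields in record:
        link = linkage_of(subfields)
        if tag != "880" and link is not None:
            regular[(tag, link.occurrence)] = (tag, subfields)

    # Each 880 carries $6 '<tag>-NN', so (tag, NN) is the join key.
    pairs = []
    for tag, subfields in record:
        link = linkage_of(subfields)
        if tag == "880" and link is not None:
            pairs.append(((tag, subfields), regular.get((link.tag, link.occurrence))))
    return pairs
```

For example, a 100 with $6 `880-01` pairs with an 880 whose $6 is `100-01/(N`; an 880 with occurrence `00` has no regular counterpart and comes back paired with None.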
So here's what I can see so far in terms of current VuFind support for the 880:
So the major thing that seems to be missing is Solr indexing of those 880 fields in a way that is compatible with advanced keyword searching (e.g. "author" or "title" targeted searches). That is a significant gap.
It seems that the University of Chicago has been exploring this, both as a VuFind PR (#1888) and as a public issue in their own VuFind repo (#109). These links are probably our best option to explore next.
It looks like the "LNK" notation that's available in the SolrMarc library will help us here:
https://github.com/solrmarc/solrmarc/wiki/Predefined-Custom-Methods
We can also examine some of the Solr settings that UofC is using (https://github.com/uchicago-library/vufind/tree/uc-master/import) as a guide, and possibly even reach out to them. It seems we are both trying to solve the same problems here.
Author and title are found now; here's the same author:
@AndyElliottCRL the commit in https://github.com/Center-for-Research-Libraries/vufind/commit/22a7e678fbdfd842d0985bcf389cf77db7c7f217 captures a basic potential solution here, inspired by UofC's public VuFind implementation. The idea is to use a SolrMarc trick that resolves 880 links automatically (LNK notation). The challenge is that these links have to be explicit in our Solr configuration (i.e. we need to manually define MARC field mappings that use them; they are not referenced automatically in the existing MARC field mappings).
My understanding is that 880s can be linked to lots of things, but as noted above, it seems like our priority is "Author" and "Title", given that these are existing options in the targeted search.
As best I can tell, targeted title searches primarily pull from 245ab, so I've added a new Solr mapping to also pull from any 880 linked to the 245ab (title_lnk = LNK245ab). Additionally, author searches effectively pull from 100abcqd and 700abcqd, so a Solr LNK mapping is in place for those as well.
All that said, this may only be the tip of the iceberg. I can see that there are many other title and author variants that are captured as sources for normal title and author searching as well (for example, alternative titles may pull from 100t, 130adfgklnpst, 240a, 246a, 505t, 700t, 710t, 711t, 730adfgklnpst, 740a, and author variations may pull from 111abc, 710ab, 711ab... to name a few).
So we need to decide where to draw the line between the full set of author and title sources and the subset we add 880 link support for. Perhaps this needs to be driven by our local 880 cataloging practices in some way?
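As a rough illustration of what an LNK spec does, the Python below emulates `title_lnk = LNK245ab`: it reads subfields a and b from the 880s whose $6 points at 245, not from the 245 itself. This is a hypothetical stand-in to show the idea, not SolrMarc's implementation; how SolrMarc joins subfields and trims punctuation may differ, and the sample field values are illustrative.

```python
from typing import List, Tuple

# Simplified record shape for this sketch: (tag, [(subfield_code, value), ...]).
Record = List[Tuple[str, List[Tuple[str, str]]]]


def linked_values(record: Record, target_tag: str, codes: str, sep: str = " ") -> List[str]:
    """Rough stand-in for a SolrMarc spec such as LNK245ab.

    Collects the requested subfields from every 880 whose $6 points at
    `target_tag`; the regular field itself is never read.
    """
    values = []
    for tag, subfields in record:
        if tag != "880":
            continue
        linkage = next((v for c, v in subfields if c == "6"), "")
        if linkage.split("-", 1)[0] != target_tag:
            continue
        parts = [v for c, v in subfields if c in codes]
        if parts:
            values.append(sep.join(parts))
    return values


# Illustrative fields, not copied from the actual test record.
record = [
    ("245", [("6", "880-02"), ("a", "Istoricheskaia grammatika russkogo iazyka")]),
    ("880", [("6", "100-01/(N"), ("a", "Борковский")]),
    ("880", [("6", "245-02/(N"), ("a", "Историческая грамматика русского языка")]),
]

print(linked_values(record, "245", "ab"))     # title_lnk-style values
print(linked_values(record, "100", "abcqd"))  # author LNK-style values
```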
It's great that a solution exists in Solr; thanks @ryan-jacobs for all this background and for actually setting it up. It's kind of annoying that every field mapping has to be explicitly set, when the 880 subfield 6 will always have the pointer (to 100, occurrence 1, or field 500, occurrence 2, etc.). It feels like the real fix is a little deeper and no one has ever had time to implement it: something like, look at 880-6 and build an (author, note, whatever) field of that type with the vernacular data. Then no explicit remapping would be needed.
Obviously I still haven't read about the link mapping yet.
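The "look at 880-6 and build a field of that type" idea can be sketched as one routing table keyed by the linked tag, instead of one LNK mapping per field. This is a hypothetical Python sketch, not something VuFind or SolrMarc provides: the routing table, every field name other than title_lnk, and the fallback field are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Simplified record shape for this sketch: (tag, [(subfield_code, value), ...]).
Record = List[Tuple[str, List[Tuple[str, str]]]]

# Hypothetical routing table: linked MARC tag -> index field for the vernacular text.
ROUTING = {
    "100": "author_lnk",
    "700": "author_lnk",
    "245": "title_lnk",
    "246": "title_alt_lnk",
    "505": "contents_lnk",
}
FALLBACK = "vernacular_other"  # hypothetical catch-all for tags not in ROUTING


def route_880s(record: Record, routing: Dict[str, str] = ROUTING,
               fallback: str = FALLBACK) -> Dict[str, List[str]]:
    """Send each 880's text to a field chosen by the tag in its $6."""
    doc = defaultdict(list)
    for tag, subfields in record:
        if tag != "880":
            continue
        linkage = next((v for c, v in subfields if c == "6"), None)
        if not linkage:
            continue
        target_tag = linkage.split("-", 1)[0]
        # Keep every subfield except the $6 linkage itself.
        text = " ".join(v for c, v in subfields if c != "6")
        if text:
            doc[routing.get(target_tag, fallback)].append(text)
    return dict(doc)
```

In this sketch the vernacular text goes into separate linked fields rather than being merged into the base fields. The trade-off is that it indexes every subfield except $6, whereas an explicit spec like LNK245ab picks specific subfields.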
It is excellent that we can get the author and title now, so that's the biggest part of resolving this issue. For completeness, we want to be able to get everything that's in an 880; I don't think we should draw a line between supported and unsupported 880 sources. I would see the goal as (formatting for myself):
Then the best start for this will be a look at where we have 880 fields in the CRL catalog. I'll build a list and see about extracting that content, and what other kinds of fields we have beyond authors and titles.
I think we can expect 245abc; 246ab, maybe 246i; and 505atr. 245ab are already in there; 245c (and 505r) would hold the author name. 246a is in the list above, and 246b would be more alternative title.
We'll get a better list together once we see what CRL has for 880 contents.
Thanks for the comments, @AndyElliottCRL; that's very helpful and provides some great confirmation.
It's kind of annoying that every field mapping has to be explicitly set, when the 880-sub-6 will always have the pointer (to 100, occurrence 1, or field 500, occurrence 2, etc.). It's like the real fix is a little deeper but no one has ever had time to implement it...
I agree that it would be great if the mappings automatically checked for these links at a lower level and always aggregated them into the values for the index. If they did, these general keyword searches would be populated correctly, but I suppose other Solr queries would break, specifically cases where the system needs to query just the base MARC data or just the linked data on its own.
Anyway, it looks like we have an imperfect solution that could get us most of the way there, but let's also see if EBSCO has any comments on this.
I exported the fields from the catalog and de-duplicated them; the results are in Folio Team--VuFind--Files--issue_31_marc_field_crl_uses_880_for.xlsx
CRL uses 880s to record vernacular (non-Roman) script data for at least 53 different MARC fields. That's more than I thought, and there are only a couple I would question supporting here: we can ignore 037, and 440 will be increasingly rare.
Just a quick note from our end: I checked with the other person on our team who works with VuFind regularly, and mapping the 880s explicitly sounds like the best bet. It certainly feels cumbersome, but we're not aware of an easier way to do this in VuFind.
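A per-tag tally like the one behind this count could be reproduced from exported $6 values. A minimal Python sketch, assuming the export is a plain list of 880 $6 strings (the input format and the function name are hypothetical); the ignore list mirrors the 037/440 exclusions mentioned above.

```python
from collections import Counter
from typing import Iterable, Tuple


def linked_tag_counts(linkages: Iterable[str],
                      ignore: Tuple[str, ...] = ("037", "440")) -> Counter:
    """Count 880s per linked tag from exported $6 values such as '245-01/(N'."""
    counts = Counter(value.split("-", 1)[0] for value in linkages if value)
    for tag in ignore:
        counts.pop(tag, None)  # drop tags we don't plan to support
    return counts


# Illustrative input only; the real export lives in the spreadsheet.
sample = ["245-01/(N", "100-01/(N", "245-02/(N", "037-01/(N", "440-01/(N"]
counts = linked_tag_counts(sample)
print(len(counts), sorted(counts))
```

The number of distinct tags is then simply `len(counts)`.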
The University of Chicago is showing the imprint (publisher) data (260/880 field combo) in
We see the small Russian title (CRL already has that working) and the Russian version under an "Imprint" label, which we don't have.
It looks like the indexing problem is (mostly) solved with the custom LNK notation in our Solr field mappings. Let's break off the record display considerations into another issue (#65).
Test record: http://catalog.crl.edu/record=b2836533~S5
English and Russian data versions to search on:
OCLC No. 643765209
Author
Title
Author search 2022-03-30: not found in Russian.
Author search 2022-03-30: found in English.
Record in VuFind 2022-03-30: no Russian fields to be found. This is highly undesirable.
Russian data is in CRL's 880 fields, which are linked through 880 subfield 6 to the MARC tag and occurrence number of the corresponding field. The 880 fields with Russian data are absent from the VuFind MARC View ("Staff View").