mdalmau opened 3 years ago
The `text` search field is handled differently to the other fields. The `text` field is defined in Solr as a `copyField`, i.e. it is essentially a virtual field, defined as an aggregation of certain other fields. Specifically, the `text` field is defined to be a copy of the `introduction`, `normalized`, and `diplomatic` fields.
https://github.com/IUBLibTech/newton_chymistry/blob/master/xslt/update-schema-from-field-definitions.xsl#L63-L88
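For readers less familiar with Solr, here is a rough sketch of what this looks like in classic `schema.xml` terms. The real definitions are generated by the linked stylesheet and may take a different form (for example Schema API commands). The field type `text_general` and the attribute values are assumptions, not taken from the project.

```xml
<!-- Sketch only: assumed schema.xml equivalent of the generated "text" field.
     The field type "text_general" is an assumption. -->
<field name="text" type="text_general" indexed="true" multiValued="true"/>

<!-- Each copyField copies the raw input of its source into "text" at index
     time; "text" then analyses that input with its own field type. -->
<copyField source="introduction" dest="text"/>
<copyField source="normalized" dest="text"/>
<copyField source="diplomatic" dest="text"/>
```

Because `text` receives values from three sources, it has to be multi-valued.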
So if you search for something in the `text` field, Solr will find where your keywords appear in any one of those fields. For the advanced search/browse page, I think it's probably worth retaining a `text` field which includes only the actual text of the MS, and excludes repository numbers, etc.
But a solution might be to define another Solr `copyField` called `all` or something, and configure it to be a copy of those three text fields and also of every field in the `search-fields.xml` file? Then the "quick" search box that appears in the page header could use that `all` field instead of the `text` field.
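A sketch of the proposed `all` field in the same classic `schema.xml` form, assuming the same `text_general` type. The `repository` and `shelfmark` sources are hypothetical stand-ins for whatever fields `search-fields.xml` actually defines. In this project the declarations would presumably be generated from that file rather than written by hand.

```xml
<!-- Hypothetical catch-all field for the quick search box in the page header. -->
<field name="all" type="text_general" indexed="true" multiValued="true"/>

<!-- The three transcription fields, as for "text" -->
<copyField source="introduction" dest="all"/>
<copyField source="normalized" dest="all"/>
<copyField source="diplomatic" dest="all"/>

<!-- One copyField per field in search-fields.xml (names here are hypothetical) -->
<copyField source="repository" dest="all"/>
<copyField source="shelfmark" dest="all"/>
```

The quick search would then query `all`, for instance by passing it as the `df` (default field) parameter, while the advanced search keeps using `text`. Solr also accepts a wildcard source such as `source="*"`, which copies every field, but that is likely to pull in identifiers and other fields nobody wants to match on, so an explicit list is easier to reason about.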
This is very helpful. Thanks, Con. The "global search" box in the current/old site has always functioned as a very broad "keyword" search, so I do think we need to mimic that functionality there: index everything. I'll see what the team thinks about restricting the "Text" field to just the introduction, normalized, and diplomatic fields.
@mdalmau I've been doing a deep-dive on how the Solr document is created from XPath targets in TEI docs. I think I know what I need to do technically to broaden the global search, but after looking at the TEI I want to verify that it's correct simply to throw the descendant text nodes of `teiHeader` into a searchable text field. There are lots of things in there that would produce mysterious search results if the user doesn't see the context of the search hit, which they wouldn't: the UI only ever presents metadata or highlighted hits from the transcriptions. For example, what would the experience be if the user got hits on things like names in `<respStmt>`, especially given that a particular person's name probably appears in all the documents (e.g. Baker)?
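To make the question concrete, here is a sketch of the naive approach plus one way to trim it: an XSLT 2.0 stylesheet that emits a Solr XML update document with all of the `teiHeader` text in one field, skipping anything inside `respStmt`. It is purely illustrative. The project builds its Solr documents from XPath targets in its field definitions, and the `id` source (`@xml:id` on the root `TEI` element) and the `all` field name are assumptions.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">

  <xsl:template match="/tei:TEI">
    <add>
      <doc>
        <!-- assumption: the document id is the root element's xml:id -->
        <field name="id"><xsl:value-of select="@xml:id"/></field>
        <!-- every text node in the header, except those inside a respStmt
             (names that would otherwise match almost every document) -->
        <field name="all">
          <xsl:value-of select="normalize-space(string-join(
              tei:teiHeader//text()[not(ancestor::tei:respStmt)], ' '))"/>
        </field>
      </doc>
    </add>
  </xsl:template>
</xsl:stylesheet>
```

Excluding `respStmt` is only one possible trade-off; other header elements may deserve the same treatment once the impact on results has been looked at.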
Thinking about that is not a blocker for me right now because I still have to figure out how to create the new field and then how to rewire the search mechanisms in the UI, but once I get it working we will need to look at the impact it has.
Hey @randalldfloyd, another option might be to modify the `p5-to-html.xsl` stylesheet, or one of the stylesheets in that pipeline that produces the three HTML renditions (i.e. the diplomatic, normalized, and introduction pages), so that it also includes the relevant content from inside the `teiHeader` rather than just the TEI `text`. Then users could search, find the metadata, and see it highlighted.
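A sketch of what such a change might look like: a template, written in XSLT 2.0 and producing XHTML, that renders a few parts of the header so that the hit-highlighting step can mark matches in them. It assumes the existing stylesheet applies templates to `tei:teiHeader` (it may currently process only `tei:text`), and that repository and shelfmark are recorded under `msIdentifier`. The `tei-header*` class names are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="tei">

  <!-- Render selected header metadata as ordinary HTML so that search
       hits in it can be highlighted like hits in the transcription. -->
  <xsl:template match="tei:teiHeader">
    <section class="tei-header">
      <!-- titles of the work -->
      <xsl:for-each select="tei:fileDesc/tei:titleStmt/tei:title">
        <p class="tei-header-title"><xsl:value-of select="normalize-space(.)"/></p>
      </xsl:for-each>
      <!-- manuscript identifiers (repository, shelfmark, etc.) -->
      <xsl:for-each select=".//tei:msIdentifier">
        <p class="tei-header-ms-identifier">
          <xsl:value-of select="normalize-space(string-join(.//text(), ' '))"/>
        </p>
      </xsl:for-each>
    </section>
  </xsl:template>
</xsl:stylesheet>
```

Whatever is rendered this way should line up with what gets indexed; otherwise users will still get hits they cannot see.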
Perhaps you could render that `teiHeader` content inside an HTML `details` element so that it's tucked away by default; though if you did, we'd probably need a small tweak to the pipeline so that it puts the `details` element into an `open` state if it contains a hit. That would most easily be done by inserting an `add-attribute` step into the hit-highlighting pipeline right after the hit-highlighting step has added the hit-highlight `mark` elements:
<!-- open up any html:details element which contains a search hit highlight -->
<p:add-attribute match="html:details[.//html:mark]" attribute-name="open" attribute-value="open"/>
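For context, a sketch of where that step might sit in an XProc 1.0 pipeline, assuming the enclosing pipeline binds the `p` prefix. The `p:xslt` step and its `highlight-hits.xsl` stylesheet are hypothetical stand-ins for the existing hit-highlighting step. The `html` prefix used in the `match` pattern has to be bound to the XHTML namespace (shown here with a local declaration), and the attribute to set is named by the `attribute-name` option.

```xml
<!-- Hypothetical stand-in for the existing hit-highlighting step,
     which wraps search hits in html:mark elements -->
<p:xslt name="highlight-hits">
  <p:input port="stylesheet">
    <p:document href="highlight-hits.xsl"/>
  </p:input>
  <!-- the real step presumably passes the search terms as parameters;
       they are omitted from this sketch -->
  <p:input port="parameters">
    <p:empty/>
  </p:input>
</p:xslt>

<!-- open up any html:details element which contains a search hit highlight;
     with no explicit source binding it reads the output of the step above -->
<p:add-attribute name="open-details-with-hits"
    xmlns:html="http://www.w3.org/1999/xhtml"
    match="html:details[.//html:mark]"
    attribute-name="open" attribute-value="open"/>
```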
Bill reported that he was unable to search for Keynes 12 in the global search box on the home page. The same is true for the "Text" field in Advanced Search. We need to see how @Conal-Tuohy set up the indexing for the "Text" field (and whether the global search box uses the same indexing criteria). We want both the global search and the "Text" field to search the whole TEI document (my guess is that it's only indexing the body, but I am not sure).