IUBLibTech / newton_chymistry

New version of 'The Chymistry of Isaac Newton', using XProc pipelines to generate a website based on TEI XML encodings of Newton's alchemical manuscripts, and Apache Solr as a search engine.

Global Search and Adv Search for "Text" is not searching everything #90

Open mdalmau opened 3 years ago

mdalmau commented 3 years ago

Bill reported that he was unable to search for Keynes 12 in the global search box on the home page. The same is true for the "Text" field in Advanced Search. We need to see how @Conal-Tuohy set up the indexing for the "Text" field (and whether the global search box uses the same indexing criteria). We want both the global search and the "Text" field to search the whole TEI document (my guess is that it's just indexing the body, but I'm not sure).

Conal-Tuohy commented 3 years ago

The text search field is handled differently to the other fields.

The text field is defined in Solr as a copyField, i.e. it is essentially a virtual field, defined as an aggregation of certain other fields. Specifically, the text field is defined to be a copy of the introduction, normalized, and diplomatic fields. https://github.com/IUBLibTech/newton_chymistry/blob/master/xslt/update-schema-from-field-definitions.xsl#L63-L88

So if you search for something in the text field, Solr will find where your keywords appear in any one of those fields. For the advanced search/browse page, I think it's probably worth retaining a text field which specifically includes only the actual text of the MS, and excludes repository numbers, etc.

But a solution might be to define another Solr copyField called all or something, and configure it to be a copy of those three text fields and also every field in the search-fields.xml file? Then the "quick" search box that appears in the page header could use that all field instead of the text field.
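A minimal sketch of what such an all field could look like, written in classic Solr schema syntax. The three text field names come from the comment above; `title` and `repository` are hypothetical stand-ins for whatever search-fields.xml actually defines, and the `text_general` field type is an assumption. This project generates its schema through the XSLT linked above rather than by hand, so this only illustrates the rules that would result. Solr does not chain copyField rules, so all would need to copy from the original source fields rather than from text.

```xml
<!-- Hypothetical catch-all field for the quick search box; the type is an assumption -->
<field name="all" type="text_general" indexed="true" stored="false" multiValued="true"/>

<!-- The three sources that already feed the text field -->
<copyField source="introduction" dest="all"/>
<copyField source="normalized" dest="all"/>
<copyField source="diplomatic" dest="all"/>

<!-- One rule per metadata field; these two names are placeholders -->
<copyField source="title" dest="all"/>
<copyField source="repository" dest="all"/>
```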

mdalmau commented 3 years ago

This is very helpful. Thanks, Con. The "global search" box in the current/old site has always functioned as a very broad "keyword" search so I do think we need to mimic that functionality there -- index everything. I'll see what the team thinks about leaving the "Text" field to just intro, norm, diplo.

randalldfloyd commented 3 years ago

@mdalmau I've been doing a deep dive on how the Solr document is created from XPath targets in the TEI docs. I think I know what I need to do technically to broaden the global search, but after looking at the TEI I want to verify that it's correct to just throw all descendant text nodes from teiHeader into a searchable text field. There are lots of things in there that would make for mysterious search results if the user doesn't see the context of the search hit, which they wouldn't: the UI only ever presents metadata or highlighted hits from the transcriptions. For example, what would the experience be if the user got hits on things like names in <respStmt>, especially given that a particular person's name (e.g. Baker) is probably in all the documents?
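One way to limit that noise would be to filter the header text before it reaches the index. The stylesheet below is only a sketch of the XPath involved, assuming XSLT 2.0 and a standard TEI P5 document with a tei:TEI root; how the expression would actually be wired into the field definitions is not shown, and the choice to exclude only respStmt is illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: gathers teiHeader text for indexing, minus respStmt content -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Select every header text node that is not inside a respStmt -->
    <xsl:variable name="header-text"
        select="tei:TEI/tei:teiHeader//text()[not(ancestor::tei:respStmt)]"/>
    <!-- Join with spaces and collapse whitespace into one searchable string -->
    <xsl:value-of select="normalize-space(string-join($header-text, ' '))"/>
  </xsl:template>

</xsl:stylesheet>
```

Other elements could be excluded the same way by extending the predicate, e.g. `not(ancestor::tei:respStmt or ancestor::tei:revisionDesc)`.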

Thinking about that is not a blocker for me right now because I still have to figure out how to create the new field and then how to rewire the search mechanisms in the UI, but once I get it working we will need to look at the impact it has.

Conal-Tuohy commented 3 years ago

Hey @randalldfloyd another option might be to modify the p5-to-html.xsl stylesheet or one of the stylesheets in that pipeline that produces the three HTML renditions (i.e. the diplomatic, normalized, and introduction pages), so that it does include the relevant content from inside the teiHeader rather than just the TEI text. Then users could search and find the metadata and see it highlighted.

Perhaps you could render that teiHeader content inside an HTML details element so that it's tucked away by default. If you did, we'd probably need a small tweak to the pipeline so that it puts the details element into an open state if it contains a hit. That would most easily be done by inserting an add-attribute step into the hit-highlighting pipeline, right after the hit-highlighting step has added the hit-highlight mark elements:

<!-- open up any html:details element which contains a search hit highlight -->
<p:add-attribute match="html:details[.//html:mark]" attribute-name="open" attribute-value="open"/>
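For the details rendering itself, a minimal sketch of a template that could sit in p5-to-html.xsl or a later stylesheet in that pipeline. The class name, the summary label, and the choice to render only sourceDesc are all hypothetical. It also assumes the HTML output is in the XHTML namespace, which is what the html: prefix in the step above would need to be bound to.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="tei">

  <!-- Hypothetical template: wrap header metadata in a collapsed details element -->
  <xsl:template match="tei:teiHeader">
    <details class="tei-header">
      <summary>Manuscript metadata</summary>
      <!-- Only the source description is rendered here; widen the select as needed -->
      <xsl:apply-templates select="tei:fileDesc/tei:sourceDesc"/>
    </details>
  </xsl:template>

</xsl:stylesheet>
```

Once the hit-highlighting step has wrapped a match inside this element in a mark element, the add-attribute step would turn the opening tag into `<details class="tei-header" open="open">`, so the metadata is expanded only on pages where it contains a hit.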