IUBLibTech / newton_chymistry

New version of 'The Chymistry of Isaac Newton', using XProc pipelines to generate a website based on TEI XML encodings of Newton's alchemical manuscripts, and Apache Solr as a search engine.

Include all TEI content in search index #8

Closed: Conal-Tuohy closed 4 years ago

Conal-Tuohy commented 5 years ago

done except for bibliography

Should users be able to search for the bibliography as a document in its own right?

mdalmau commented 5 years ago

We discussed the complications of displaying a mixed set of results (much like the HTML indexing issue) and have decided that we don't need to search the bibliography separately. In other words, we don't need the bibliography entry to show up as a separate result. We would like the author(s) and titles associated with each bibl to be indexed, so that the user can get to the entry via the manuscripts. Does this make sense?

Conal-Tuohy commented 5 years ago

I'm not sure if I do understand that requirement. In the manuscripts, the bibliographic references are encoded with <bibl> elements which can have a @corresp attribute linking them to the corresponding full reference in the bibliography. e.g.

<bibl corresp="CHYM000001.xml#Zetzner1660"><abbr>Consil</abbr>
                            <abbr>conjug</abbr> p 435, 437</bibl>

In the HTML, these bibliographic references are rendered as hyperlinks to the entry in the bibliography. e.g. http://carbon.dlib.indiana.edu:8220/text/ALCH00001/normalized

Is that all you're after, @mdalmau? or is there something else?

mdalmau commented 5 years ago

What I would like to see may involve too much wizardry. I was hoping that a user could get a hit on Zetzner, for example, even though we don't bring in the author information in the bibl, but instead need to grab that info by tracing the @corresp to the full citation. The most explicit way to have done this in the TEI for each ms would have been to include in the bibl tag, either as an attribute or as embedded elements, a normalized version of the author and title. Is there some "easy" way to do this through indexing? So that each bibl has an author and title associated with it, and when a user searches for "Zetzner" (assuming that's an author), they would get a hit on the page?

Conal-Tuohy commented 5 years ago

Yes, I'm sure something is possible, though I think, given the time remaining, we will be constrained to work within the general framework of the search system as it's implemented so far.

The general "text" field in the Solr index is an amalgam of the three fields which contain the plain text content of the three web pages (i.e. intro, diplomatic, and normalized) for each ms. These are the fields to which hit-highlighting is applied. So far I've been careful to ensure that the text content of these fields isn't broken up by the insertion (or transclusion) of extraneous material (which is all stored inside attributes in the HTML, and hence is discarded when the page is indexed). I'd still like to stick to that as a principle, if possible, since breaking it would, I think, produce some odd results in hit highlighting.
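To illustrate why material held in attributes never reaches the highlighted fields, here is a minimal Python sketch. It is not the project's XProc pipeline; the `TextOnly` class, the `plain_text` helper, and the sample HTML fragment (including its `title` attribute) are assumptions made up for the example.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects character data only; attribute values are never added."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for text nodes only, so href/title values are skipped.
        self.chunks.append(data)


def plain_text(html: str) -> str:
    """Return the whitespace-normalized text content of an HTML fragment."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


# Hypothetical rendering of a <bibl>: the transcluded material sits in an attribute.
sample = (
    '<p>see <a href="/bibliography#Zetzner1660" '
    'title="(full citation text)">Consil conjug p 435</a>, 437</p>'
)
print(plain_text(sample))  # prints: see Consil conjug p 435, 437
```

Because only the text nodes are indexed, a highlighted snippet always matches what the reader sees on the page.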

But it would be easy to add an extra field for bibliographic searching. There's currently a field called "title" which contains the title of each ms; we could add a field e.g. "citation", in which people could enter text from a bibliographic entry and find ms that cite that entry.
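As a rough sketch of how such a "citation" field could be populated, the Python below traces each @corresp to the bibliography and collects the author and title text. Everything about it is an assumption for illustration: the real site is built with XProc, the bibliography is assumed to use `bibl` entries with `xml:id`, `author`, and `title` in the TEI namespace, and the sample title is a placeholder rather than real data.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def bibliography_index(bibliography_xml: str) -> dict:
    """Map each xml:id in the bibliography to its author/title text."""
    index = {}
    for entry in ET.fromstring(bibliography_xml).iter(TEI + "bibl"):
        entry_id = entry.get(XML_ID)
        if entry_id is None:
            continue
        parts = [
            " ".join("".join(el.itertext()).split())
            for el in entry  # direct children only
            if el.tag in (TEI + "author", TEI + "title")
        ]
        index[entry_id] = " ".join(p for p in parts if p)
    return index


def citation_values(manuscript_xml: str, index: dict) -> list:
    """Collect citation text for every <bibl corresp="file#id"> in a manuscript."""
    values = []
    for ref in ET.fromstring(manuscript_xml).iter(TEI + "bibl"):
        # Keep only the fragment identifier after '#'.
        _, _, fragment = ref.get("corresp", "").partition("#")
        text = index.get(fragment)
        if text and text not in values:
            values.append(text)
    return values


# Hypothetical bibliography entry; the title is a placeholder.
bibliography = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><listBibl>
  <bibl xml:id="Zetzner1660"><author>Zetzner</author><title>Placeholder title</title></bibl>
</listBibl></body></text></TEI>"""

manuscript = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>
  <bibl corresp="CHYM000001.xml#Zetzner1660"><abbr>Consil</abbr>
  <abbr>conjug</abbr> p 435, 437</bibl>
</p></body></text></TEI>"""

solr_doc = {
    "id": "ALCH00001",
    "citation": citation_values(manuscript, bibliography_index(bibliography)),
}
print(solr_doc)
```

Keeping the citation text in its own multi-valued field leaves the highlighted text fields untouched, while a search for "Zetzner" in that field would still find the manuscript.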

We could also add a "bibliographic reference" facet, that would list the author/title entries?
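For the facet idea, a request might look something like the sketch below. The field names `citation` and `citation_facet` are hypothetical (neither exists in the index yet), and the helper is only an illustration; `q`, `facet`, `facet.field`, and `facet.mincount` are standard Solr request parameters.

```python
from urllib.parse import urlencode


def citation_search_params(user_text: str) -> str:
    """Build a Solr query string that searches the hypothetical 'citation'
    field and asks for counts on a hypothetical 'citation_facet' field."""
    # Escape backslashes and double quotes so the text stays inside the phrase.
    escaped = user_text.replace("\\", "\\\\").replace('"', '\\"')
    params = [
        ("q", f'citation:"{escaped}"'),
        ("facet", "true"),
        ("facet.field", "citation_facet"),
        ("facet.mincount", "1"),  # hide entries that no matching ms cites
    ]
    return urlencode(params)


print(citation_search_params("Zetzner"))
```

A facet field would probably need to be an untokenized string type, so that each author/title entry is listed whole rather than word by word.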

mdalmau commented 4 years ago

I think we need to table this, obvs. We still have a few pending bugs and production deployment tasks and those are the priority for now. I will tag this for future development.