EpiDoc / EFES

EFES (EpiDoc Front End Services) is a custom and readily customizable platform for publication and search/indexing of EpiDoc files, based on the Kiln platform
Apache License 2.0

Add lemmatised text from TEI markup to search #30

Closed: ajenhl closed this 6 years ago

ajenhl commented 7 years ago

Add content from tei:w/@lemma and dereferenced tei:name/@nymRef to the Solr index and allow that to be searched over.

ajenhl commented 7 years ago

This seems potentially useful for any TEI text, not just EpiDoc, and it suggests that the indexing process should always index a lemmatised form (which, in some cases, will be the same as the non-lemmatised form). This removes the need for any indexing configuration (at this stage).
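To make the "always index a lemmatised form" idea concrete, here is a minimal sketch in Python. It is not how EFES or Kiln does its indexing. The function name `lemmatised_tokens` and the sample markup are hypothetical, and the sketch assumes that a tei:w without @lemma simply contributes its own text.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def lemmatised_tokens(tei_xml):
    """Return one token per tei:w: its @lemma if present, else its text.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(tei_xml)
    tokens = []
    for w in root.iter("{%s}w" % TEI_NS):
        lemma = w.get("lemma")
        # Fall back to the surface form, so every word has an indexed form.
        surface = "".join(w.itertext()).strip()
        tokens.append(lemma if lemma else surface)
    return tokens


# Invented sample: one word with @lemma, one without.
sample = (
    '<ab xmlns="http://www.tei-c.org/ns/1.0">'
    '<w lemma="θεός">θεοῖς</w> <w>καί</w>'
    "</ab>"
)
print(lemmatised_tokens(sample))
```

Because the fallback is built in, the lemmatised field is populated for every document, which is why no per-project indexing configuration would be needed.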

ajenhl commented 7 years ago

What, if anything, do we wish to do with tei:w/@lemmaRef?

gabrielbodard commented 7 years ago

Re @lemmaRef: if present, treat it like @nymRef, i.e. try to dereference it and fetch the headword if found.

The difference, as far as I can see, is that @nymRef points to the xml:id of a <nym> element whose headword can be found in tei:form/tei:orth, whereas @lemmaRef is not (according to the TEI Guidelines) normally expected to point to a tei:entry, but rather to "an online lexicon", whose format we of course cannot second-guess.
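As a rough illustration of the @nymRef dereferencing described here, the following Python sketch resolves local references only. It is not the EFES implementation. It assumes the <nym> lives in the same document, treats @nymRef as a whitespace-separated list of pointers, and skips anything not of the form `#id`. The function name `nym_headwords` and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def nym_headwords(tei_xml):
    """Collect tei:form/tei:orth headwords for locally referenced nyms."""
    root = ET.fromstring(tei_xml)
    # Index every nym in this document by its xml:id.
    nyms = {n.get(XML_ID): n for n in root.iter(TEI + "nym") if n.get(XML_ID)}
    headwords = []
    for name in root.iter(TEI + "name"):
        for ref in (name.get("nymRef") or "").split():
            if not ref.startswith("#"):
                continue  # assumption: non-local pointers are ignored
            nym = nyms.get(ref[1:])
            if nym is None:
                continue  # dangling reference, nothing to fetch
            orth = nym.find(TEI + "form/" + TEI + "orth")
            if orth is not None and orth.text:
                headwords.append(orth.text.strip())
    return headwords


# Invented sample: one local reference and one external reference.
sample = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
    '<listNym><nym xml:id="marcus"><form><orth>Marcus</orth></form></nym></listNym>'
    '<ab><name nymRef="#marcus">Marco</name> '
    '<name nymRef="http://example.org/nyms.xml#gaius">Gaio</name></ab>'
    "</TEI>"
)
print(nym_headwords(sample))
```

A @lemmaRef could be passed through the same lookup only if it happened to point at a TEI structure with a known shape, which is exactly what cannot be assumed for an arbitrary online lexicon.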

Short answer: in the short term, either do nothing, or test for whether both @lemmaRef and @nymRef point to a TEI file, and if not, ignore them. In the medium term we'll query the Markup list for more robust suggestions.

gabrielbodard commented 6 years ago

I think @lemmaRef is not implemented, and so should be ignored for now. Anything else outstanding on this ticket? If not, close it.

ajenhl commented 6 years ago

Non-local @nymRef values are not supported. Supporting them (and any other references that need the external content at the time of indexing) would require a change to the indexing process, so that it operates on an aggregation of the source document and every fetched document it references.

The references to authority lists, by the way, are fine, because we just store the reference, not the referenced content (which is pulled in during display).