EpiDoc / EFES

EFES (EpiDoc Front End Services) is a custom and readily customizable platform for publication and search/indexing of EpiDoc files, based on the Kiln platform
Apache License 2.0

handling external authority file #60

Open emylonas opened 2 years ago

emylonas commented 2 years ago

The US Epigraphy Project (USEP) manages controlled vocabularies for characteristics such as genre, material, object type and several others using an external taxonomy file. The taxonomy file contains lists of values, each with its own @xml:id and a human-readable display value. You can see the taxonomy file here. The taxonomy file is incorporated into the inscription XML using xi:include. It would also be possible to access it with document() or doc(). In the teiHeader these vocabularies are referenced as follows:

<objectDesc ana="#slab">
  <supportDesc ana="#stone.marble">

where #slab refers to an element in the taxonomy file. Full USEP inscription here.
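For readers unfamiliar with this arrangement, the following is a minimal sketch of how the pieces might fit together. Only the two @ana pointers, the element path to objectDesc and the file name include_taxonomies.xml come from this issue; the placement of the xi:include inside classDecl, the relative href and the taxonomy markup shown in the comment are assumptions for illustration, and the real USEP files may be organised differently.

```xml
<teiHeader xmlns="http://www.tei-c.org/ns/1.0"
           xmlns:xi="http://www.w3.org/2001/XInclude">
  <!-- fileDesc abridged to the two pointers quoted above -->
  <fileDesc>
    <sourceDesc>
      <msDesc>
        <physDesc>
          <objectDesc ana="#slab">
            <supportDesc ana="#stone.marble"/>
          </objectDesc>
        </physDesc>
      </msDesc>
    </sourceDesc>
  </fileDesc>
  <encodingDesc>
    <classDecl>
      <!-- Assumed placement and href. An XInclude-aware processor replaces
           this element with the content of the taxonomy file, which might
           contain entries of the (hypothetical) form
           <category xml:id="slab"><catDesc>Slab</catDesc></category> -->
      <xi:include href="include_taxonomies.xml"/>
    </classDecl>
  </encodingDesc>
</teiHeader>
```

Once the include has been resolved, the value "slab" (the @ana pointer with its leading # removed) matches an @xml:id inside the same document, which is what the display stylesheet needs in order to print the human-readable value.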

In EFES, we were unable to make this work using xi:include, so we tried document() and later doc(). It became clear that EFES (that is, Cocoon) was unable to find the file. We could make document() work using a full file path from the root of the local computer, although we couldn't get the second parameter, the XPath leading to a node, to work unless it was /.

We then tried the XPath function doc() to get the root of the file and write an XPath based on that. This started to generate Cocoon errors. It seems possible that document(), as an XSLT function, doesn't have access to the xmaps, whereas doc(), as an XPath function, does, and that this is what was throwing the errors.

I think we need help mapping our taxonomy file (in main.xmap?) so that it is accessible within Cocoon. It also has to be accessible inside the XSLT stylesheet.

Another approach might be to use a system variable, again inside the XSLT, so that we can specify a path either from server root or Cocoon root.

In any case, this is not an uncommon way to handle controlled vocabularies, and it is different from the structures in current projects using EFES. It has to be handled in order for us to display and probably also to index our inscriptions.

In the XSLT stylesheet htm-tpl-struct-usep.xsl, we have the following:

<b><i18n:text i18n:key="epidoc-xslt-usep-object">Object Type</i18n:text>: </b>
<xsl:choose>
  <xsl:when test="//t:teiHeader/t:fileDesc/t:sourceDesc/t:msDesc/t:physDesc/t:objectDesc/@ana">
    <xsl:variable name="taxonomy-item" select="substring-after(//t:teiHeader/t:fileDesc/t:sourceDesc/t:msDesc/t:physDesc/t:objectDesc/@ana, '#')"/>
    <xsl:value-of
      select="doc('/inscriptions/authority/include_taxonomies.xml')//*[@xml:id=$taxonomy-item]"
    />

which generates the following error: [screenshot: Screen Shot 2021-11-01 at 10 19 31 PM]

This leads us to believe that we now have a file path mapping problem. Happy to clarify or provide more information.

ajenhl commented 2 years ago

Is there any chance you can make your installation (with modified code and content XML) available for me to reproduce this, please?

emylonas commented 2 years ago

You have an invitation to the repository, which is closed at the moment. The relevant files are the XML files that contain the xi:include (in //ROOT/content/xml/epidoc; ours are the MA.Camb.HU... files) and include_taxonomies.xml, which is the included file (in both //ROOT/content/xml/epidoc and //ROOT/content/xml/authority, since we are trying several places). Note that include_taxonomies.xml isn't a pre-compiled authority file like the authority files for IOSPE et al.; it's intended to provide consistency for entering and displaying metadata, so it's used by our display XSL.

The USEP stylesheet is at //ROOT/content/stylesheets/htm-tpl-struct-usep.xsl.

I've made changes to the map files in //ROOT/sitemaps.

emylonas commented 2 years ago

This will also come up when we try to index our values, so we are grateful for any further information on how to handle that.

ajenhl commented 2 years ago

Thank you!

The error occurs because a plain string passed to the doc function is interpreted using the cocoon protocol (as the error message shows). You can reference files on disk using the file protocol (e.g., file:///path/to/file), but that requires an absolute path, which is not portable.
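To make the file-protocol option concrete, here is a minimal XSLT 2.0 sketch. The parameter name taxonomy-uri and the template match are hypothetical and not taken from the USEP stylesheet; the default value is the placeholder path from the comment above and would have to be replaced by a real absolute URI. Moving the URI into a parameter does not remove the portability problem, it only keeps the machine-specific path out of the stylesheet body.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:t="http://www.tei-c.org/ns/1.0"
                exclude-result-prefixes="t">

  <!-- Hypothetical parameter: absolute file: URI of the taxonomy file.
       The default is a placeholder, not a real location. -->
  <xsl:param name="taxonomy-uri" select="'file:///path/to/file'"/>

  <xsl:template match="t:objectDesc[@ana]">
    <!-- Strip the leading # from the pointer, as in the original stylesheet -->
    <xsl:variable name="taxonomy-item" select="substring-after(@ana, '#')"/>
    <!-- The URI carries an explicit protocol, so it is not a plain string
         left to be resolved with the cocoon protocol -->
    <xsl:value-of select="doc($taxonomy-uri)//*[@xml:id = $taxonomy-item]"/>
  </xsl:template>

</xsl:stylesheet>
```

In a Cocoon sitemap the value could presumably be supplied per installation through a map:parameter child of the map:transform step that runs the stylesheet, but that wiring is an assumption and has not been checked against the EFES sitemaps.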

There are a few options here, and you listed a few of them yourself. Since the authority file is integral to the content documents in various contexts (display and indexing at least), I would put the solution as close to the reading of those content documents as possible. There are preprocessing pipelines that already exist for display (internal.xmap#local-preprocess-epidoc-language) and indexing (solr.xmap#local-solr-preprocess) that you can modify to include a <map:transform type="xinclude"/> step after the map:generate. This will include the authority file so that it is accessible as part of the document by the display and indexing processes.
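A rough sketch of what the modified display pipeline might look like follows. Only the id local-preprocess-epidoc-language and the xinclude transform step come from the comment above; the match pattern, the generate source and the serializer are placeholders, and whatever the existing pipeline in internal.xmap already declares should be kept as it is.

```xml
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
  <map:pipelines>
    <map:pipeline>
      <!-- Hypothetical pattern and source path; keep the values the
           existing pipeline already uses -->
      <map:match id="local-preprocess-epidoc-language"
                 pattern="preprocess/epidoc/*.xml">
        <map:generate src="content/xml/epidoc/{1}.xml"/>
        <!-- New step: resolve xi:include elements so that the taxonomy
             becomes part of the generated document -->
        <map:transform type="xinclude"/>
        <!-- Existing transform steps of the pipeline continue here unchanged -->
        <map:serialize type="xml"/>
      </map:match>
    </map:pipeline>
  </map:pipelines>
</map:sitemap>
```

The same one-line addition would apply to solr.xmap#local-solr-preprocess for indexing. With the taxonomy included in the document itself, the lookup in the display stylesheet should no longer need doc() at all: an expression such as //*[@xml:id = $taxonomy-item] evaluated against the current document would find the entry, assuming the href in each xi:include resolves relative to the content file.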