DCLP / dclpxsltbox

Sandbox for development, testing, and review of XSLT for DCLP
http://dclp.github.io/dclpxsltbox/

Invalid XML <div type="figure"> #106

Closed Edelweiss closed 7 years ago

Edelweiss commented 9 years ago

Value of attribute "type" is invalid; must be equal to "apparatus", "bibliography", "commentary", "edition", "textpart" or "translation"

e.g.

https://github.com/DCLP/idp.data/blob/dclp/DCLP/65/64120.xml

<div type="figure">
   <p>
      <figure>
         <graphic url="www.papyri.info/apis/michigan.apis.2353"/>
      </figure>
   </p>
</div>

could be changed into:

<div type="bibliography" subtype="figure">
   <p>
      <figure>
         <graphic url="www.papyri.info/apis/michigan.apis.2353"/>
      </figure>
   </p>
</div>

This also affects HGV, e.g.

https://github.com/DCLP/idp.data/blob/dclp/HGV_meta_EpiDoc/HGV111/110164.xml

paregorios commented 9 years ago

Can tei:facsimile not be used for this purpose?

paregorios commented 9 years ago

Here's an example of the use of tei:facsimile from the Campa Inscriptions project. tei:facsimile goes between the tei:teiHeader and tei:text (it is a child of the root element).

<facsimile>
    <graphic xml:id="fac1" url="../images/inscriptions/C0087_AG_2009.jpg">
        <desc>Front view of the stela bearing inscription <ptr target="#inv-general"/>. Taken at the Museum of Cham Sculpture by Arlo Griffiths on <date when="20090920"></date>.</desc>
    </graphic>
    <graphic xml:id="fac2" url="../images/inscriptions/EFEOB-est.n0164_A.jpg">
        <desc>Photograph of EFEO estampage n. 164, face A.</desc>
    </graphic>
    <graphic xml:id="fac3" url="../images/inscriptions/EFEOB-est.n0164_B.jpg">
        <desc>Photograph of EFEO estampage n. 164, face B.</desc>
    </graphic>
</facsimile>

jcowey commented 9 years ago

Where is it that the type attributes are constrained to "apparatus", "bibliography", "commentary", "edition", "textpart" or "translation"? Out of interest, because HGV_meta_XML and the whole PN system are not protesting. Am I failing to understand something? Most probably.

paregorios commented 9 years ago

I'll investigate what's going on with the schema and in PE/PN.

Edelweiss commented 9 years ago

It might be me who got it wrong. I always get confused with TEI, EpiDoc, latest EpiDoc and the various servers where the XML definitions are hosted. Here is an excerpt from the schema definition for the tei:div tag from http://www.stoa.org/epidoc/schema/latest/tei-epidoc.rng

<define name="tei_div">
      <element name="div">
         <a:documentation xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0">(text division) contains a subdivision of the front, body, or back of a text. [4.1. ]</a:documentation>
         <!-- ... -->
         <ref name="tei_att.global.attributes"/>
         <ref name="tei_att.divLike.attributes"/>
         <ref name="tei_att.typed.attribute.subtype"/>
         <ref name="tei_att.declaring.attributes"/>
         <attribute name="type">
            <a:documentation xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"/>
            <choice>
               <value>apparatus</value>
               <a:documentation xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0">to contain apparatus criticus or textual notes</a:documentation>
               <value>bibliography</value>
               <a:documentation xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0">to contain bibliographical information, previous publications, etc.</a:documentation>
               <value>commentary</value>
               <a:documentation xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0">to contain all editorial commentary, historical/prosopographical discussion, etc.</a:documentation>
               <value>edition</value>
               <a:documentation xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0">to contain the text of the edition itself; may include multiple text-parts</a:documentation>
               <value>textpart</value>
               <a:documentation xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0">used to divide a div[type=edition] into multiple parts (fragments, columns, faces, etc.)</a:documentation>
               <value>translation</value>
               <a:documentation xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0">to contain a translation of the text into one or more modern languages</a:documentation>
            </choice>
         </attribute>
         <empty/>
      </element>
   </define>

Perhaps the reason it is still working in SoSOL is that the PE interface is based on an outdated version of the EpiDoc schema.
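
As a quick way to see which files are affected, here is a minimal Python sketch (not part of the project's XSLT) that lists div/@type values falling outside the six values in the schema excerpt above. The function name `invalid_div_types` and the sample document are assumptions made for illustration; only values that are actually present are checked, so a missing @type is not flagged.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# The six values permitted by the schema excerpt quoted in this thread.
ALLOWED_DIV_TYPES = {
    "apparatus", "bibliography", "commentary",
    "edition", "textpart", "translation",
}


def invalid_div_types(xml_text):
    """Return the sorted list of div/@type values not in ALLOWED_DIV_TYPES."""
    root = ET.fromstring(xml_text)
    found = set()
    for div in root.iter("{%s}div" % TEI_NS):
        value = div.get("type")
        if value is not None and value not in ALLOWED_DIV_TYPES:
            found.add(value)
    return sorted(found)


# Hypothetical, heavily trimmed sample built from the snippet in the first comment.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<div type="edition"><ab/></div>
<div type="figure"><p><figure>
<graphic url="www.papyri.info/apis/michigan.apis.2353"/>
</figure></p></div>
</body></text></TEI>"""

print(invalid_div_types(SAMPLE))
```

This only approximates one rule of the schema; proper validation would still go through the RELAX NG file itself.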

paregorios commented 9 years ago

@hcayless can you comment on the above question?

paregorios commented 9 years ago

pinging @hcayless

hcayless commented 9 years ago

It looks like this was missed when EpiDoc removed div[@type='figure'] a couple of years ago. Yes, you should be using tei:facsimile. And I have a bunch of work to do. Sigh.

paregorios commented 9 years ago

So @Edelweiss @jcowey it looks to me like DCLP, for fullest EpiDoc conformance, should move to using tei:facsimile, with the expectation that legacy encodings already in papyri.info will follow suit at some stage. We thereby (meaning me) are signing up to verify and extend XSLT support to handle them. I'm assigning this ticket back to @Edelweiss to see if you concur. We can then discuss an implementation plan.

paregorios commented 9 years ago

I'm putting this ticket on the backlog pending a decision about when to implement and who should implement the fix across existing files. I will add it to the list of issues to discuss on 20 May 2015.

paregorios commented 9 years ago

Keep track of what we did so we can share the recipe back to papyri.info, where the old encoding still obtains.

HolgerEssler commented 9 years ago

I wonder whether we should try to unify these references to images and those within <custEvent> in #128.

paregorios commented 9 years ago

I'm taking this ticket temporarily while I ponder @HolgerEssler's suggestion above.

jcowey commented 8 years ago

Example with absolute url:

<facsimile>
    <graphic url="http://beinecke.library.yale.edu/papyrus/oneSET.asp?pid=4514"/>
</facsimile>

Example with enumeration:

<facsimile>
    <graphic n="1" url="…"/>
    <graphic n="2" url="…"/>
    <graphic n="3" url="…"/>
</facsimile>

Edelweiss commented 8 years ago

Another possibility might be to collect all references to images - be it online or printed material - in one div

<div type="bibliography" subtype="illustrations">
    <p>
       <bibl type="illustration">ed. princ.</bibl>
       <figure>
          <graphic url="http://smb.museum/berlpap/index.php/01681/"/>
       </figure>
    </p>
 </div>

TM No. 62107

Edelweiss commented 8 years ago

Decision. All graphics contained within div type="figure" will be moved to a facsimile section. Then the files will validate. If we later decide to change this, all we have to do is revert the commit or run another XSLT change over it.
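
For reference, the move described in this decision can be sketched in Python as follows. This is an illustration under stated assumptions, not the change that was actually run over idp.data: the helper names `q` and `move_figures_to_facsimile` are hypothetical, only div elements directly under text/body are considered, and facsimile is placed between teiHeader and text as described earlier in the thread.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialise without an ns0: prefix


def q(name):
    """Qualify a local element name with the TEI namespace."""
    return "{%s}%s" % (TEI_NS, name)


def move_figures_to_facsimile(root):
    """Move graphics out of body/div[@type='figure'] into facsimile.

    Returns the number of graphic elements moved.
    """
    text = root.find(q("text"))
    body = text.find(q("body")) if text is not None else None
    if body is None:
        return 0
    figure_divs = [d for d in body.findall(q("div")) if d.get("type") == "figure"]
    graphics = [g for d in figure_divs for g in d.iter(q("graphic"))]
    if not graphics:
        return 0
    facsimile = root.find(q("facsimile"))
    if facsimile is None:
        facsimile = ET.Element(q("facsimile"))
        # Insert immediately before text, i.e. after teiHeader.
        root.insert(list(root).index(text), facsimile)
    for graphic in graphics:
        ET.SubElement(facsimile, q("graphic"), dict(graphic.attrib))
    for div in figure_divs:
        body.remove(div)
    return len(graphics)


# Hypothetical, heavily trimmed sample based on the snippet in the first comment.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader/><text><body>
<div type="figure"><p><figure>
<graphic url="www.papyri.info/apis/michigan.apis.2353"/>
</figure></p></div>
<div type="edition"/>
</body></text></TEI>"""

root = ET.fromstring(SAMPLE)
print(move_figures_to_facsimile(root))
print(ET.tostring(root, encoding="unicode"))
```

Only the attributes of each graphic are copied; any desc children or surrounding text would need extra handling.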

HolgerEssler commented 8 years ago

For Anagnosis we have inserted all references to images into <custEvent>, since it was at a specific moment in time that the image was taken, and this gives an easy way to attach name, date etc. to it.

leoba commented 8 years ago

I'm going to ask @paregorios to ping me here if he needs me to modify the xslt to handle the new facsimile section.

jcowey commented 8 years ago

As facsimile is completely new to the code, it seems to me to follow that the xslt will have to be modified.

paregorios commented 8 years ago

@leoba yes, I think this is one we need to do. I've shared a private repository (Campa Inscriptions) with you. It includes a fork of the EpiDoc stylesheets used to produce the HTML for the inscriptions that one sees here: http://isaw.nyu.edu/publications/inscriptions/campa/. It caters for facsimile, though perhaps not in the way we're doing it here.

The only reason the repo is private is that there are EFEO embargoed images in it (it's a todo to break that out). In any case, you should be able to see it if logged into GitHub.

You can find the relevant bits perhaps with a search like https://github.com/paregorios/Campa-Epigraphy/search?l=xslt&q=facsimile&utf8=%E2%9C%93

jcowey commented 8 years ago

Now committed: https://github.com/DCLP/idp.data/commit/9a269087f33d228a12d17ad27051528ad9663935 and can be viewed on https://github.com/DCLP/idp.data/tree/hd

paregorios commented 8 years ago

I'm taking this ticket temporarily to have a look at the now-valid XML.

jcowey commented 8 years ago

possibly now tending to prefer the

<div type="bibliography" subtype="illustrations">
    <p>
       <bibl type="illustration">ed. princ.</bibl>
       <figure>
          <graphic url="http://smb.museum/berlpap/index.php/01681/"/>
       </figure>
    </p>
 </div>

solution because the urls are often .htm / .html sites or jpgs not controlled or strictly speaking part of the project

jcowey commented 8 years ago

Now

<div type="bibliography" subtype="illustrations">
   <listBibl>
      <bibl type="illustration">ed. princ.</bibl>
      <bibl><ptr target="http://bibd.uni-giessen.de/papyri/images/pbug-inv115recto.jpg"/></bibl>
   </listBibl>
</div>

is probably preferable. Other possibilities would be

<div type="bibliography" subtype="illustrations">
   <p>
      <bibl type="illustration">ed. princ.</bibl>
      <bibl><ptr target="http://bibd.uni-giessen.de/papyri/images/pbug-inv115recto.jpg"/></bibl>
   </p>
</div>

OR

<div type="bibliography" subtype="illustrations">
   <p>
      <bibl type="illustration">ed. princ.</bibl>
      <ptr target="http://bibd.uni-giessen.de/papyri/images/pbug-inv115recto.jpg"/>
   </p>
</div>

I think bibl inside listBibl seems better. And ptr within bibl also better.

paregorios commented 8 years ago

What about something like:

<div type="bibliography" subtype="illustrations">
  <listBibl>
    <bibl type="illustration">ed. princ.
      <ptr target="http://bibd.uni-giessen.de/papyri/images/pbug-inv115recto.jpg"/></bibl>
  </listBibl>
</div>

The ptr is pointing to an illustration in the ed. princ., no? Therefore a single bibliographic reference, rather than two separate ones. Or am I confused?

You are confused. We are talking about separate entities.

<bibl type="illustration">ed. princ.</bibl> is a printed illustration (photo, microfiche or the like). <bibl><ptr target="http://bibd.uni-giessen.de/papyri/images/pbug-inv115recto.jpg"/></bibl> is an online jpg. That is why listBibl makes sense to me. We are listing "bibliographic" entities: printed (old style) illustrations; online images; online HTML pages which include images.

jcowey commented 8 years ago

For the above reasons I like:

<div type="bibliography" subtype="illustrations">
   <listBibl>
      <bibl type="illustration">ed. princ.</bibl>
      <bibl><ptr target="http://bibd.uni-giessen.de/papyri/images/pbug-inv115recto.jpg"/></bibl>
   </listBibl>
</div>

paregorios commented 8 years ago

Right. That makes better sense. The only additional question I have is whether there's existing data that could be used to:

jcowey commented 8 years ago

paregorios commented 8 years ago

I think we have a consensus here.

paregorios commented 8 years ago

@jcowey I think we should discuss status on this issue

Edelweiss commented 8 years ago

Currently we have a tei:div with its attribute @subtype set to ‘illustrations’, which comprises two types of tei:bibl elements: printed images and links to online resources.

Here is an example: https://github.com/DCLP/idp.data/blob/3b8d73b0f6c8571e4d6d5a188aa3bd09ff06f545/DCLP/63/62913.xml

While printed images are marked by having their @type attribute set to ‘illustration’ there is no such @type attribute for online images.

Today we discussed that it might make sense to have a type for both of these image resources, marking them as ‘printed’ and ‘online’ respectively.

The example above could look like this:

<div type="bibliography" subtype="illustrations">
    <listBibl>
        <bibl type="printed">P.Köln 7, pl.IXc and d</bibl>
        <bibl type="printed">BICS 22 (1975), pl.I-III</bibl>
        <bibl type="printed">ZPE 109 (1995), pl.IX</bibl>
        <bibl type="online">
            <ptr target="http://www.csad.ox.ac.uk/POxy/papyri/vol49/pages/3450.htm"/>
        </bibl>
        <bibl type="online">
            <ptr target="http://www.csad.ox.ac.uk/POxy/papyri/vol57/pages/3885.htm"/>
        </bibl>
        <bibl type="online">
            <ptr target="http://www.uni-koeln.de/phil-fak/ifa/NRWakademie/papyrologie/Karte/VII_304.html"/>
        </bibl>
        <bibl type="online">
            <ptr target="http://www.ville-ge.ch/fcgi-bin/fcgi-axn?launchpad&amp;/home/minfo/bge/papyrus/pgen257-ri.axs&amp;550&amp;550"/>
        </bibl>
        <bibl type="online">
            <ptr target="http://www.ville-ge.ch/fcgi-bin/fcgi-axn?launchpad&amp;/home/minfo/bge/papyrus/pgen257-vi.axs&amp;550&amp;550"/>
        </bibl>
    </listBibl>
</div>

(https://github.com/DCLP/idp.data/blob/03ca04ee892424573d39155b3768e618494f8ace/DCLP/63/62913.xml)

This change would also ease the work on the editor because there is a clearer distinction between printed and online resources.
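
To illustrate how a consumer such as the editor could tell the two kinds apart, here is a minimal Python sketch that reads printed and online illustrations from the proposed listBibl structure. The function name `list_illustrations` and the trimmed sample are assumptions; the paths are relative to the TEI root and follow the example above, where printed entries carry plain text and online entries carry a ptr with a target.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Path from the TEI root element to the illustrations list.
BASE = (
    "tei:text/tei:body/"
    "tei:div[@type='bibliography'][@subtype='illustrations']/"
    "tei:listBibl/"
)


def list_illustrations(root):
    """Return (printed_references, online_urls) for one TEI document."""
    printed = [
        (bibl.text or "").strip()
        for bibl in root.findall(BASE + "tei:bibl[@type='printed']", NS)
    ]
    online = [
        ptr.get("target")
        for ptr in root.findall(BASE + "tei:bibl[@type='online']/tei:ptr", NS)
    ]
    return printed, online


# Hypothetical sample cut down from the example in this comment.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<div type="bibliography" subtype="illustrations">
  <listBibl>
    <bibl type="printed">P.Köln 7, pl.IXc and d</bibl>
    <bibl type="online">
      <ptr target="http://www.csad.ox.ac.uk/POxy/papyri/vol49/pages/3450.htm"/>
    </bibl>
  </listBibl>
</div>
</body></text></TEI>"""

printed, online = list_illustrations(ET.fromstring(SAMPLE))
print(printed)
print(online)
```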

Edelweiss commented 7 years ago

example file old

/TEI/text/body/div[@type="figure"]/p/figure/graphic/@url
/TEI/text/body/div[@type="bibliography"][@subtype="illustrations"]/p/bibl[@type="illustration"]

example file new

/TEI/text/body/div[@type='bibliography'][@subtype='illustrations']/listBibl/bibl[@type='printed']
/TEI/text/body/div[@type='bibliography'][@subtype='illustrations']/listBibl/bibl[@type='online']/ptr/@target

paregorios commented 7 years ago

This has been addressed with commits up through navigator/issue106/c899ae4. Consider the following examples (NB: over half the links I tried clicking resulted in "not found" or other server errors, or DNS errors, but the HTML contains the URLs as encoded in the XML):

Over to @jcowey, @rla2118, and @HolgerEssler for review.

paregorios commented 7 years ago

Note also that there are thousands of bibl tags still with type="illustration" instead of "printed" or "online" scattered throughout the hd branch of idp.data. The code currently treats them the same as type="printed", passing them through as plain text, but it represents inconsistent encoding.
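
One possible clean-up for the inconsistency noted here, sketched in Python rather than XSLT and offered only as an assumption about how it might be done: inside div[@subtype='illustrations'], retag type="illustration" as "printed" (matching how the code already treats it) and tag untyped bibl elements that contain a ptr as "online". The function name `normalize_illustration_types` is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialise without an ns0: prefix


def q(name):
    """Qualify a local element name with the TEI namespace."""
    return "{%s}%s" % (TEI_NS, name)


def normalize_illustration_types(root):
    """Retag legacy bibl elements; return how many were changed."""
    changed = 0
    for div in root.iter(q("div")):
        if div.get("type") != "bibliography" or div.get("subtype") != "illustrations":
            continue
        for bibl in div.iter(q("bibl")):
            if bibl.get("type") == "illustration":
                # Legacy marker for printed images.
                bibl.set("type", "printed")
                changed += 1
            elif bibl.get("type") is None and bibl.find(q("ptr")) is not None:
                # Untyped entry that only points to an online resource.
                bibl.set("type", "online")
                changed += 1
    return changed


# Hypothetical sample using the older encoding quoted earlier in the thread.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<div type="bibliography" subtype="illustrations">
  <listBibl>
    <bibl type="illustration">ed. princ.</bibl>
    <bibl><ptr target="http://bibd.uni-giessen.de/papyri/images/pbug-inv115recto.jpg"/></bibl>
  </listBibl>
</div>
</body></text></TEI>"""

root = ET.fromstring(SAMPLE)
print(normalize_illustration_types(root))
print([b.get("type") for b in root.iter(q("bibl"))])
```

Entries already typed "printed" or "online" are left untouched, so the function can be rerun safely over files that have been partly converted.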