erc-dharma / project-documentation

DHARMA Project Documentation
Creative Commons Attribution 4.0 International

How to avoid the schema error: element "bibl" incomplete; missing required element "ptr" #256

Open danbalogh opened 9 months ago

danbalogh commented 9 months ago

The schema now raises an error when a <bibl> element has no <ptr> child. This is all right, but it creates a problem that I think is not unique to my subcorpus. Sometimes, there exists no secondary bibliography and/or no primary bibliography for an inscription (or none is encoded yet), but I would prefer to keep the skeleton of that section in the XML file, in case it can be populated later. My solution so far has been to use

  <listBibl type="primary">
    <bibl/>
  </listBibl>

where the empty <bibl/> element is necessary because without it, the earlier schema also raised an error. But now the above is flagged as an error too. I could imagine the following solutions, but I don't know which, if any, is most feasible:

What do you think, @michaelnmmeyer ?

I should add that I've just checked our inscription templates, and the use of empty <bibl/> (or <bibl n="siglum"/> in case of the primary bibliography) is present there too, so at the moment, even our template is in conflict with the schema.

arlogriffiths commented 9 months ago

I don't think this is new. For years I have been annoyed at having to remove or comment out the <div type="bibliography"> altogether when I do not have any references to encode or want to postpone that part of the encoding work.

danbalogh commented 9 months ago

You may have had something different? The absence of <bibl/> has long been noted as an error, but I'm sure I've had no schema complaints for the snippet cited above, until recently. Also, the last time I looked through Michael's list of files with encoding errors, I corrected all errors then flagged in my files - and now there are dozens of my files shown as having errors, because of this. But anyway, whether old or new, we need a way to keep empty bibliographies in the file. Commenting out is also acceptable to me, but I'd like us to agree on the "proper" way.

michaelnmmeyer commented 9 months ago

For cases like that, I am in favor of the "comment things out" option.

The main problem is that TEI grammars do not allow you to express context-dependent rules. You cannot say, for instance, that an element must have an attribute X in some context and an attribute Y in another. You need to allow both attributes and add extra code later on to sort things out.

This is sometimes inevitable, but it is better avoided whenever possible. The more you do it, the more your schema looks like code and the less "declarative" (viz. static, inert, unlike a program) it becomes. This makes it harder to reason about, and it makes the contextual help generated by Oxygen (and the TEI documentation, see e.g. https://dharman.in/documentation/inscription) less useful.

@arlogriffiths All modern editors have keyboard shortcuts to comment/uncomment things. In Oxygen, you have the command "Toggle comment", bound to Ctrl+Shift+Comma by default.

danbalogh commented 9 months ago

I'm not entirely happy with that. We already have contextual rules for the bibliography, e.g. it seems that @n is mandatory in the primary bibliography but not in the secondary. If adding more contextual rules would be too much of a complication, then we should investigate other solutions, e.g. the introduction of a dummy bibliography pointer (which I think has been raised before in a different context, but I cannot recall what). The thing is, I don't like the idea of having to use a template that is in conflict with the schema right from the start. I also don't like the situation where I think nearly half of the "errors" now flagged on https://dharman.in/texts are instances of "element "bibl" incomplete", simply because the encoders who created those files didn't comment out parts of the now-erroneous template. And finally, what if in spite of all these misgivings today I comment out the secondary bibliography as a whole, and then after another improvement next week the schema starts raising an error because the secondary bibliography's presence is now mandatory?

michaelnmmeyer commented 9 months ago

In this case, it is best to allow <bibl> to be empty. A lot of files are using a "John Doe" bibliography entry as a placeholder, by the way. See e.g. https://dharman.in/display/DHARMA_INSSiddham00101.

danbalogh commented 9 months ago

Thanks for spotting this. This must have been what my foggy memory of an earlier occurrence of the dummy bibliography pointer idea was about. Is it used anywhere outside the siddham corpus? Apparently, there I decided to refer to the Zotero ID AuthorYear_01 when the schema showed an error unless a reference was present.

So, @arlogriffiths and @michaelnmmeyer , shall we make this official, mention it in the guides, revise the inscription templates (and other templates as the case may be) accordingly, and make the change (automated as far as possible) in existing XML files?

The empty bibliography in the Siddham files looks like this:

<div type="bibliography">
  <p/>
  <listBibl type="primary">
    <bibl n="siglum"><ptr target="bib:AuthorYear_01"/></bibl>
  </listBibl>
  <listBibl type="secondary">
    <bibl n="siglum"><ptr target="bib:AuthorYear_01"/></bibl>
  </listBibl>
</div>

In the template, instructions could be added in a comment to replace the empty <p/> with the epigraphic lemma and to replace the dummy <bibl> elements in the structured bibliographies with the actual citations relevant to the inscription.

Or, if we don't want to go this way, then the question still remains: should the schema allow empty <bibl> (and if so, everywhere or only in this specific context?), or shall we comment out all empty bibliographies in existing XML files AND the template(s)?

michaelnmmeyer commented 9 months ago

The "John Doe" entries are used in various files, not only siddham.

I just allowed empty <bibl> in the schema. I will skip over them in the processing code, as well as over "John Doe" entries.
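For illustration only, here is a minimal sketch of what skipping such entries could look like. It is not the actual DHARMA processing code. It assumes the placeholder is recognised by the dummy target bib:AuthorYear_01 quoted earlier in this thread, and the names PLACEHOLDER_TARGETS, is_placeholder and real_entries are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the dummy pointer target quoted earlier in this thread.
PLACEHOLDER_TARGETS = {"bib:AuthorYear_01"}


def is_placeholder(bibl):
    """Return True for an empty <bibl/> or one that only points at a dummy entry."""
    # "{*}" matches the element in any namespace or none (Python 3.8+).
    ptrs = bibl.findall("{*}ptr")
    if not ptrs:
        # No <ptr>: count it as empty only if it has no children and no text.
        return len(bibl) == 0 and not (bibl.text or "").strip()
    return all(p.get("target") in PLACEHOLDER_TARGETS for p in ptrs)


def real_entries(xml_text):
    """Return the <bibl> elements that carry an actual reference."""
    root = ET.fromstring(xml_text)
    return [b for b in root.findall(".//{*}bibl") if not is_placeholder(b)]
```

Called on a bibliography division that contains an empty <bibl/>, a dummy entry and one genuine reference, real_entries would return only the genuine one. A <bibl> that has no <ptr> but does contain text is kept, since the sketch makes no assumption about how such entries should be handled.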

danbalogh commented 9 months ago

So to be clear, am I right that this means the following:

If so, this sounds good to me; let's see what @arlogriffiths says.

michaelnmmeyer commented 9 months ago

@danbalogh Yes, exactly.