Digital-Humanities-Quarterly / dhq-journal

DHQ is an open-access, peer-reviewed journal of digital humanities.
http://www.digitalhumanities.org/dhq/
10 stars 5 forks source link

Improve `@target` error messages and fix absoulte URI target checking #84

Open sydb opened 6 days ago

sydb commented 6 days ago

Tweak the rules for the @target of <ptr> or <ref> so that

  1. the (bad) value of @target is output in the message, and
  2. the tests both normalize whitespace in the value before comparison.

Also (with @brgrey) uncomment his test that targets start with one of ‘#’, “http://”, or “https://”, and fix it so that it does not prevent next rule from firing. (By using a predicate that means this rule only fires on those elements that the following rule does not fire on.)

These are pretty minor changes. Any 1 positive review should be sufficient, IMHO.

Note to reviewers — Make sure to select “Hide whitespace” under the settings gear near the top of the changed files diff.

One thing I wanted to do, BTW, before making this pull request, was to quickly search through the documentation to see if this message was quoted, and fix it if it was. But I do not know where to look for such documentation.

amclark42 commented 6 days ago

Is it always the case that these references should be to <bibl>s, and not other elements? It might be better to check if the local pointer matches any @xml:id within the document, rather than limiting the check to IDs on <bibl>.

I ran a counting robot report on the DHQ articles for $dhq-articles//text//*[@xml:id]/local-name() and found that these elements have IDs:

27013   bibl
1732    figure
412 div
199 table
72  example
52  note
15  formula
13  quote
5   anchor
3   item
2   head
2   seg

I think DHQ editors should be able to link to things like figures, tables, and examples (and I don't see the harm in linking to other things).

amclark42 commented 6 days ago

Slightly more complex counting robot report, asking what gets referenced by <ptr> and <ref>:

35739   bibl
1474    figure
365 not found
336 div
189 table
65  example
15  quote
10  note
3   anchor
2   formula

Counting robot query:

for $reference in ($dhq-articles//ref[starts-with(@target, '#')], $dhq-articles//ptr[starts-with(@target, '#')])
let $idReferenced := replace($reference/@target, '^#', '')
let $referenced := $reference/root()//*[@xml:id eq $idReferenced]
return
  if ( empty($referenced) ) then "not found"
  else $referenced/local-name()
amclark42 commented 6 days ago

Sorry! I hadn't noticed that <ref> is tested against every @xml:id in the document. It's only <ptr> that gets tested against only <bibl>s.

I'm in favor of the fix as long as we're certain that <ptr> should only reference <bibl>. (So far it does.)

sydb commented 6 days ago

I am a bit confused, @amclark42, because by my count ptr/@target does not point to anything in the same file dozens of times. @brgrey and I plan to meet on Fri 05 Jul to work on this.

amclark42 commented 5 days ago

@sydb That's true. I meant that, when the referenced element does exist, it is 100%, no doubt a <bibl>. It would not be a stretch to imagine that the problem cases you describe intended to reference <bibl>s as well.