Open gjvnq opened 1 year ago
I think this is a good idea. <gap> is probably a simpler choice than <del>, especially since you'd want to use it inside the deletion anyway if the text is illegible.
Thanks @gjvnq, I definitely agree that this should be clarified. The GL page for gap notes:
The gap tag simply signals the editors decision to omit or inability to transcribe a span of text. Other information, such as the interpretation that text was deliberately erased or covered, should be indicated using the relevant tags, such as del in the case of deliberate deletion.
So I'd be inclined to encode the example like so:
<del><gap reason="blackout" extent="multiple sentences"/></del>
or, if you wanted to flag the different types of deletion:
<del type="redaction"><gap reason="blackout" extent="multiple sentences"/></del>
But having some good examples of these kinds of redactions (and their relationship to ellipsis) in the GL and clarifying practices seems like a good idea to me.
This del > gap structure does seem like the best way forward without having to introduce a new tag.
Another question: should the gap tag include text content made up of Unicode block elements (e.g. U+2588 █)?
This has the advantage of making file conversions and text extraction easier, but it might go against the current TEI guidelines.
Example:
<p>In this case, the defendant John Doe (real name: <del type="redaction"><gap>████████████████████</gap></del>) claimed that ....</p>
We could also use U+2592 ▒ for rendering illegible gaps.
I get the feeling that <gap> is the best element for redacted documents like this one; however, the documentation isn't very clear, as <del> also feels like a good contender.
I feel that the documentation for <gap> should be updated to clarify that it's the recommended way to encode redactions/censorship, along with a proposed value for @reason, perhaps blackout, redaction, or censorship. By clarifying the documentation I mean adding something like the text in bold below:
Alternatively, I guess a new <redacted> or <censor> element could be added. The advantage of such an approach is that once the full document is released, the text behind the blackouts can simply be included inside the <censor> tags, which I think doesn't fit well with the <gap> tag.