The Text Encoding Initiative Guidelines

samplingDecl gloss seems too partial to corpora #2469

Open raffazizzi opened 9 months ago

raffazizzi commented 9 months ago

The current (v 4.6.0) gloss of <samplingDecl> says (emphasis mine):

contains a prose description of the rationale and methods used in sampling texts in the creation of a corpus or collection.

A paper at TEIMEC23 suggested using <samplingDecl> to describe the completeness of the encoding of one document, the sampling being justified by analytical reasons: they were interested in only encoding parts of a document, while being explicit about what they were leaving out.
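A minimal sketch of what such a use might look like, assuming the declaration sits in the header's <encodingDesc> as usual. The wording inside <p> is purely illustrative and is not taken from the paper:

```xml
<encodingDesc>
  <samplingDecl>
    <!-- Hypothetical content: the selection criteria below are invented for illustration -->
    <p>Only the passages relevant to the analysis have been transcribed and
       encoded. All other parts of the document have been omitted, and each
       omission is marked in the text with a <gi>gap</gi> element.</p>
  </samplingDecl>
</encodingDesc>
```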

The gloss for <samplingDecl> doesn't seem to support this use of the element, but the prose of the Guidelines is more permissive (emphasis mine):

The samplingDecl element may be used to describe, in prose, the rationale and methods used in selecting texts, or parts of text, for inclusion in the resource.

I think the gloss should be updated accordingly. For example:

"contains a prose description of the rationale and methods used in sampling parts of a text or in sampling texts in the creation of a corpus or collection."

lb42 commented 9 months ago

The description in the prose is much better, partly because it avoids repeating the word 'sampling'.