SegmOnto / Guidelines


Critical Apparatus #19

Closed PonteIneptique closed 2 years ago

PonteIneptique commented 3 years ago

Hey! I am not specifically part of the project, but I was wondering what your stance is on print zones. Identifying some zones is sometimes extremely important when someone wants to retrieve texts from a critical edition, e.g. being able to identify the critical apparatus.

Would a Critical Apparatus zone make sense?

Jean-Baptiste-Camps commented 3 years ago

We discussed it. I advocate having a general-purpose Margin type for all this kind of thing (physical zones, not logical ones, to avoid over-typing):

https://github.com/SegmOnto/examples/blob/main/zones/Margin/Margin.md

PonteIneptique commented 3 years ago

I feel like this is a very broad name, mostly because I have cases where some margins have very different meanings:

image

From my perspective, I have at least 2 different margins here (B.I 14[...] and the app crit). Under copyright law, the critical apparatus might not be shareable, and being able to specifically target it in order to ignore it in the output (or remove it from the scan?) would be an important feature for people working with modern data.

Jean-Baptiste-Camps commented 2 years ago

I vote for not adding a specific zone, but establishing a subtyping procedure, perhaps with suggested values. @gabays and @ArianePinche ?

PonteIneptique commented 2 years ago

I think that, for easily handling copyrighted information (i.e. the Critical Apparatus), having a CritApp as a known zone or subzone might be important...

Jean-Baptiste-Camps commented 2 years ago

Good subtype for margin for instance. I think we can start gathering them and including them in the definitions ;)

gabays commented 2 years ago

I agree with JB: subtypes are a better idea. On top of solving the semantic problem, it will help us gather data to train general models.

Jean-Baptiste-Camps commented 2 years ago

Should be settled by #23

gabays commented 2 years ago

Problem though: there is a partial overlap between the position (margin/top) and the content (criticalApparatus, gloss, etc.). To be consistent, I vote for No1, despite Thibault's good remarks.

PonteIneptique commented 2 years ago

I don't understand what you vote for


gabays commented 2 years ago

No 1: peripheral:top, peripheral:bottom, peripheral:left, peripheral:right. And not 2: peripheral:criticalApparatus, peripheral:gloss, etc. Sorry for not being clear

PonteIneptique commented 2 years ago

I completely disagree, mostly because you already have non-topological categories such as quiremarks and title. We need a critApp category for the purpose I gave.


gabays commented 2 years ago

peripheral:note, peripheral:criticalApparatus, peripheral:title, etc. It is going to be a mega maxi mess. @Jean-Baptiste-Camps ? (But I get your point. Maybe an intermediary solution?)

Jean-Baptiste-Camps commented 2 years ago

@gabays : You mean Margin, right ?

@PonteIneptique : quire-marks is topological (99% of the cases), and I offered to delete Title.

What's wrong with Margin:criticalApparatus or Custom:criticalApparatus ?

Jean-Baptiste-Camps commented 2 years ago

Also: I don't think we should care too much about consistency in (open suggested) subtype values, since we allow users to add any kind of subtype.
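As a minimal sketch of how such labels could be consumed, assuming they are written as `Type:subtype` with a single colon separator (as in `Margin:criticalApparatus` above): the main type stays a controlled value while the subtype is free text, so a tool can always fall back to the main type. The names `ZoneLabel` and `parse_label` are hypothetical and not part of any SegmOnto tooling.

```python
from collections import Counter
from typing import NamedTuple, Optional


class ZoneLabel(NamedTuple):
    zone_type: str          # controlled value, e.g. "Margin"
    subtype: Optional[str]  # open value, e.g. "criticalApparatus"; None if absent


def parse_label(label: str) -> ZoneLabel:
    """Split a 'Type:subtype' label on its first colon (assumed syntax)."""
    zone_type, _, subtype = label.strip().partition(":")
    return ZoneLabel(zone_type, subtype or None)


# Illustrative labels only, taken from the values suggested in this thread.
labels = [
    "Margin:criticalApparatus",
    "Margin:gloss",
    "Margin",
    "Custom:criticalApparatus",
]

# Counting by main type ignores the open subtype values entirely.
counts = Counter(parse_label(label).zone_type for label in labels)
```

Grouping on `zone_type` alone is what would let heterogeneous subtype values coexist without breaking anything that only cares about the main type.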

gabays commented 2 years ago
PonteIneptique commented 2 years ago

Same could apply to quiremarks and running title no? This seems to be an issue once you use the position on page to define the zone...

On Thu, 11 Nov 2021 at 6:16 PM, Simon Gabay wrote:

  • margin
  • bottom
  • footnote
  • critical apparatus
  • …

I agree not to care too much, but let's try here to keep homogeneous data under a standard name, so that we can train a model with all the margins, no matter their type. And criticalApparatus is a kind of margin, no?

gabays commented 2 years ago

A running title and a quireMark are really not the same thing, while footnotes and marginal notes are really similar. Also, you seem to mix the critical apparatus and the varia lectio. The critical apparatus includes footnotes etc. I think we should prefer something straightforward with minimal ambiguity.

PonteIneptique commented 2 years ago

Then

Note is less ambiguous than margin (which refers more to the situation around the page) and better than peripheral. Just like other categories, it's clearly semantically based (title, main, quireMarks) and not "topological".

gabays commented 2 years ago

Nope, because a note is not a gloss (to me), and the appcrit is a footnote

PonteIneptique commented 2 years ago

The issue is that the current description of margin is way too wide and will always create this kind of doubt:

Margin: characterises any text zone contained in the margins (upper, lower, inner or outer), including the space between two columns, whatever their semantic status (gloss, additions, …).

This is completely defined on the basis of the position on the page. With this in mind, RunningTitle, QuireMarks and Numbering all fit the description (they are in the margins of main). Margin does not say anything about gloss or whatever, it's purely defined as "not being main".

Up to you to see, but I really feel like Margin is a very bad zone, and mixing up non-semantic and semantic categories will create this kind of discussion in the future.

gabays commented 2 years ago

I think you are right in the sense that we are not precise enough, and that the options we offer are not satisfactory. We have to work on that.

Jean-Baptiste-Camps commented 2 years ago

It makes sense to distinguish Margin text from Quire-marks, Numbering and RunningTitle, in that Margin is text, and often text you are interested in when you OCR/HTR it, while the rest is not really text.

But I agree that there is a dose of semantics here, which we otherwise try to avoid whenever it does not unambiguously match a physical feature. There is usually not much chance of confusing a III at the bottom of the page with a gloss or an addition. Also, keep in mind that our focus here is handwritten material / old books, not modern editions.

That being said, it is probably not perfect.

Rename it to MarginText? But we do not have MainText, so it is assumed to be implicit.

PonteIneptique commented 2 years ago

Hmm,

Also, keep in mind that our focus here is handwritten material / old books, not modern editions.

Please update the readme if it is the case, because it does not warn about this limited focus anywhere:

This repository contains examples, to help establish and illustrate an ontology of layout analysis and segmentation.

And define old books :) Is the 19th century old or not? If so, critical apparatus can be found there.

And generally, it does not fix the problem inherent to the category Margin. I maintain that Notes would be better than Margin in the context of all the other categories, because it clearly excludes QuireMarks, Numbering and RunningTitle.

gabays commented 2 years ago

Traditionally, "old books" is understood by specialists of book history as books produced until 1800. We should be more explicit for non-specialists of philology, etc.

I think we have a definition problem here. Margin is different in the sense that it comments on the main text, which is not the case for numbering or runningTitle, which relate to the level of the book and not the text. In that sense, note makes sense. It is a note on the main text, or a note on another note in the case of documents with multiple levels of annotation/glosses (Decretum Gratiani, Talmud-like docs…).

Jean-Baptiste-Camps commented 2 years ago

19th century is not old books indeed. But there is a good question about the focus. Originally, it was Western (Medieval) Manuscripts, but @gabays helped us reach out to Early Modern stuff. And perhaps there is no reason not to go a bit further.

Again: Margin is a text (it is content), while Quire-marks, Numbering and RunningTitle are not. We can call it MarginText if necessary.

Jean-Baptiste-Camps commented 2 years ago

Margin: characterises any zone contained in the margins (upper, lower, inner or outer, including the space between two columns), providing textual content, whatever their semantic status (gloss, additions, critical apparatus, notes …) but with the exclusion of non-content zones (quire-marks, numbering, running title, etc.).

?

Jean-Baptiste-Camps commented 2 years ago

(this is starting to get long)

gabays commented 2 years ago

Laurent also suggested going further. To me we should embrace simple written western documents (to be reformulated). I agree on the definition. There is still the question of subtypes. For me margin:sidenote and margin:footnote are enough. Footnote embraces both technical notes such as the varia lectio and fully written notes; otherwise, what do we do if there are multi-layered annotations (varia lectio, historical notes…) like here: https://en.wikipedia.org/wiki/File:Musonius_Rufus_Reliquiae_Hense_1905_page_1.jpg ? We cannot discriminate based on the content, otherwise this is a bottomless pit.

PonteIneptique commented 2 years ago

I am still unconvinced by the name, but the definition is better. As for the subtype, from what I understood, this is a customizable system, so it's okay.

gabays commented 2 years ago

The subtype is open, so the problem is solved in a way. To address @PonteIneptique's problem specifically, we now offer :variants as a subtype to annotate the varia lectio as a specific MarginZone. I will add a picture to make things clear.
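For the copyright use case raised earlier in the thread, a rough sketch of how a downstream script might drop the varia lectio before sharing a transcription. Everything here is an assumption made for illustration: the exact label spelling `MarginZone:variants`, the `MainZone` and `MarginZone:gloss` labels, the dictionary layout of a zone, and the helper name `keep_shareable` are not defined anywhere in the guidelines.

```python
from typing import Dict, FrozenSet, Iterable, List

# Assumed spelling of the label for the varia lectio subtype.
EXCLUDED_LABELS: FrozenSet[str] = frozenset({"MarginZone:variants"})


def keep_shareable(
    zones: Iterable[Dict[str, str]],
    excluded: FrozenSet[str] = EXCLUDED_LABELS,
) -> List[Dict[str, str]]:
    """Return the zones whose label is not in the excluded set, in order."""
    return [zone for zone in zones if zone["label"] not in excluded]


# Hypothetical page: each zone is a dict with a label and its transcription.
page = [
    {"label": "MainZone", "text": "main text"},
    {"label": "MarginZone:variants", "text": "apparatus entries"},
    {"label": "MarginZone:gloss", "text": "a marginal gloss"},
]

shareable = keep_shareable(page)
```

Matching on the full label rather than on the main type keeps other margin content (glosses, notes) in the output while removing only the zone that may not be shareable.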