Closed PonteIneptique closed 2 years ago
We discussed it. I advocate having a general-purpose `Margin` type for all these kinds of things (physical, not logical, zones, to avoid over-typing):
https://github.com/SegmOnto/examples/blob/main/zones/Margin/Margin.md
I feel like this is a very broad name, mostly because I have cases where different margins have very different meanings:
From my perspective, I have at least 2 different margins here (B.I 14[...] and the app. crit.). Under copyright law, the critical apparatus might not be shareable, and being able to specifically target it in order to ignore it in the output (or remove it from the scan?) would be an important feature for people working with modern data.
I vote for not adding a specific zone, but for establishing a subtyping procedure, possibly with suggested values. @gabays and @ArianePinche?
I think that, to handle copyrighted information (i.e. the critical apparatus) easily, having a CritApp as a known zone or subzone might be important...
Good subtype for `margin`, for instance. I think we can start gathering them and including them in the definitions ;)
I agree with JB: subtypes are a better idea. On top of the semantic problem, it will help us gather data to train general models.
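To make the training-data argument concrete, here is a minimal sketch. It assumes labels are written as `Type:subtype` (the spelling used for `Margin:criticalApparatus` in this thread); the function name and the sample labels are hypothetical. Collapsing each label to its base type gives one homogeneous class per zone, whatever subtypes annotators chose.

```python
from collections import Counter


def base_type(label: str) -> str:
    """Return the zone type, dropping any ':subtype' suffix."""
    return label.split(":", 1)[0]


# Hypothetical annotations; the subtype values are illustrative only.
labels = ["Margin:criticalApparatus", "Margin:gloss", "Margin", "Main"]

# Count zones per base type, ignoring subtypes.
counts = Counter(base_type(label) for label in labels)
print(counts)
```

A general model could then be trained on the base types, while the full labels remain available for anyone who needs the finer distinction.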
Should be settled by #23
Problem though: there is a partial overlap between the position (`margin`/`top`) and the content (`criticalApparatus`, `gloss`, etc.). To be consistent, I vote for No. 1, despite Thibault's good remarks.
I don't understand what you vote for
On Thu, 11 Nov 2021 at 4:16 PM, Simon Gabay wrote:
Problem though: there is a partial overlap between the position (margin/top) and content (critical apparatus or not). To be consistent, I vote for No. 1, despite Thibault's good remarks
No. 1: `peripheral:top`, `peripheral:bottom`, `peripheral:left`, `peripheral:right`.
And not No. 2: `peripheral:criticalApparatus`, `peripheral:gloss`, etc.
Sorry for not being clear.
I completely disagree, mostly because you already have non-topological categories such as quire marks and title. We need a critApp category for the purpose I gave.
On Thu, 11 Nov 2021 at 4:26 PM, Simon Gabay wrote:
No. 1: peripheral:top, peripheral:bottom, peripheral:left, peripheral:right. And not No. 2: peripheral:criticalApparatus, peripheral:gloss, etc. Sorry for not being clear
`peripheral:note`, `peripheral:criticalApparatus`, `peripheral:title`, etc. It is going to be a mega maxi mess. @Jean-Baptiste-Camps?
(But I get your point. Maybe an intermediary solution?)
@gabays: You mean `Margin`, right?
@PonteIneptique: `quire-marks` is topological (99% of the cases), and I offered to delete `Title`.
What's wrong with `Margin:criticalApparatus` or `Custom:criticalApparatus`?
Also: I don't think we should care too much about consistency in (open suggested) subtype values, since we allow users to add any kind of subtype.
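A minimal sketch of what "open suggested subtype values" could look like in practice, assuming the `Zone:subtype` spelling used above. The suggested set, the class and the function names are all hypothetical, not part of any SegmOnto tooling: suggested values are only hints, and any other subtype is still accepted.

```python
from typing import NamedTuple, Optional

# Hypothetical suggested values; users may still supply any other subtype.
SUGGESTED_SUBTYPES = {"Margin": {"criticalApparatus", "gloss", "note"}}


class ZoneLabel(NamedTuple):
    zone: str
    subtype: Optional[str]


def parse_label(label: str) -> ZoneLabel:
    """Split 'Zone:subtype' into its parts; the subtype is optional."""
    zone, sep, subtype = label.partition(":")
    return ZoneLabel(zone, subtype if sep else None)


def is_suggested(label: ZoneLabel) -> bool:
    """True when the subtype is one of the suggested values for its zone."""
    return label.subtype in SUGGESTED_SUBTYPES.get(label.zone, set())


print(parse_label("Margin:criticalApparatus"))
print(is_suggested(parse_label("Margin:myOwnSubtype")))
```

Nothing here rejects an unknown subtype; `is_suggested` only reports whether a value belongs to the shared vocabulary, which keeps the system open while still making it easy to spot data that can be pooled.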
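- margin
- bottom
- footnote
- critical apparatus
- …

I agree not to care too much, but let's try here to keep homogeneous data under a standard name, so that we can train a model with all the `margin`, no matter their type. And `criticalApparatus` is a kind of `margin`, no?
The same could apply to quire marks and running titles, no? This seems to be an issue once you use the position on the page to define the zone...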
On Thu, 11 Nov 2021 at 6:16 PM, Simon Gabay wrote:
- margin
- bottom
- footnote
- critical apparatus
- … I agree not to care too much, but let's try here to keep homogeneous data under a standard name, so that we can train a model with all the margin, no matter their type. And criticalApparatus is a kind of margin, no?
A running title and a quireMark are really not the same thing, while footnotes and marginal notes are really similar. Also, you seem to mix the critical apparatus and the varia lectio. The critical apparatus includes footnotes etc. I think we should prefer something straightforward with minimal ambiguity.
Then `Note`
Note is less ambiguous than margin (which refers more to the situation around the page) and better than peripheral. Just like other categories, it's clearly semantically based (title, main, quireMarks) and not "topological".
Nope, because a note is not a gloss (to me), and the appcrit is a footnote
The issue is that the current description of margin is way too wide and will always create these kinds of doubts:
Margin: characterises any text zone contained in the margins (upper, lower, inner or outer), including the space between two columns, whatever their semantic status (gloss, additions, …).
This is defined entirely on the basis of the position on the page. With this in mind, RunningTitle, QuireMarks and `Numbering` all fit the description (they are in the margins of main). Margin does not say anything about gloss or whatever; it's purely defined as "not being main".
Up to you to see, but I really feel like Margin is a very bad zone, and mixing up non-semantic and semantic categories will create this kind of discussion in the future.
I think you are right in the sense that we are not precise enough, and that the options we offer are not satisfactory. We have to work on that.
There is a sense in distinguishing `Margin` text from `Quire-marks`, `Numbering` and `RunningTitle`, in that `Margin` is text, and often text you are interested in when you OCR/HTR it, while the rest is not really text.
But I agree that there is some dose of semantics, which we otherwise mostly try to avoid when it does not unambiguously match physical data. There is usually not much chance of confusing a `III` at the bottom of the page with a gloss or an addition. Also, keep in mind that our focus here is handwritten material / old books, not modern editions.
This being said, it is probably not perfect.
Rename it to `MarginText`? But we do not have `MainText`, so it is assumed to be implicit.
Hmm,
Also, keep in mind that our focus here is handwritten material / old books, not modern editions.
Please update the readme if it is the case, because it does not warn about this limited focus anywhere:
This repository contains examples, to help establish and illustrate an ontology of layout analysis and segmentation.
And define "old books" :) Is the 19th century old or not? If so, critical apparatuses can be found there.
And generally, it does not fix the problem inherent to the category Margin. I maintain that `Notes` would be better than `Margin` in the context of all the other categories, because it clearly excludes QuireMarks, Numbering and RunningTitle.
Traditionally, "old books" is understood by specialists of book history as books produced until 1800. We should be more explicit for non-specialists of philology, etc.
I think we have a definition problem here. `Margin` is different in the sense that it comments on the `main`, which is not the case for `numbering` or `runningTitle`, which relate to the level of the book and not the text. In that sense, `note` makes sense. It is a `note` on the `main` text, or a `note` on another `note` in the case of documents with multiple levels of annotation/glosses (Decretum Gratiani, Talmud-like documents…).
19th century is not old books indeed. But there is a good question about the focus. Originally, it was Western (Medieval) Manuscripts, but @gabays helped us reach out to Early Modern stuff. And perhaps there is no reason not to go a bit further.
Again: `Margin` is a text (it is content), while `Quire-marks`, `Numbering` and `RunningTitle` are not. We can call it `MarginText` if necessary.
Margin: characterises any zone contained in the margins (upper, lower, inner or outer, including the space between two columns), providing textual content, whatever their semantic status (gloss, additions, critical apparatus, notes …) but with the exclusion of non-content zones (quire-marks, numbering, running title, etc.).
?
(this is starting to get long)
Laurent also suggested going further. To me, we should embrace simple written Western documents (to be reformulated).
I agree on the definition. There is still the question of subtypes. For me, `margin:sidenote` and `margin:footnote` are enough. Footnote embraces both technical notes such as the varia lectio and fully written notes; otherwise, what do we do if there are multi-layered annotations (varia lectio, historical notes…) like here (https://en.wikipedia.org/wiki/File:Musonius_Rufus_Reliquiae_Hense_1905_page_1.jpg)? We cannot discriminate on the basis of content; otherwise this is a bottomless pit.
I am still unconvinced by the name, but the definition is better. As for the subtype, from what I understood, this is a customizable system, so it's okay.
On Fri, 12 Nov 2021 at 6:28 PM, Simon Gabay wrote:
Laurent also suggested going further. To me, we should embrace simple written Western documents (to be reformulated). I agree on the definition. There is still the question of subtypes. For me, margin:sidenote and margin:footnote are enough. Footnote embraces both technical notes such as the varia lectio and fully written notes; otherwise, what do we do if there are multi-layered annotations (varia lectio, historical notes…) like here https://en.wikipedia.org/wiki/File:Musonius_Rufus_Reliquiae_Hense_1905_page_1.jpg. We cannot discriminate on the basis of content; otherwise this is a bottomless pit.
The subtype is open, so the problem is solved in a way. To cope specifically with @PonteIneptique's problem, we now offer `:variants` as a subtype to annotate the varia lectio as a specific MarginZone. I will add a picture to make things clear.
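For the copyright use case raised earlier in the thread, a rough sketch of how such a subtype could be used downstream. The label spelling `Margin:variants`, the dictionary layout and the function name are assumptions made for illustration only; the thread does not fix an exact serialisation.

```python
def drop_zones(zones, excluded_labels):
    """Keep only the zones whose label is not in excluded_labels."""
    return [zone for zone in zones if zone["label"] not in excluded_labels]


# Hypothetical page segmentation; keys and label spelling are assumptions.
page = [
    {"label": "Main", "text": "Edited text ..."},
    {"label": "Margin:variants", "text": "12 om. B"},
    {"label": "Margin", "text": "A marginal gloss"},
]

# Leave the varia lectio out of the shareable output.
shareable = drop_zones(page, {"Margin:variants"})
print([zone["label"] for zone in shareable])
```

The original list is left untouched, so the full segmentation can still be kept privately while only the filtered zones are exported.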
Hey! I am not specifically part of the project, but I was wondering what your stance on print zones is. Identifying some zones is sometimes extremely important when someone wants to retrieve text from a critical edition, e.g. being able to identify the critical apparatus.
Would a zone `Critical Apparatus` make sense?