Closed mromanello closed 1 year ago
Ciao @mromanello !
Many thanks for this first exploration.
What does the workflow look like for importing them?
Sections or chapters are ordinary text containers; they just need their own RawObject and CanonicalObject, which will be children of TextContainer. It's a two-line-of-code story.
As to the creation, I'd go for a manual process (with ~15 commentaries we can afford that)
Definitely !
Where do we store this data (before injection and after injection)?
Before injection: base_dir/comm_id/olr/sections.json?
After injection: Simply in the canonical.json, as a textcontainer.
What sanity checks to do before "accepting" this ToC? E.g. values in start and end must correspond to real page IDs?
Do we need validation for such a small, manually annotated dataset? If necessary I would just assert comm.id + '_' + section['start'] in [p.id for p in comm.children.pages]
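If we do want a check, here is a minimal sketch of what it could look like. It assumes page IDs have the form comm_id + '_' + page number, as in the assert above; the function name find_invalid_sections and the sample IDs are made up for illustration and are not part of the existing API.

```python
def find_invalid_sections(comm_id, sections, page_ids):
    """Return (index, field, value) for every start/end that is not a known page ID."""
    known = set(page_ids)
    problems = []
    for i, section in enumerate(sections):
        for field in ("start", "end"):
            # Assumption: a page ID is the commentary ID, an underscore, then the page number.
            page_id = f"{comm_id}_{section[field]}"
            if page_id not in known:
                problems.append((i, field, section[field]))
    return problems


# Hypothetical data, only to show the behaviour.
page_ids = ["lobeck_0518", "lobeck_0519", "lobeck_0520"]
sections = [
    {"section_type": "index", "start": "0519", "end": "0520"},
    {"section_type": "appendix", "start": "0521", "end": "0522"},
]
problems = find_invalid_sections("lobeck", sections, page_ids)
print(problems)
```

Collecting all problems instead of stopping at the first failed assert would make it easier to fix a manually written ToC in one pass.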
How to access the ToC in the Python API?
As a normal textcontainer (commentary.children.sections)
This being said I would go for a more generic ontology of section types, like introduction, commentary... We will be happy to access all our commentary sections by the same name (and not comm.children.commentarius for Lobeck but comm.children.kommentar for Wecklein). I would hence go for:
[ # Commentary is not necessary as it is going to be the name of the file
{
"section_type": "index",
"section_title": "Index II. Scriptorum", # Optional, some of them are not named !
"start": "0519", # "page" doesn't seem to be necessary in my opinion
"end": "0520"
},
...
]
Suggested ontology:
[
"preface",
"introduction",
"hypothesis",
"text", # Possibly more fine-grained (translation, primary)
"commentary",
"index", # Possibly more fine-grained (locorum, siglorum...)
"appendix",
...
]
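To make the schema and ontology concrete, here is a rough sketch of loading a sections.json payload and rejecting section types outside the list. It assumes the file is plain JSON (so without the # comments used above) and that type names are lowercase; parse_sections and SECTION_TYPES are hypothetical names, not existing code.

```python
import json

# Assumption: lowercase names taken from the suggested ontology.
SECTION_TYPES = {
    "preface", "introduction", "hypothesis",
    "text", "commentary", "index", "appendix",
}


def parse_sections(raw_json):
    """Parse a sections.json payload and reject unknown section types."""
    sections = json.loads(raw_json)
    for i, section in enumerate(sections):
        # section_title is optional, so only these three keys are required.
        missing = {"section_type", "start", "end"} - section.keys()
        if missing:
            raise ValueError(f"section {i} is missing {sorted(missing)}")
        if section["section_type"] not in SECTION_TYPES:
            raise ValueError(f"section {i} has unknown type {section['section_type']!r}")
    return sections


raw = """[
  {"section_type": "index", "section_title": "Index II. Scriptorum",
   "start": "0519", "end": "0520"}
]"""
sections = parse_sections(raw)
print(sections[0]["section_type"])
```

A per-commentary name such as kommentar would be refused here, which is the point of sharing one ontology across commentaries.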
Hi @sven-nm
I'm doing a few more ToCs just to see whether the above schema + ontology works consistently.
I propose to make section_type a list of strings instead of a string. This would allow us to properly label sections of commentaries that belong to multiple types in our taxonomy. For example, the section Ajax in De Romilly's commentary should have both text and commentary, since text and commentary are on the same page.
What do you think?
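A quick sketch of how the list variant could be handled, in case it helps the discussion. It accepts both the old string form and the proposed list form; the helper names and the page numbers are invented for illustration.

```python
def normalize_section_types(section):
    """Return a copy of the section whose section_type is always a list of strings."""
    value = section["section_type"]
    types = [value] if isinstance(value, str) else list(value)
    return {**section, "section_type": types}


def sections_of_type(sections, wanted):
    """Return every section labelled with the wanted type, whatever form it was stored in."""
    normalized = [normalize_section_types(s) for s in sections]
    return [s for s in normalized if wanted in s["section_type"]]


# Hypothetical entries; the page numbers are placeholders, not real page IDs.
sections = [
    {"section_type": ["text", "commentary"], "section_title": "Ajax",
     "start": "0101", "end": "0150"},
    {"section_type": "index", "start": "0151", "end": "0152"},
]
titles = [s["section_title"] for s in sections_of_type(sections, "commentary")]
print(titles)
```

With this, the Ajax section is found when asking for text as well as when asking for commentary, and ToCs written with a single string keep working.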
I've done the following ToCs:
DeRomilly1976/sections.json
Kamerbeek1953/sections.json
bsb10234118/sections.json
Ferrari1974/sections.json
Hi @sven-nm
So this is what I created the other day for Lobeck's commentary:
Related questions:
start and end must correspond to real page IDs?