Bridgeconn / usfm-grammar

An elegant USFM parser.
https://usfmgrammar.vachanengine.org/
MIT License

Output JSON Schema #202

Closed by kavitharaju 1 year ago

kavitharaju commented 1 year ago

This PR

What is pending in this task

cmahte commented 1 year ago

Looking at the various options, I don't see BCV-flat. The most useful "flat" option is going to be the way scripture is always referred to: Book Chapter:verse.

In reality that means BICV (Book, Intro, Chapter, Verse), and it also means that some information in USFM is out of place: \s, \r and all paragraph types belong to the \v that immediately follows them.

But for nesting that would be the most intuitive and useful.
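To illustrate the "belongs to the \v that immediately follows" idea, here is a minimal sketch. It assumes a hypothetical flat stream of (marker, text) pairs; this is not usfm-grammar output, and the function name and the set of markers are made up for illustration.

```python
# Hypothetical flat token stream of (marker, text) pairs; NOT usfm-grammar output.
# Markers assumed to belong to the verse that follows them (illustrative subset).
PRECEDING = {"s", "r", "p", "q", "m"}


def attach_to_following_verse(tokens):
    """Group tokens so heading/paragraph markers join the next verse."""
    verses, pending = [], []
    for marker, text in tokens:
        if marker == "v":
            # Start a new verse and hand it everything waiting in front of it.
            verses.append({"number": text, "preceding": pending, "content": []})
            pending = []
        elif marker in PRECEDING or not verses:
            # Held back until the next \v (or left over if none follows).
            pending.append((marker, text))
        else:
            # Anything else is content of the current verse.
            verses[-1]["content"].append((marker, text))
    return verses, pending


tokens = [
    ("s", "The Creation"),
    ("p", ""),
    ("v", "1"),
    ("text", "In the beginning"),
    ("p", ""),
    ("v", "2"),
    ("text", "The earth was formless"),
]
verses, leftover = attach_to_following_verse(tokens)
```

Anything still pending at the end (for example a trailing \p) is returned separately rather than being silently dropped.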

Level 1: BookID (including \usfm \h \toc and \rem tags that occur before any \mt)
Level 2: Intro (including \imt \mt \ip etc.)
Level 3: Chapter (including \c \ca \cp \cl \d)
Level 4: Verse (including \p (etc) \r \s)

or

Level 1: BookID (including \usfm \h \toc and \rem tags that occur before any \mt)
Level 2: Intro (including \imt \mt \ip etc.)
Level 2: Chapter (including \c \ca \cp \cl \d)
Level 3: Verse (including \p (etc) \r \s)

In the second option, intro chapters effectively become chapter zero. I don't think that's required in JSON, but in OSIS and other languages it is a convention to mark a chapter 0, and it helps front-end programs display study materials better than cramming them above or into chapter 1 verse 1. The second option is problematic, though, with some Apocrypha books that have 'canonized' introductions before chapter 1; Sirach, I believe, has 14 verses. Separating the scripture from modern text becomes a problem if chapter zero is assumed non-canon but there are scripture verses before the chapter 1 mark. With a separate level, both the apocryphal pre-chapter-1 verses and the "book" divisions in Psalms are possible: mark the level 2 intro level, then continue with the chapter level.
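As a rough sketch of the two layouts, here is one possible reading of them as Python dicts. All key names ("book", "intro", "chapters", "content") are assumptions made for illustration, not a proposed schema.

```python
# One possible reading of the two options; key names are hypothetical.

# Option 1: the intro sits on its own level, separate from the chapters.
book_option_1 = {
    "book": {"id": "GEN", "headers": [{"marker": "h", "text": "Genesis"}]},
    "intro": [{"marker": "ip", "text": "Introductory paragraph"}],
    "chapters": [
        {"number": 1, "verses": [{"number": 1, "text": "Verse text"}]},
    ],
}

# Option 2: the intro is stored as chapter 0 alongside the real chapters.
book_option_2 = {
    "book": {"id": "GEN", "headers": [{"marker": "h", "text": "Genesis"}]},
    "chapters": [
        {"number": 0, "content": [{"marker": "ip", "text": "Introductory paragraph"}]},
        {"number": 1, "verses": [{"number": 1, "text": "Verse text"}]},
    ],
}


def intro_content(book):
    """Return the intro material from either layout."""
    if "intro" in book:
        return book["intro"]
    for chapter in book["chapters"]:
        if chapter["number"] == 0:
            return chapter.get("content", [])
    return []
```

With option 2, a consumer has to know that chapter 0 is special, which is where the pre-chapter-1 scripture verses become awkward.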

kavitharaju commented 1 year ago

In the previous version of usfm-grammar (2.x) we followed a structure similar to what you suggest, on the intuition that a Book-Chapter-Verse structure is what is going to be used most.

Level 1: BookID (including \usfm \h \toc and \rem tags that occur before any \mt)
Level 2: Intro (including \imt \mt \ip etc.)
Level 3: Chapter (including \c \ca \cp \cl \d)
Level 4: Verse (including \p (etc) \r \s)

But in this version we are trying to keep our output structure as close as possible to what is natural in USFM, while still bringing in the advantage of more programmer-friendly formats. The main difference is that we don't put paragraphs (\p) under verses (\v), but verses under paragraphs. Also, section headings (\s) and related markers (\r) come under the chapter (\c). The rest of the nesting is kept in the Nested JSON.
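A small sketch of what "verse under paragraph" could look like, and how verse-level access might still be recovered from it. The node shape ("marker", "number", "contents") is an assumption for illustration only and is not the actual usfm-grammar output format.

```python
# Hypothetical node shape; NOT the actual usfm-grammar output.
# Section heading (\s) sits under the chapter (\c); verses (\v) sit under the paragraph (\p).
chapter_paragraph_first = {
    "marker": "c",
    "number": "1",
    "contents": [
        {"marker": "s", "text": "The Creation"},
        {
            "marker": "p",
            "contents": [
                {"marker": "v", "number": "1", "text": "In the beginning"},
                {"marker": "v", "number": "2", "text": "The earth was formless"},
            ],
        },
    ],
}


def collect_verses(node):
    """Walk the nested structure depth-first and return verse nodes in order."""
    found = []
    for child in node.get("contents", []):
        if child.get("marker") == "v":
            found.append(child)
        else:
            found.extend(collect_verses(child))
    return found


verse_numbers = [v["number"] for v in collect_verses(chapter_paragraph_first)]
```

A walk like this is one way a consumer who wants Book-Chapter-Verse lookups could get them from a paragraph-first tree.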

kavitharaju commented 1 year ago

Closing this as we will be moving to USJ.