Copenhagen-Alliance / versification-specification

Versification mappings and versification sniffing
Dictionary keys and order #2

Closed jonathanrobie closed 4 years ago

jonathanrobie commented 4 years ago

We are using dictionary keys to enumerate books in maxVerses and other places. As the JSON spec tells us:

An object is an unordered set of name/value pairs.

Do we need a way to specify the order of books?

jonathanrobie commented 4 years ago

We closed this yesterday, saying the order of books is out of scope and would be specified elsewhere.

chris-morgan commented 1 year ago

If it’s out of scope here, I’m curious where it’s supposed to be in scope. I’m a newcomer to this space, and the one work I’ve looked at in any detail so far, engWEBBE from DBL, uses non-sequential book file names (GEN.usx, EXO.usx, &c.) and has no other relevant metadata, so that its versification.vrs is the only thing I can see that could possibly specify book order. Losing that property would make this rather unsuitable as a replacement.

(Aside: its versification.vrs doesn’t entirely match the work—its text contains PSA 151, but its versification only goes up to PSA 150 and PS2 1 and has no mapping around PSA 150 or PS2 1. If it doesn’t even necessarily match, I honestly can’t see any value in tracking the number of chapters and verses in books in versifications, for any kind of publication—only for tooling assistance during editing. The mapping is the part that has actual value.)
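
The kind of mismatch described above can be detected mechanically. The following is only a rough sketch: it assumes a `maxVerses`-style dictionary (book ID mapped to a list of last-verse numbers, one per chapter, stored as strings) and a hypothetical list of `(book, chapter, verse)` references extracted from the text. The function name and the toy data are invented for illustration and are not part of the spec or of any DBL resource.

```python
def find_mismatches(max_verses, observed):
    """Return references found in the text that the versification does not cover.

    max_verses: dict of book ID -> list of last verse numbers, one per chapter
    observed:   iterable of (book, chapter, verse) tuples taken from the text
    """
    problems = []
    for book, chapter, verse in observed:
        chapters = max_verses.get(book)
        if chapters is None:
            problems.append((book, chapter, verse, "unknown book"))
        elif chapter > len(chapters):
            problems.append((book, chapter, verse, "chapter out of range"))
        elif verse > int(chapters[chapter - 1]):
            problems.append((book, chapter, verse, "verse out of range"))
    return problems


# Toy data (hypothetical): the versification stops at chapter 2,
# but the text contains a chapter 3, analogous to PSA 151 vs. PSA 150.
max_verses = {"PSA": ["6", "12"]}
observed = [("PSA", 1, 6), ("PSA", 2, 12), ("PSA", 3, 1)]

print(find_mismatches(max_verses, observed))
```

A check like this only reports that the text and the versification disagree; it cannot say which of the two is wrong.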

jonathanrobie commented 1 year ago

Book order is different in different canons. The same text may be used to print, say, a Protestant Bible, a Catholic Bible, an Orthodox Bible, each in different book orders. The same Old Testament books translated from a Hebrew Old Testament might be printed in either traditional Jewish order or traditional Protestant order. So there simply isn't one set of book orders; you have to look to canons for that. And that means that hard-coding book order into the names of books only works if you will only ever use that translation in one canonical order. For most translations, that's probably true, but it doesn't work for all.

engWEBBE came from ebible.org - https://ebible.org/ - they are the right people to talk to about any errors in their versification.vrs.

jonathanrobie commented 1 year ago

I'm not sure if there is an issue for our versification spec to resolve here. Are you suggesting that we add book order? I would prefer to leave that up to the canon.

chris-morgan commented 1 year ago

Hmm… is that tied into the whole USX_1/USX_2/&c. thing? I found that rather confusing, given that nothing specified what each was in any way. I hadn’t reflected on the fact that book order might change. But I’d have thought that if book order was going to change, maybe versification would too, e.g. if printing an English Bible in Jewish order, maybe it’d use the Hebrew chapter divisions instead of English. I’m ignorant in all this, but I had been starting from the assumption of each published work (whatever that means) having a specific versification and ordering.

But going back to what I said: the only thing I can imagine in the engWEBBE release from DBL that would specify book order is its versification.vrs, so I was just assuming that the order of entries in that file must be significant, since I see no other reasonable way for it to be working. It’s possible that there is instead something wonky with that particular resource; I haven’t looked at others (and don’t have access to the commercially-locked-up ones). If what you’re saying is how things are done, then neither versification.vrs nor file names could be how it was done, and I don’t see what else there is—certainly nothing in the case of DBL’s engWEBBE. How do people decide what order the books should be presented in, on DBL resources?

So I’m not sure if there’s something wonky about DBL’s engWEBBE (I don’t know what other DBL resources might do; but the USFM files you can download for WEBBE from ebible.org directly don’t include a .vrs file and do use sorted file names), or whether software that interacts with it imposes some other hard-coded order on it (which would perplex me, I don’t see how that could be reasonable), or something else (and I have no hint at all as to what it could be).

I do know that SWORD’s versifications include book order. (e.g. its KJVA, RSVA and Catholic versifications have some overlap in their included apocryphal books, but in different orders.)

So yeah, I was largely just assuming that the order of books in the versification file was significant because (a) it matches what I’d expect, and (b) I can’t see what else might be conveying the information.

jonathanrobie commented 1 year ago

If you are printing the same text in both English and Jewish order, odds are that you are using Hebrew chapter numbers, which are close to the Protestant chapter numbers in general. If you are printing both an Orthodox and a Protestant translation from the same source, odds are good that the Protestant version will include something called an "Apocrypha" and the Orthodox version will put it all in one place - after deciding whether or not to put that text in separate books as a Protestant would.

This is a real requirement we have been asked to meet by people who do this kind of thing.

I can't really speak specifically to eBible's engWEBBE. The version on DBL was uploaded by eBible, so I suggest you talk to them. Personally, I would want to have an accurate versification file for any Bible I am working with. But if they do not give you one, that's out of scope for this repository, which is all about versification files. Of course ... you could try the Python versification sniffer to see if you can create a versification file automatically; that's what it is there for. It's in this repository.

There are various ways we could add support for book orders. Often, the main place this comes up in software is in the dialog used to open up a book, and software tends to use the same order in those dialogs, even if the Bibles they use might be printed in a different order. The software might, for instance, take a modern Protestant view of book order, even when looking at the Septuagint. But a printer would normally print the Septuagint in traditional Greek order. And that same printer probably has to think carefully when preparing an Ethiopian Orthodox Bible for print.

I could imagine adding support for book order. But we would have to think all of this through carefully. I suspect adding a section that supports one or more book orders could make sense.
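
One possible shape for such a section, purely as a sketch: JSON arrays are ordered, so each named order could be a list of book IDs kept separate from `maxVerses`. The `bookOrders` key, the order names, and the helper below are all hypothetical and not part of the current spec; the verse counts are toy values covering a single chapter per book.

```python
import json

# Hypothetical document: "bookOrders" is NOT in the spec, it is an assumption
# made for this sketch. Arrays carry the order; the maxVerses object does not.
doc = json.loads("""
{
  "maxVerses": {"GEN": ["31"], "MAL": ["14"], "2CH": ["17"]},
  "bookOrders": {
    "protestant": ["GEN", "2CH", "MAL"],
    "jewish": ["GEN", "MAL", "2CH"]
  }
}
""")


def books_in_order(doc, order_name):
    """Return the book IDs for one named order, checking they are all known."""
    order = doc["bookOrders"][order_name]
    unknown = [book for book in order if book not in doc["maxVerses"]]
    if unknown:
        raise ValueError(f"order {order_name!r} lists unknown books: {unknown}")
    return order


print(books_in_order(doc, "protestant"))
print(books_in_order(doc, "jewish"))
```

The same `maxVerses` data serves both orders, which matches the requirement that one text can be printed in more than one canonical order.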

jonathanrobie commented 1 year ago

> So yeah, I was largely just assuming that the order of books in the versification file was significant because (a) it matches what I’d expect, and (b) I can’t see what else might be conveying the information.

In many languages, the order of dictionary keys is not significant. If we want to specify order in a language-independent way, we can't do it this way.

FWIW, Python changed the semantics in 3.7 to guarantee that dictionaries preserve insertion order. But earlier versions of Python and other languages won't respect this order. So we can't rely on that.

RobH123 commented 1 year ago

That's no longer correct. From https://docs.python.org/3/library/stdtypes.html?highlight=dict#dict

Changed in version 3.7: Dictionary order is guaranteed to be insertion order.

Robert.

On 18/05/23 21:24, Jonathan Robie wrote:

In a Python dictionary, the order of keys is not significant. If we want to specify order, we can't do it this way.

chris-morgan commented 1 year ago

I know that JSON objects should not be treated as order-significant. What I was suggesting was simply that the current schema had lost something that I thought must have been significant about the .vrs format.

On reflection and review, though, in files like org.vrs and eng.vrs, deuterocanonical books appear after the New Testament books, so their order is clearly not indicating desired book order, and I was barking up the wrong tree the whole time, and I’m left not knowing how book order is supposed to be signalled.

You speak of “leaving it up to the canon”: where is this canon, how does a work identify what canon it uses? (And in the case of DBL resources, is this the USX_1/USX_2/&c. directories? Are those hard-coded values or something else?)

I’d presumed that “canons” and “versifications” were essentially merged concepts, like they are in SWORD.

(I shouldn’t have brought up the defects of engWEBBE’s .vrs file. It was meant to be an aside uncommented-on.)

jonathanrobie commented 1 year ago

> That's no longer correct. From https://docs.python.org/3/library/stdtypes.html?highlight=dict#dict Changed in version 3.7: Dictionary order is guaranteed to be insertion order.

True.

But the semantics of JSON are defined by JSON, which has to work in other languages as well.
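
A small illustration of the distinction, as a sketch only (it assumes Python 3.7 or later; the two-book objects are toy data): Python's parser happens to keep the key order it reads, yet the two documents below are the same JSON object, and any tool is free to reorder the keys.

```python
import json

# Two JSON texts that differ only in key order.
a = json.loads('{"GEN": 50, "EXO": 40}')
b = json.loads('{"EXO": 40, "GEN": 50}')

# Python 3.7+ keeps the order in which the keys were read...
print(list(a))
print(list(b))

# ...but the two objects still compare equal, because key order
# is not part of what the object means.
print(a == b)

# A serializer may legitimately reorder keys, e.g. by sorting them,
# so any order implied by the original file is lost.
print(json.dumps(a, sort_keys=True))
```

So a consumer written in Python could read book order off the keys, but a file that has passed through a different parser or a key-sorting serializer would no longer carry it.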