Copenhagen-Alliance / versification-specification

Versification mappings and versification sniffing

JSON representation of VRS needs to handle one-to-many mappings #15

Open jonathanrobie opened 3 years ago

jonathanrobie commented 3 years ago

Copied from https://github.com/bible-technology/scripture-burrito/issues/253

@mvahowe wrote:

Right now the mapping section of the "vrs-as-json" schema looks something like this:

        "mappedVerses": {
            "propertyNames": {
                "$ref": "#/definitions/bcvRange"
            },
            "additionalProperties": {
                "$ref": "#/definitions/bcvRange"
            }
        },

I don't think this handles one-to-many mappings, e.g.

PSA 51:0 = PSA 51:1
PSA 51:0 = PSA 51:2

In this case we have one property but need to represent two BCV ranges. This happens in standard versifications, and happens a lot more when mappings are reversed. Something like

PSA 51:0 = PSA 51:1-2

will create problems since existing VRS logic says that ranges have the same number of verses on each side.
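To make the range constraint concrete, here is a minimal sketch of a one-to-one range expansion. It assumes a simplified reference syntax ("BOOK chapter:verse" or "BOOK chapter:first-last", confined to a single chapter); the helper names `expand` and `pair_up` are hypothetical and not taken from any existing VRS implementation.

```python
from __future__ import annotations

import re

# Hypothetical, simplified reference syntax: "BOOK chapter:verse" or
# "BOOK chapter:first-last", with the range confined to one chapter.
_REF = re.compile(r"^(\w+) (\d+):(\d+)(?:-(\d+))?$")


def expand(ref: str) -> list[str]:
    """Expand a single-chapter reference or range into individual verses."""
    match = _REF.match(ref)
    if match is None:
        raise ValueError(f"unsupported reference: {ref!r}")
    book, chapter, first, last = match.groups()
    start = int(first)
    end = int(last) if last is not None else start
    return [f"{book} {chapter}:{v}" for v in range(start, end + 1)]


def pair_up(line: str) -> list[tuple[str, str]]:
    """Pair the verses of a 'left = right' mapping line one to one."""
    left, right = (side.strip() for side in line.split("="))
    sources, targets = expand(left), expand(right)
    if len(sources) != len(targets):
        # This is the case "PSA 51:0 = PSA 51:1-2" runs into.
        raise ValueError(
            f"{len(sources)} verse(s) on the left, {len(targets)} on the right"
        )
    return list(zip(sources, targets))
```

Under these assumptions, `pair_up("PSA 51:0 = PSA 51:1")` yields a single pair, while `pair_up("PSA 51:0 = PSA 51:1-2")` raises, because one verse on the left cannot be paired with two on the right.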

I think that the cleanest way to address this is to make the properties an array of BCV ranges. (In the majority of cases there would be exactly one BCV range in each array.)
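As a rough illustration of that proposal, the fragment below sketches what "mappedVerses" might look like with array values, written as a Python dict so it can be printed as JSON. The "type", "items" and "minItems" keywords are standard JSON Schema, but their use here is an assumption and not an adopted schema; `has_array_values` is a hypothetical helper that only checks the overall shape.

```python
import json

# Sketch of the "mappedVerses" fragment with array values. The "$ref" targets
# are copied from the fragment quoted above; everything else is an assumption.
MAPPED_VERSES_SCHEMA = {
    "propertyNames": {"$ref": "#/definitions/bcvRange"},
    "additionalProperties": {
        "type": "array",
        "items": {"$ref": "#/definitions/bcvRange"},
        "minItems": 1,
    },
}

# An instance the fragment is meant to accept: one key, two targets.
EXAMPLE = {"PSA 51:0": ["PSA 51:1", "PSA 51:2"]}


def has_array_values(mapped_verses: dict) -> bool:
    """Structural check only: every value is a non-empty list of strings.

    This does not validate the BCV syntax itself; that is left to the
    bcvRange definition, which is not reproduced here.
    """
    return all(
        isinstance(targets, list)
        and len(targets) >= 1
        and all(isinstance(t, str) for t in targets)
        for targets in mapped_verses.values()
    )


print(json.dumps(MAPPED_VERSES_SCHEMA, indent=2))
print(has_array_values(EXAMPLE))
```

In the common case each array would simply hold one entry, such as `["PSA 51:2"]`.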

jonathanrobie commented 3 years ago

In Paratext and in our current mappings, this is done using the idiom you mentioned above:

PSA 51:0 = PSA 51:1
PSA 51:0 = PSA 51:2

At the very least this needs to be documented. Verse 0 is "anything before the first chapter number".

mvahowe commented 3 years ago

Verse 0 is "anything before the first chapter number".

Anything between the chapter number and the first verse number. But that's not the issue. The issue is that the JSON schema cannot represent those two lines. The key will be "PSA 51:0" so one line will overwrite the other. That's why I think the JSON schema needs to be modified to allow something like

{
  "PSA 51:0": ["PSA 51:1", "PSA 51:2"]
}
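The overwrite problem can be reproduced in a few lines. This is a minimal sketch, assuming each VRS mapping line has the plain "left = right" form shown above; the function names are hypothetical.

```python
from collections import defaultdict

VRS_LINES = [
    "PSA 51:0 = PSA 51:1",
    "PSA 51:0 = PSA 51:2",
]


def split_line(line):
    """Split a 'left = right' mapping line into its two references."""
    left, right = line.split("=")
    return left.strip(), right.strip()


def as_single_values(lines):
    """One value per key: a later line silently replaces an earlier one."""
    mapping = {}
    for line in lines:
        source, target = split_line(line)
        mapping[source] = target
    return mapping


def as_arrays(lines):
    """One array per key: every target for a source is kept, in file order."""
    mapping = defaultdict(list)
    for line in lines:
        source, target = split_line(line)
        mapping[source].append(target)
    return dict(mapping)


print(as_single_values(VRS_LINES))  # {'PSA 51:0': 'PSA 51:2'}
print(as_arrays(VRS_LINES))         # {'PSA 51:0': ['PSA 51:1', 'PSA 51:2']}
```

With single values the mapping to PSA 51:1 is lost; with arrays both lines survive under the same key.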

jonathanrobie commented 3 years ago

OK, we will have to decide how to handle singleton values, then. The old taste question - which is better for a singleton value:

{
  "PSA 51:1" : "PSA 51:2"
}

Advantage: less noisy, simpler representation of the common case.

Or

{
  "PSA 51:1": ["PSA 51:2"]
}

Advantage: you can use the same expression for singletons and sequences of other lengths.

UnasZole commented 1 month ago

Hi,

This ticket is quite old, so I don't know whether a decision has been made since, but I'd like to contribute a practical thought: of the two proposals from @jonathanrobie, the second is certainly the better one. In a schema like this, meant to be a standard and therefore used by many applications on many different technical stacks, I'd favour consistent object types (i.e. always an array), even if that is slightly heavier for users to write. Tools that read the schema will support it much better. Think of Java applications that might use jsonschema2pojo to read versification files in an object-oriented way: with a single consistent type for all properties, such as an array of strings, people get a strongly typed setter. If you allow several types with a "oneOf" construct, people get an untyped getter and have to resort to casting, or give up on object mapping altogether and read the JSON as a node tree. More generally, allowing multiple types for a property makes files much less convenient to parse, which is a significant issue if the aim is to define a standard that applications can easily integrate.
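The same trade-off can be sketched outside Java. The Python snippet below is only an analogy to the jsonschema2pojo case, not a description of that tool's output; the type aliases and function names are hypothetical.

```python
from __future__ import annotations

import json
from typing import Dict, List, Union

# Shape if a key may hold either a bare string or an array ("oneOf").
MixedMapping = Dict[str, Union[str, List[str]]]
# Shape if a key always holds an array.
UniformMapping = Dict[str, List[str]]


def targets_from_mixed(mapping: MixedMapping, source: str) -> List[str]:
    """Every reader of the mixed shape has to branch on the runtime type."""
    value = mapping[source]
    if isinstance(value, str):
        return [value]
    return list(value)


def targets_from_uniform(mapping: UniformMapping, source: str) -> List[str]:
    """With arrays everywhere, the lookup needs no type check."""
    return mapping[source]


mixed = json.loads(
    '{"PSA 51:1": "PSA 51:2", "PSA 51:0": ["PSA 51:1", "PSA 51:2"]}'
)
uniform = json.loads(
    '{"PSA 51:1": ["PSA 51:2"], "PSA 51:0": ["PSA 51:1", "PSA 51:2"]}'
)

print(targets_from_mixed(mixed, "PSA 51:1"))      # ['PSA 51:2']
print(targets_from_uniform(uniform, "PSA 51:1"))  # ['PSA 51:2']
```

Both readers return the same result, but only the mixed one needs the `isinstance` branch, and every consumer of the format would have to repeat it.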

(For context: I've recently been trying to convince the maintainers of libsword to improve the way versifications work in the tool. In those discussions someone mentioned this Copenhagen Alliance standard, which I have just started studying and may be interested in supporting in libsword/jsword if it matches the use cases I have in mind well enough.)