bible-technology / scripture-burrito

Scripture Burrito Schema & Docs 🌯
http://docs.burrito.bible/
MIT License

Give Further Consideration to Word Alignment Flavor #149

Closed jag3773 closed 2 years ago

jag3773 commented 4 years ago

In part related to #75

mvahowe commented 4 years ago

Having thought about this, I feel like we have followed our own process for this, i.e.:

The only issue here seems to be that Paratext may solve this problem differently. But the working group has zero specifics about this. I don't think that we should set policy on the basis of rumours. So I think the correct way forward would be

mvahowe commented 4 years ago

The other consideration here is that we picked this flavor to illustrate the parascriptural flavorType, because people were struggling with what that flavorType was for. If we drop this flavor, we're going to be back to arguments about ontology with no examples.

jag3773 commented 4 years ago

The JSON that we are using for interchange is described in https://github.com/unfoldingWord/wordMAP-usfm/blob/master/README.md. AutographaMT exports in this format and the wordMAP module has a CLI to convert between USFM3 alignment and JSON alignment format. Internally, translationCore uses something similar (maybe identical, see an example at https://git.door43.org/jag3773/en_ult_zec_book/raw/branch/master/.apps/translationCore/alignmentData/zec/1.json).

joelthe1 commented 4 years ago

This spec was born out of a need to collaborate on a scripture text alignment project from English/Greek/Hebrew to gateway languages in India. This led UnfoldingWord and Friends of Agape/BCS (including communication with GBI) to develop this interchange JSON spec, which works well for most of the cases therein. Given that it has been used previously, has working code built against it, and has no alternative at present (which is also why we made one), I vote for making this part of the next SB release, even if it may change completely in the future (for which there is no evidence that I know of).

jonathanrobie commented 4 years ago

I assume this is the current proposal we are evaluating?

https://github.com/bible-technology/scripture-burrito/blob/630e47bc3e1f470999823680097cc2e4f9940cee/docs/flavors/parascriptural_word_alignment_flavor.rst

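If so, I think this is close. I would like to understand how word alignment relates to texts in burritos. I'm guessing the following things are true - if so, it would be good to state them in the text.

  • The aligned texts each live in their own burrito and the alignment lives in a third burrito.
  • The alignment will always point to specific versions of the texts, and is invalid for any other versions of the same texts.
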
Under those assumptions, I think this works, and it will be helpful. I think these assumptions need to be stated in the text. I think it needs to clearly show how the alignment would point to specific texts of specific versions.

I think the documentation needs a little more. A simple, worked example would be helpful. Just showing a piece of each aligned text and the alignment for that portion would be sufficient, with clarity about how they are packaged in burritos.

The above text is not clear about what the datatypes and valid values are. This version is much clearer: https://github.com/unfoldingWord/wordMAP-usfm/blob/master/README.md. That level of clarity would be helpful.

Modulo these comments, I am in favor.


jonathanrobie commented 4 years ago

I think it would be helpful to have an example of each of these kinds of mappings:

These mappings can be of the types: one-one, one-many, many-one, many-many, none-one/many or one/many-none.

There are examples in the wordMap document. I assume that non-sequential mappings like this would also be allowed?

{
  "score": 0.5363691430931895,
  "r0": [1, 3],
  "r1": [2, 7],
  "verified": false
},
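
The mapping types listed above can be read off a record like this one by counting the entries on each side. The following is a minimal sketch, not part of the proposal: it assumes `r0` and `r1` hold token positions in the two aligned texts and that an empty list stands for "none". The helper names `mapping_type` and `is_sequential` are hypothetical.

```python
from typing import Sequence


def mapping_type(r0: Sequence[int], r1: Sequence[int]) -> str:
    """Name the mapping type from the number of tokens on each side."""
    def side(tokens: Sequence[int]) -> str:
        # Assumption: an empty list means nothing is aligned on this side.
        if len(tokens) == 0:
            return "none"
        return "one" if len(tokens) == 1 else "many"

    return f"{side(r0)}-{side(r1)}"


def is_sequential(tokens: Sequence[int]) -> bool:
    """True when the positions form one unbroken ascending run, e.g. [4, 5, 6]."""
    return all(b - a == 1 for a, b in zip(tokens, tokens[1:]))


# The record quoted in the comment above.
record = {"score": 0.5363691430931895, "r0": [1, 3], "r1": [2, 7], "verified": False}

kind = mapping_type(record["r0"], record["r1"])
contiguous = is_sequential(record["r0"]) and is_sequential(record["r1"])
print(kind, contiguous)  # prints "many-many False"
```

Under these assumptions the quoted record is a many-many mapping in which neither side is a contiguous run, which is the non-sequential case the question asks about.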


jonathanrobie commented 4 years ago

invite Paratext, again, to engage with the specific proposal or, at the very least, give us information - any information! - about their current system

Paratext currently does alignment dynamically and is not exchanging alignment data with other programs. It's clear that there are applications that want to exchange alignment data along these lines, and they agree on a way to do this. As long as we can specify this well, I think we should go ahead with it.

I have now listed some things that I think should be clarified / added to the proposal to make it clear. I'm guessing that they may be what was intended in the first place, but I am not certain. If I guessed correctly between the lines, then adding the missing information would address my concerns, and I would be happy to use the proposal if we do have this need. If I guessed incorrectly, then I would like to understand how it works.

jag3773 commented 4 years ago

@jonathanrobie Yes, you have guessed correctly and I am in favor of making such modifications, which relate as much to SB implementation of this as anything. I think we have to have #75 in place to further specify what you are requesting.

joelthe1 commented 4 years ago

I assume that non-sequential mappings like this would also be allowed? { "score": 0.5363691430931895, "r0": [1, 3], "r1": [2, 7], "verified": false },

@jonathanrobie Yes, that is allowed. A few things on the roadmap are listed here: https://github.com/unfoldingWord/wordMAP-usfm/blob/master/README.md#roadmap

Also your understanding concurs with mine.

jonathanrobie commented 4 years ago

@jonathanrobie Yes, you have guessed correctly and I am in favor of making such modifications, which relate as much to SB implementation of this as anything. I think we have to have #75 in place to further specify what you are requesting.

I think we need to at least be able to declaratively identify a version of a resource. Does that have to be a link? In Web terms, I think we need something more like a URN than a URL.
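
One way to picture a declarative, URN-like version identifier is sketched below. Everything here is hypothetical and for illustration only: the `urn:x-example-burrito` prefix, the three components (authority, resource id, revision), and the names `ResourceVersion` and `alignment_applies` are assumptions, not anything defined by Scripture Burrito or #75. The sketch only shows that an alignment can name the exact versions it was built against without needing a resolvable link.

```python
from dataclasses import dataclass

# Hypothetical namespace for illustration; not a registered URN namespace.
URN_PREFIX = "urn:x-example-burrito"


@dataclass(frozen=True)
class ResourceVersion:
    """Names one specific version of one resource, without saying where it lives."""
    authority: str
    resource_id: str
    revision: str

    def to_urn(self) -> str:
        return f"{URN_PREFIX}:{self.authority}:{self.resource_id}:{self.revision}"

    @classmethod
    def from_urn(cls, urn: str) -> "ResourceVersion":
        parts = urn.split(":")
        # Expect the two prefix parts followed by exactly three components.
        if len(parts) != 5 or ":".join(parts[:2]) != URN_PREFIX:
            raise ValueError(f"not a recognised identifier: {urn!r}")
        return cls(*parts[2:])


def alignment_applies(declared, actual) -> bool:
    """An alignment is valid only for the exact versions it declares."""
    return tuple(declared) == tuple(actual)


greek = ResourceVersion("example-authority", "greek-nt", "3")
english = ResourceVersion("example-authority", "english-nt", "12")
english_newer = ResourceVersion("example-authority", "english-nt", "13")

declared = (greek, english)
print(greek.to_urn())
print(alignment_applies(declared, (greek, english)))        # True
print(alignment_applies(declared, (greek, english_newer)))  # False
```

The design choice illustrated is that identity and location are separate: the identifier says which version is meant, and a resolver (not shown) would decide where to fetch it from.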

While Paratext does not currently need to consume this, it might be useful for us to be able to produce it for other systems. And I like the design.

mvahowe commented 4 years ago

@joelthe1 There's an example metadata document, imported from the SB 0.1 JSON, in PR #173. It's missing the scope, which I think should be "books of the NT", and there may be other things to tidy up.

Also, it would be good to tighten the schema - right now any string will do in most places.

Finally, we have the version of the aligner, but maybe we need the name of the alignment processor to make this more generic?

jag3773 commented 3 years ago

@jonathanrobie will check with Copenhagen Alliance to see if this JSON format is acceptable.

jtauber commented 3 years ago

@jonathanrobie is there anything I can do to help with this?

jtauber commented 3 years ago

(As a side point, we're working on word alignment display in Scaife right now, so it would be good if we could display the results of whatever CA / others decide on in Scaife.)

jag3773 commented 2 years ago

For the 1.0.0 release, we want to convert this to an x-wordAlignment flavor. This gives us an opportunity to show what a not-yet-official flavor looks like, and it doesn't force the issue of whether or not this is a universal standard.

jag3773 commented 2 years ago

Make sure to update the example burrito.