bible-technology / scripture-burrito

Scripture Burrito Schema & Docs 🌯
http://docs.burrito.bible/
MIT License

Flavor Distinctions #51

Closed: jag3773 closed this issue 4 years ago

jag3773 commented 5 years ago

We have a schema that allows for the definition of flavors that are nested under a set of four possible flavorTypes. What we don't have is a semantic method for determining when a new flavor is warranted, or a process for making that happen.
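
(For context, a flavor is declared inside the metadata's type block, nested under one of the flavorTypes. The sketch below is illustrative only and omits most fields; the names scripture and textTranslation follow the published documentation, but the exact shape depends on the SB version in use.)

```json
{
  "type": {
    "flavorType": {
      "name": "scripture",
      "flavor": {
        "name": "textTranslation"
      }
    }
  }
}
```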

Discussion about this was started on https://github.com/bible-technology/scripture-burrito/issues/26#issuecomment-521613966 but should be moved here.

This may warrant a sub-committee.

mvahowe commented 5 years ago

There's a currently empty slot in the documentation for this. My position has always been that "let a thousand flowers bloom" is the simplest, and maybe the only, way we can realistically do this. SB 0.1 supports x-flavors, which is actually part of how I set up new flavors. This lets anyone experiment, show an idea to others, refine it, and start using it without anything becoming official, and without us doing any work. As x-flavors gain traction we can look at them, and we probably need a checklist for how that works. But I don't think it's remotely realistic for a committee of people who are all busy elsewhere to try to micromanage the early stages of standards development.
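
(Illustrative only: under the x- mechanism described above, an experimental flavor is declared in the same type block but with an x- prefixed name. The flavor name x-wordAlignment and its placement under parascriptural below are hypothetical; only the x- prefix convention comes from this thread and the extension docs.)

```json
{
  "type": {
    "flavorType": {
      "name": "parascriptural",
      "flavor": {
        "name": "x-wordAlignment"
      }
    }
  }
}
```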

The word alignment example seems useful to me. Tim, Jonathan and I came up with an initial plan. I implemented it and asked for feedback. Randall asked for some changes, which I made. Then others at GBI looked at the result... so what we're handing over to CA certainly isn't the last word on word alignment, but it's a bit more than a good idea on the back of a napkin.

jonathanrobie commented 5 years ago

But I don't think it's remotely realistic for a committee of people who are all busy elsewhere to try to micromanage the early stages of standards development.

I wonder if it is realistic for us to anticipate all the combinations of things that people might want to put in an SB and enumerate them as flavors. Is a flavor more or less the same thing as the application semantics of the data used by a given application? Or perhaps the application semantics of two or more applications that agree to understand the data the same way? If the latter, should our committee be in the business of defining or enumerating them?

Suppose there are three applications in our community. One uses a Bible story text and corresponding audio. Another uses a Scripture text and corresponding audio. Another lets you pick either, together with the corresponding audio or sign language video. How many flavors do we need for these applications? Who needs to agree on what in order to define them?

To me, the problem with the word alignment example is that there's more than one way to do word alignment in applications. Can an application use a different approach to word alignment? If so, what does it need to do? What level of interoperability do we expect here?

mvahowe commented 5 years ago

We can't and shouldn't try to anticipate all possible flavors - that was kinda my whole point. I would expect lively discussion about possible new flavors to continue for at least the next five years, but that shouldn't stop us from using some of the flavors now. Also, I don't think we should try to come up with single flavors that combine ever more types of data. (If we were starting from scratch I'd consider removing some of what is in scriptureText right now.)

Suppose there are three applications in our community. One uses a Bible story text and corresponding audio. Another uses a Scripture text and corresponding audio. Another lets you pick either, together with the corresponding audio or sign language video. How many flavors do we need for these applications?

Six, I think. That example doesn't seem at all challenging, and we already have many of those scenarios working out of DBL with our existing types. So, e.g., YouVersion mashes up scripture audio and text - we don't need another type for "text plus audio". You could mash up audio and/or text with sign language as long as the audio and sign language had timing information (although it wouldn't be "the text for the sign language" because that's not a thing). For me, the whole point of SB's generic metadata model is that it becomes easy for an application to pick 'n' mix flavors, at least in terms of sending, receiving, cataloguing and storing.

Who needs to agree on what in order to define them?

No-one. We had to pick some representative examples, in addition to the existing DBL types, to kick things off and to show how the scheme works. After that, I think we wait for the community to come up with proposals, using the x- mechanism, and as proposals gain traction we spin up a working group to consult and to thrash out the details.

The word alignment example is just that right now - an example. Let's make sure that others working in this area see it, and invite their feedback. I'm not going to worry about problems that, right now, are hypothetical, and I think the best way to get concrete feedback is by providing a concrete proposal.

What I really don't get is what the alternative is. If we say "ok, we won't even try to work on word alignment", how exactly does that help to solve the problem of word alignment? If we try and fail, totally, how is anyone worse off?

jonathanrobie commented 5 years ago

Is word alignment the core problem this group is solving? I suspect we could meet our requirements without solving this problem. And I think we will face enough issues just solving our core requirements.

I prefer a minimal solution that is orthogonal to questions like how an application might do word alignment, unless it becomes clear that there really is a single way to do word alignment that is interoperable among most applications that have this need.

Right now, I think we're still working on scripture text ...

mvahowe commented 5 years ago

Generic metadata is the core problem we're solving right now. In order to be sure that the approach to metadata is generic, we need examples of a range of specific data. If we just think about text, we don't even know whether, in the future, our approach could handle audio recordings of that text. I know this to be true because DBL ran the experiment the way you are suggesting, i.e. one application at a time, several years ago, and it didn't work out very well for us.

mvahowe commented 4 years ago

Are there any action points here beyond what we're already doing? If not, can we close this?

jag3773 commented 4 years ago

At a general level, much of our approach here is described in https://docs.burrito.bible/en/latest/extending.html.