bible-technology / scripture-burrito

Scripture Burrito Schema & Docs 🌯
http://docs.burrito.bible/
MIT License

Minor docs alignment issue between identification and idServer sections #66

Closed by klassenjm 4 years ago

klassenjm commented 4 years ago

Minor issue: In the XML sample for identifications, a systemID is shown with a prefix paratext::. The idServer section shows an example with a different prefix for what is clearly Paratext: <idServer prefix="ptx">. Perhaps these could be aligned.

rdb commented 4 years ago

FWIW, I seem to recall that it was decided that the prefix is a completely arbitrary choice and not of semantic significance. So this is not necessarily an inconsistency as the spec does not mandate the choice of either "ptx" or "paratext".
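
To illustrate why the prefix carries no meaning of its own, here is a minimal Python sketch. The `prefix::id` string shape follows the `paratext::` sample mentioned above; everything else (the function name, the dictionary standing in for the idServers section, the example URL and ID) is hypothetical and not taken from the spec.

```python
def resolve_system_id(system_id, id_servers):
    """Split 'prefix::local_id' and map the prefix to the server it was declared for.

    `id_servers` is a hypothetical stand-in for a document's idServers
    section: a dict of prefix -> server URL.
    """
    prefix, sep, local_id = system_id.partition("::")
    if not sep or not local_id:
        raise ValueError(f"expected 'prefix::id', got {system_id!r}")
    if prefix not in id_servers:
        raise KeyError(f"prefix {prefix!r} is not declared in idServers")
    return id_servers[prefix], local_id


# Two documents that pick different prefixes for the same (made-up) server.
doc_a = {"ptx": "https://paratext.example"}
doc_b = {"paratext": "https://paratext.example"}

# Both identifiers resolve to the same (server, local ID) pair.
same = (
    resolve_system_id("ptx::abc123", doc_a)
    == resolve_system_id("paratext::abc123", doc_b)
)
```

Under this reading, the two samples only look inconsistent to a human reader; a consumer that resolves prefixes through each document's own declarations treats them identically.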

mvahowe commented 4 years ago

@jonathanrobie @rdb In the light of recent conversations, I think PTX needs to decide what it "looks like" from the outside in a Scripture Burrito world. In other words, do we have Registry burritos that are in some way distinct from S/R burritos? Depending on the answer, we need one or two idServers in the metadata (however those are implemented by PTX).

Then we can decide on how the(se) idServer(s) relate to existing field(s) in DBL and PTX, which becomes a crucial issue when we do DBL migration in Q3 2021.

rdb commented 4 years ago

Three observations relevant to that:

1) The Registry currently accepts a PT GUID in almost all places where it accepts a Registry ID, so the Registry ID is not currently necessary at all for anything the Registry does.

2) The 1:1 association between Registry and PT projects is not set in stone. For what people usually consider a "translation project", there may in practice be several PT projects created (such as an OT and an NT project, and/or a back translation, etc.), and I could imagine that the Registry's opinion on what is a "project identity" could change in the future to reflect this reality.

3) There have been proposals to use the Registry for non-PT projects, such as sign language projects. I can't predict whether this goes anywhere, but I can conceive of a hypothetical future in which someone might want to produce a burrito that is associated with a Registry project but does not come from Paratext.

It may thus be better not to bother with Registry identifiers for now, given that they add nothing at the moment, and to reconsider this if any non-Paratext burritos associated with Registry projects start to appear.

À propos, I do wonder what SB thinks should be reflecting an "identity". Clearly, a Paratext GUID doesn't identify a "translation project", but just a folder of files on disk synchronized between computers. What happens if different systems have different opinions on what an identifier represents? Can one-to-many mappings occur?

mvahowe commented 4 years ago

> I do wonder what SB thinks should be reflecting an "identity"

It views identity in terms of ID Servers. Yes, the meaning of an ID server is somewhat underspecified right now. But I think it's fair to say that, at the very least, SB expects an ID Server to keep track of snapshots of entries. In other words, the ID server should be able to find a SB (or, at the very least, the metadata for a SB), indexed by entry/revision, for any SB it claims to "own".

So I think that, one way or another, PTX is going to need to do that (or delegate doing that to some 3rd party). How PTX does that is a matter for PTX. Since SB doesn't currently have hardwired ID server identifiers, this has no implications for the SB spec. But DBL will need to know the answer when it starts migrating legacy entries.

FoolRunning commented 4 years ago

> But I think it's fair to say that, at the very least, SB expects an ID Server to keep track of snapshots of entries. In other words, the ID server should be able to find a SB (or, at the very least, the metadata for a SB), indexed by entry/revision, for any SB it claims to "own".

If that is true, then this might be a non-starter for Paratext (and one of the reasons I originally said that SB has to work completely offline). Paratext can create unique IDs for SBs it creates, but I don't see how we could make it so that anyone could ask for a SB in the Paratext ecosystem and expect to get it.

rdb commented 4 years ago

What exactly does "finding an SB" mean? Is it sufficient if someone can use the Paratext GUID to fetch an S/R revision (and somehow obtain the corresponding metadata) and produce a burrito from that? Or does each idServer effectively have to become a "burrito store"?

I think this might be an unnecessarily limiting design constraint to impose upon ID servers.

mvahowe commented 4 years ago

> Is it sufficient if someone can use the Paratext GUID to fetch an S/R revision (and somehow obtain the corresponding metadata) and produce a burrito from that? Or does each idServer effectively have to become a "burrito store"?
>
> I think this might be an unnecessarily limiting design constraint to impose upon ID servers.

I'm open to suggestions and I don't pretend to have thought through all the options and implications. But I think the same idserver/entry/revision needs to refer to the same SB every time, no matter who refers to it or when. Giving every SB a new GUID every time it is produced would technically tick that box, but I'm not sure how useful it is to end users.
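
One way to read that requirement is as a write-once lookup keyed by entry and revision. The Python sketch below only illustrates that reading; it is not something the SB spec defines. The class name `SnapshotIndex`, its methods, and the choice to store metadata as bytes are all assumptions made for the example.

```python
class SnapshotIndex:
    """Hypothetical ID-server index: (entry, revision) -> burrito metadata."""

    def __init__(self):
        self._snapshots = {}

    def register(self, entry, revision, metadata):
        """Record a snapshot; refuse to rebind a key to different content."""
        key = (entry, revision)
        existing = self._snapshots.get(key)
        if existing is not None and existing != metadata:
            raise ValueError(f"{entry}/{revision} already refers to a different SB")
        self._snapshots[key] = metadata

    def find(self, entry, revision):
        """Return the metadata for a snapshot this server claims to own."""
        return self._snapshots[(entry, revision)]


index = SnapshotIndex()
index.register("abc123", "1", b'{"meta": "v1"}')
index.register("abc123", "1", b'{"meta": "v1"}')  # same content again: allowed
index.register("abc123", "2", b'{"meta": "v2"}')  # new revision: allowed
```

Minting a fresh entry ID on every export would never trip the conflict check, which is why it technically satisfies the rule while telling the user nothing about how two burritos relate to each other.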

We started this with, essentially, how to port information we already have in DBL that claims to describe something in Paratext? Is any of that information useful or not?

FoolRunning commented 4 years ago

After our discussion yesterday, I'd say that only "published" SBs should be retrievable. I imagine most (all?) of the non-DBL SBs that Paratext produces will not be "published" (i.e. they will be used to move information between applications and then thrown away).

> We started this with, essentially, how to port information we already have in DBL that claims to describe something in Paratext? Is any of that information useful or not?

I'm not sure I understand. Almost everything in DBL/SB describes something in Paratext so it is useful information.

mvahowe commented 4 years ago

It's only useful if it's a viable key for something in the hands of someone. I'm certain I have been told that some of what we have been storing is not useful and, indeed, in some cases, the Registry has been inventing data that DBL thinks PT produces when it no longer does. So I think it's reasonable to review the specific information we're including in PT systemIds.

mvahowe commented 4 years ago

I'd still like to understand PT's plan for ids within SB, but since SB doesn't specify vendor-specific id formats I don't think there's anything for us to decide here.