bible-technology / scripture-burrito

Scripture Burrito Schema & Docs 🌯
http://docs.burrito.bible/
MIT License

What goes in a scriptureText burrito? #52

Closed mvahowe closed 3 years ago

mvahowe commented 5 years ago

https://github.com/bible-technology/scripture-burrito/blob/mvh/non_text_scripture/docs/scripture_text_flavor.rst

is where things get real. I suggest that we have the USFM/USX discussion elsewhere. We can talk about the metadata, but I don't think it's very controversial. The fun bit is deciding what resources we are going to require and allow.

The "release" list is quite short. (I think DBL sometimes sees other files but it's complicated to get a list - did I ever mention that what Paratext puts into DBL right now is not documented anywhere to the best of my knowledge?)

The "source" list is based on unzipping archive.zip for CEVUK and removing the files that @FoolRunning said Paratext didn't need. (I'll post his email below for transparency.) I don't pretend to understand PT internals, but, from eyeballing this list,

We agreed a long time ago that SB was to be 'round trippable', e.g.

I've expressed concerns about the mechanics of this process on many occasions. We're now looking at the implementation details. AFAICS:

Or, to put it more succinctly, I don't think we're even close to being able to promise 100% roundtrippability.

What we can do is start with the easy stuff, and that would be better than nothing. So

Concretely, before SB 0.1 Beta, we need to remove the ??? in the page linked to above, either by defining and justifying the presence of the file or by removing it.

mvahowe commented 5 years ago

From an email by @FoolRunning, June 18th:

Mark,

These are the files in the source zip that Paratext definitely does not need:

.hg folder
Any files with the .DIC extension
Any files with the .TXT extension except for hyphenatedWords.txt
ldml.xml
ProjectProgress.tsv
unique.id
wordlist.wdl

Assuming we're not having DBL be an archival for Paratext projects,
these are files Paratext does not need for a resource text:

Anything that starts with Notes_
Anything that starts with PrintDraft
Canons.xml (Is this needed for DBL?)
CheckingStatus.xml
CommentTags.xml
hyphenatedWords.txt
license.json
ProjectProgress.xml
ProjectUserAccess.xml
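
The two lists above can be read as a filter over the contents of the source zip. What follows is only a minimal sketch under stated assumptions, not how Paratext or DBL actually decide: the function name `is_excluded`, the `resource_text` flag, the example file names, and the choice to match the `.DIC`/`.TXT` extensions case-insensitively are all illustrative and not taken from the email.

```python
from pathlib import PurePosixPath

# First list: files Paratext "definitely does not need".
NEVER_NEEDED_NAMES = {"ldml.xml", "ProjectProgress.tsv", "unique.id", "wordlist.wdl"}
NEVER_NEEDED_EXTENSIONS = {".dic", ".txt"}  # assumption: compared case-insensitively
TXT_EXCEPTION = "hyphenatedWords.txt"

# Second list: files not needed when the burrito is only a resource text.
RESOURCE_UNNEEDED_NAMES = {
    "Canons.xml",  # flagged as an open question in the email
    "CheckingStatus.xml",
    "CommentTags.xml",
    "hyphenatedWords.txt",
    "license.json",
    "ProjectProgress.xml",
    "ProjectUserAccess.xml",
}
RESOURCE_UNNEEDED_PREFIXES = ("Notes_", "PrintDraft")


def is_excluded(path: str, resource_text: bool = True) -> bool:
    """Return True if the file at `path` falls under either list above."""
    p = PurePosixPath(path)
    name = p.name
    # Anything inside the .hg folder is dropped.
    if ".hg" in p.parts:
        return True
    if name in NEVER_NEEDED_NAMES:
        return True
    if p.suffix.lower() in NEVER_NEEDED_EXTENSIONS and name != TXT_EXCEPTION:
        return True
    # The second list only applies when archiving the project is not a goal.
    if resource_text:
        if name in RESOURCE_UNNEEDED_NAMES:
            return True
        if name.startswith(RESOURCE_UNNEEDED_PREFIXES):
            return True
    return False


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for f in [".hg/store/data", "spelling.DIC", "hyphenatedWords.txt", "01GENCEVUK.SFM"]:
        print(f, is_excluded(f))
```

Note that `hyphenatedWords.txt` survives the first list but is dropped by the second, so the result depends on whether the archive is treated as a resource text.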

mvahowe commented 5 years ago

(Things look a lot easier if, instead of roundtrippability between an arbitrary number of editors, we aim for content to be able to make a one-way trip into Paratext after initial editing elsewhere, and we assume that Paratext will be used for the bulk of consistency and other checks.)

jonathanrobie commented 5 years ago

I agree with Tim's list.

Canons.xml might be worth discussing, though.

I think most real implementations will have this kind of data, which is not interoperable among applications. I also think trying to actually store the .hg files would bloat burritos to the point that they would be harder to handle - better to identify the repository for anyone who has the right credentials, and better to avoid sending files that might provide access to the repository for people or applications that do not.

mvahowe commented 5 years ago

My understanding was that Canons.xml was a standard file. Does it get extended for custom canons? We have a mechanism for that in SB metadata.

We still need to decide how things are going to work in practice. So, e.g., if

how does that work? What happens if, say, the hyphenation, terms and other non-portable files are out of date with respect to the USFM? How does PT behave in those circumstances?

mvahowe commented 4 years ago

@jonathanrobie @jag3773 @joelthe1 We're approaching the point where we'll need to commit to details on this level. Some of these files look a little arbitrary. That's an issue because

So maybe we should finally have a conversation about what really needs to be there and whether the formats make sense for inter-system exchanges?

FoolRunning commented 4 years ago

I was asked to produce a list of files that are needed in a Paratext resource for Paratext to be able to adequately represent the text, project settings, etc. to the user. The list was written above, but was written in a negative way (i.e. what files are currently included that don't need to be included), so I thought it best to create a positive list as well:

The following files aren't really required, but most users would appreciate them existing (helps with translation, I think):

EDIT: Crossed out files that we don't really need (after talking with someone about it).

mvahowe commented 4 years ago

Thanks @FoolRunning

For the sake of landing something within our lifetimes, I suggest we go with your first list for now.

I think we have the USFM/USX covered in other issues. (With the variant proposal I think we could return either USFM or USX for "Paratext Resources".)

I'll spin up new issues to review the style, versification and LDML files. I can't imagine that their presence is going to be controversial but we might want to tweak the format.

Settings seem to me to be a much more complex conversation. I'll make an issue for that too.

jag3773 commented 3 years ago

We should have a role defined for these:

App specific files:

jag3773 commented 3 years ago

This PR would also be included in the non-app specific role definitions: https://github.com/bible-technology/scripture-burrito/pull/158/files

jtauber commented 3 years ago

I've added a localedata role for the LDML file.

Do we then just need versification and the stylesheet roles to call this done?

I'm not sure I fully understand what is needed for "App specific files". Can't an app just put whatever files it likes in the ingredients and, if they're app-specific, either use x- roles or no role at all (and assume the app knows what they are by filename)?

jag3773 commented 3 years ago

> Do we then just need versification and the stylesheet roles to call this done?

That sounds good to me @jtauber .

I can't make sense of my comment about "app specific files." Seems like we talked about it and that was the solution we came up with, but it's obviously not clear enough to implement! Maybe @jonathanrobie can make sense of it? If not, let's not worry about it for now.

jag3773 commented 3 years ago

Decided today that custom.sty should use an x- role. @jtauber, please include a bit in the docs about that.

Also document a recommendation that app specific files show up in a sub-directory of ingredients. This is not globally enforced, nor required, but a recommendation.
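
To make the two decisions above concrete, here is a rough sketch of an advisory check a tool could run over ingredients. It rests on several assumptions that are not settled in this thread: that each ingredient has a path relative to the ingredients directory and an optional role string, that the standard role names are spelled as in `STANDARD_ROLES` (only `localedata` is named here; the other two spellings are guesses), and that `x-paratext-stylesheet` is a plausible role for custom.sty. The function `advise` is hypothetical, and it only returns notes because the sub-directory placement is a recommendation, not a requirement.

```python
from pathlib import PurePosixPath
from typing import List, Optional

# Assumed spellings, for illustration only.
STANDARD_ROLES = {"localedata", "versification", "stylesheet"}


def advise(path: str, role: Optional[str]) -> List[str]:
    """Return advisory notes for one ingredient; an empty list means nothing to report."""
    notes: List[str] = []
    is_app_specific = role is not None and role.startswith("x-")

    # A role that is neither standard nor x- prefixed is probably a mistake.
    if role is not None and not is_app_specific and role not in STANDARD_ROLES:
        notes.append(f"{path}: role {role!r} is neither a standard role nor an x- role")

    # Recommendation only: app-specific files live in a sub-directory of ingredients.
    if is_app_specific and len(PurePosixPath(path).parts) < 2:
        notes.append(f"{path}: app-specific file is not in a sub-directory (recommended)")

    return notes


if __name__ == "__main__":
    # Hypothetical ingredients, for illustration only.
    for path, role in [
        ("paratext/custom.sty", "x-paratext-stylesheet"),
        ("custom.sty", "x-paratext-stylesheet"),
        ("ldml.xml", "localedata"),
    ]:
        print(path, advise(path, role))
```

Files with no role at all are left alone here, since the thread allows an app to recognise its own files by filename.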

jtauber commented 3 years ago

The remaining issues are now covered by #248 so closing this.