ThomHehl / Moffatt

The Moffatt Bible

Use USFM as a stepping stone towards OSIS XML #1

Status: Open. DavidHaslam opened this issue 6 years ago.

DavidHaslam commented 6 years ago

USFM denotes Unified Standard Format Markers.

This markup format is simpler to write than XML.

For details see http://paratext.org/usfm

There are existing scripts available to convert USFM to OSIS.

One such script is u2o.py, a Python program developed and maintained by adyeths.

I expect the goal would be attained sooner if you were to use USFM as a stepping stone towards OSIS.
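To show what that stepping stone looks like, here is a minimal sketch of the USFM-to-OSIS idea, assuming only the `\id`, `\c` and `\v` markers matter. It is not how u2o.py works internally (u2o.py handles far more markup); the `BOOK_MAP` table, the helper `q()` and the placeholder verse text are hypothetical, for illustration only.

```python
import re
import xml.etree.ElementTree as ET

OSIS_NS = "http://www.bibletechnologies.net/2003/OSIS/namespace"
ET.register_namespace("", OSIS_NS)  # serialise OSIS as the default namespace

# Hypothetical USFM-to-OSIS book-name table; a real converter covers every book.
BOOK_MAP = {"GEN": "Gen", "2KI": "2Kgs", "PSA": "Ps"}


def q(tag):
    """Qualify a tag name with the OSIS namespace."""
    return f"{{{OSIS_NS}}}{tag}"


def usfm_to_osis(usfm):
    """Convert the \\id, \\c and \\v markers of one USFM book to an OSIS <div>.

    Assumes \\id comes first and every \\v line follows a \\c line.
    All other markers (paragraphs, poetry, footnotes) are ignored.
    """
    book_div = chapter_el = None
    book = chapter = None
    for line in usfm.splitlines():
        line = line.strip()
        if m := re.match(r"\\id\s+(\w+)", line):
            book = BOOK_MAP[m.group(1)]
            book_div = ET.Element(q("div"), type="book", osisID=book)
        elif m := re.match(r"\\c\s+(\d+)", line):
            chapter = m.group(1)
            chapter_el = ET.SubElement(book_div, q("chapter"),
                                       osisID=f"{book}.{chapter}")
        elif m := re.match(r"\\v\s+(\d+)\s+(.*)", line):
            verse = ET.SubElement(chapter_el, q("verse"),
                                  osisID=f"{book}.{chapter}.{m.group(1)}")
            verse.text = m.group(2)
    return book_div


sample = r"""\id 2KI
\c 1
\p
\v 1 Placeholder text of the first verse.
\v 2 Placeholder text of the second verse.
"""
print(ET.tostring(usfm_to_osis(sample), encoding="unicode"))
```

The point of the exercise is that USFM stays line-oriented and easy to type, while the converter takes care of generating the nested, namespaced XML.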

BTW, thanks for your email, @ThomHehl.

DavidHaslam commented 6 years ago

To write and maintain a Bible translation in USFM format you could make use of Bibledit-Desktop.

DavidHaslam commented 6 years ago

Another useful editor to consider for transcribing a printed Bible from the pre-digital age is EasyKey, available from MissionAssist UK.

This supports a small subset of the USFM tags, but more than enough for many such projects.

DavidHaslam commented 6 years ago

USFM requires a separate file for each Bible book.

Although OSIS XML can be used for single books, in practice most applications expect a single XML file for the complete Bible (or for however many books the translation contains).

Conversion scripts such as the one that I referred to take care of that.
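As a rough illustration of what "taking care of that" involves, the sketch below wraps several per-book OSIS `<div type="book">` elements in one `<osis><osisText>` document. It is a minimal sketch under stated assumptions: the `osisIDWork` value `Moffatt` and the helper names are hypothetical, and the output is not guaranteed to satisfy every requirement of the OSIS schema.

```python
import xml.etree.ElementTree as ET

OSIS_NS = "http://www.bibletechnologies.net/2003/OSIS/namespace"
ET.register_namespace("", OSIS_NS)


def q(tag):
    """Qualify a tag name with the OSIS namespace."""
    return f"{{{OSIS_NS}}}{tag}"


def merge_books(book_divs, work="Moffatt"):
    """Wrap OSIS book <div> elements in a single <osis><osisText> tree.

    The work identifier 'Moffatt' is a hypothetical placeholder.
    """
    osis = ET.Element(q("osis"))
    osis_text = ET.SubElement(osis, q("osisText"),
                              osisIDWork=work, osisRefWork="Bible")
    header = ET.SubElement(osis_text, q("header"))
    ET.SubElement(header, q("work"), osisWork=work)
    for div in book_divs:  # keep the books in the order given
        osis_text.append(div)
    return osis


books = [
    ET.fromstring(f'<div xmlns="{OSIS_NS}" type="book" osisID="1Kgs"/>'),
    ET.fromstring(f'<div xmlns="{OSIS_NS}" type="book" osisID="2Kgs"/>'),
]
doc = merge_books(books)
print(ET.tostring(doc, encoding="unicode"))
```

With real per-book files, you would first pull the book `<div>` out of each parsed file and pass those elements in canonical book order.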

ThomHehl commented 6 years ago

I like having each book in a separate file because it makes maintenance easier for me, and it feels good to finish a whole file and move on to the next. Plus, it's easier to hunt down tags that I've failed to pair up.

Thanks for the excellent counsel. I will look into the things that you've mentioned after I finish the book I'm currently working on, 2 Kings.

Hopefully USFM will be a big help when I get to the poetic books, as doing lines of poetry is seriously annoying.
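Unpaired tags do not disappear entirely in USFM: character markers such as `\nd ...\nd*` or `\wj ...\wj*` still need a closing marker. A small stack-based checker can report the line where a pair goes wrong. This is a sketch only; the `PAIRED` set is an assumed subset of USFM character markers, not the full list from the USFM specification.

```python
import re

# ASSUMPTION: a hand-picked subset of USFM markers that must be closed
# with a matching \marker* (extend it to suit the markers you actually use).
PAIRED = {"add", "bd", "bk", "f", "it", "nd", "qs", "sc", "wj", "x"}

# A backslash, an optional "+" (nested marker), the marker name, optional "*".
TOKEN = re.compile(r"\\\+?([a-z]+[0-9]*)(\*?)")


def unpaired_markers(usfm):
    """Return (line_no, problem) tuples for unbalanced character markers."""
    problems = []
    stack = []  # (marker, line_no) for each marker still open
    for line_no, line in enumerate(usfm.splitlines(), start=1):
        for m in TOKEN.finditer(line):
            name, closing = m.group(1), m.group(2) == "*"
            if name not in PAIRED:
                continue  # paragraph/verse markers like \p, \v, \q1 need no close
            if not closing:
                stack.append((name, line_no))
            elif stack and stack[-1][0] == name:
                stack.pop()
            else:
                problems.append((line_no, f"unexpected \\{name}*"))
    # Anything left on the stack was opened but never closed.
    problems.extend((ln, f"unclosed \\{name}") for name, ln in stack)
    return problems


sample = "\\v 1 The \\nd Lord\\nd* said\n\\v 2 \\wj Follow me.\n"
print(unpaired_markers(sample))  # [(2, 'unclosed \\wj')]
```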

DavidHaslam commented 6 years ago

It will definitely make it simpler to transcribe poetry passages, especially those with several indentation levels.

But even for prose passages, you should find the task more manageable.
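For poetry, USFM marks each line with `\q1`, `\q2` and so on for successive indentation levels (a bare `\q` means level 1), and OSIS expresses the same idea with `<l level="2">`-style line elements inside an `<lg>` line group. The sketch below shows only that mapping, assuming one poetic line per USFM line; verse markers and the OSIS namespace are omitted for brevity, and the sample lines are placeholders.

```python
import re
import xml.etree.ElementTree as ET

# \q, \q1, \q2 ... followed by whitespace; \qs, \qa, \qr etc. do not match.
POETRY = re.compile(r"\\q(\d?)\s+(.*)")


def poetry_to_osis(usfm_lines):
    """Map USFM poetry lines to OSIS-style <l level="n"> inside one <lg>."""
    lg = ET.Element("lg")
    for line in usfm_lines:
        m = POETRY.match(line.strip())
        if m:
            level = m.group(1) or "1"  # a bare \q is the same as \q1
            line_el = ET.SubElement(lg, "l", level=level)
            line_el.text = m.group(2)
    return lg


lines = [r"\q1 The first line,", r"\q2 indented continuation,", r"\q and back again."]
print(ET.tostring(poetry_to_osis(lines), encoding="unicode"))
```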

BTW, are you using Windows, Linux or Mac OS?

If you're a Windows user, I recommend these two Unicode text editors:

Notepad++ has a good XML Tools Plugin.

I made a USFM Language Definition file for Notepad++.

It's not perfect, due to the mismatch between how USFM works and how the syntax highlighting works.

Even so, it's still very useful.

DavidHaslam commented 6 years ago

One future goal might be to make a SWORD module for use with the various free Bible study apps available from CrossWire Bible Society and its friends.

I'd be glad to help when you reach that stage.

There's no need to worry about having a separate XML file per book.

The osis2mod utility has an append switch in its command-line syntax.
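A sketch of how per-book files could be fed to osis2mod one after another: the first call creates the module and later calls append to it. The switch is assumed here to be `-a` (osis2mod's "augment" option, to the best of my knowledge), so check it against the usage text osis2mod prints before relying on it. The module directory and file names are hypothetical.

```python
import subprocess

# ASSUMPTION: "-a" is osis2mod's append/augment switch; confirm with the
# usage text that osis2mod prints when run without arguments.
APPEND_FLAG = "-a"


def osis2mod_commands(module_dir, osis_files):
    """Build one osis2mod command per OSIS file.

    The first file creates the module; each later file is appended to it.
    """
    commands = []
    for index, osis_file in enumerate(osis_files):
        cmd = ["osis2mod", str(module_dir), str(osis_file)]
        if index > 0:
            cmd.append(APPEND_FLAG)
        commands.append(cmd)
    return commands


def run_all(commands):
    """Run the commands in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Hypothetical paths, for illustration only.
cmds = osis2mod_commands("moffatt-module", ["Gen.xml", "Exod.xml"])
for cmd in cmds:
    print(" ".join(cmd))
# run_all(cmds)  # uncomment once osis2mod is on your PATH
```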

ThomHehl commented 6 years ago

My plan has always been to have a SWORD module. That's what led me to OSIS to begin with.

ThomHehl commented 6 years ago

I use all three platforms, but Windows mostly for this project.

So you've named several tools. Which do you think would be best for me to start with on this project?

DavidHaslam commented 6 years ago

I use Notepad++ as my Unicode editor of choice. It has a powerful search-and-replace feature.

For deeper work involving non-ANSI characters, I find BabelPad very useful too.

Both editors are free. The former is open source.

BabelPad is proprietary, but I have a good working email relationship with the developer Andrew West.

For more complicated tasks, I'm a great fan of TextPipe from DataMystic. This is not free, but worth its weight in gold, so to speak. Been using it since 2001. It's not an editor, but a program that can be used for lots of things from data-mining to file format shifting.

DavidHaslam commented 6 years ago

While the BibleTechnologies website remains AWOL, you can point to the OSIS schema on the CrossWire server.

Otherwise you'd not be able to validate the OSIS XML files, unless you'd already grabbed a copy before the site disappeared.

See here

NB. I have a contact trying to find out what happened to the website.
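Until you have the schema to hand, a quick pre-check can at least catch files that are not well-formed XML or that lack the OSIS namespace on the root element. This sketch uses only Python's standard library and is not a substitute for schema validation; a schema-aware tool such as `xmllint --noout --schema <path-to-osis-xsd> file.xml`, pointed at a local or CrossWire copy of the XSD, does that job. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

OSIS_NS = "http://www.bibletechnologies.net/2003/OSIS/namespace"


def precheck_osis(xml_text):
    """Return a list of problems found before schema validation.

    Checks only well-formedness and the root element's name and namespace;
    it does NOT validate against the OSIS schema.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    if root.tag != f"{{{OSIS_NS}}}osis":
        return [f"unexpected root element {root.tag}"]
    return []


print(precheck_osis("<osis><div></osis>"))  # mismatched tags are reported
```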

ThomHehl commented 6 years ago

Excellent. I've downloaded Notepad++, but I can't figure out how to add the USFM plugin.

It would be great if instructions for that would be included in the readme file on github.

DavidHaslam commented 6 years ago

Menu | Language | Define your Language...

The popup form looks like this: [screenshot, 2017-11-27] (I should've minimised my Windows Taskbar.)

Click the Import... button. Browse to the required XML file you downloaded from my repo. Click "Open".

See also User Defined Language.

DavidHaslam commented 6 years ago

Don't forget to install the XML Tools plugin.

DavidHaslam commented 6 years ago

Another useful Windows tool is WinMerge.

DavidHaslam commented 6 years ago

Thinking ahead.

The Python script adyeths/u2o is normally used to create a single OSIS XML file for the whole Bible translation, taking all the USFM files as the input data.

For this reason, it would make sense to convert your existing XML files to USFM so that these may be readily included when the u2o script is used.

This would make it a simpler step on the route to making a SWORD module.

As I have the software tools to do the back conversion, I plan to fork the repo. After performing the task, I'll issue a pull request.
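The back conversion can be pictured as the reverse of the USFM-to-OSIS step. This is a minimal sketch, not the tooling referred to above, and it assumes the existing files are OSIS with container-style `<chapter>` and `<verse>` elements (not milestones); any markup inside a verse is flattened to plain text. The `USFM_IDS` table and the sample verse are hypothetical.

```python
import xml.etree.ElementTree as ET

OSIS_NS = "http://www.bibletechnologies.net/2003/OSIS/namespace"
NS = {"o": OSIS_NS}

# Hypothetical OSIS-to-USFM book-ID table; a real one covers every book.
USFM_IDS = {"Gen": "GEN", "2Kgs": "2KI", "Ps": "PSA"}


def osis_book_to_usfm(book_div):
    """Emit \\id, \\c and \\v lines for one OSIS book <div>.

    Assumes container <chapter>/<verse> elements; inline markup inside a
    verse (e.g. <transChange>) is flattened to its text content.
    """
    book = book_div.get("osisID")
    out = [f"\\id {USFM_IDS[book]}"]
    for chapter in book_div.findall("o:chapter", NS):
        out.append(f"\\c {chapter.get('osisID').split('.')[-1]}")
        for verse in chapter.findall("o:verse", NS):
            number = verse.get("osisID").split(".")[-1]
            # Join all text pieces, then collapse runs of whitespace.
            text = " ".join("".join(verse.itertext()).split())
            out.append(f"\\v {number} {text}")
    return "\n".join(out) + "\n"


sample = (
    f'<div xmlns="{OSIS_NS}" type="book" osisID="2Kgs">'
    '<chapter osisID="2Kgs.1"><verse osisID="2Kgs.1.1">First '
    '<transChange type="added">verse</transChange> text.</verse></chapter></div>'
)
print(osis_book_to_usfm(ET.fromstring(sample)))
```

Flattening inline markup keeps the sketch short; a fuller converter would map elements such as `<transChange type="added">` to the corresponding USFM character marker instead of dropping them.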

DavidHaslam commented 6 years ago

I've made significant progress on the plan proposed above.

Some preliminary observations are reported in #11.