BibleCorps / ENG-B1-Anderson1864-pd-USFM

Henry T. Anderson's 1864 "Civil War" New Testament
https://archive.org/details/thenewtestament00andeuoft

USFM filenames #7

Closed DavidHaslam closed 5 years ago

DavidHaslam commented 7 years ago

I suggest renaming the USFM files to include the three-letter book abbreviations, as per the USFM User Reference.

cmahte commented 7 years ago

I'm all for supporting the 3 letter abbreviations (TLAs) in the filename, although in the spec the TLA is for the first text row inside the file, not the filename (see the USFM Spec Discussion below). I have ~100 projects open, and this renaming is cumbersome. If you care to act on the Anderson work, please own it. I will act on all suggestions here, but they will be delayed, probably by years.

I've built auto renamers in the past (tools that search inside the file and name it appropriately). However, that technology was done for hire, and when I separated from that non-profit, one of the things I affirmed by signature is that work done directly FOR my employer is not mine, and that I could not use that work without permission.

I've updated the Anderson text with all the outstanding items I can remember.

My focus right now is to complete the 1911 Bible (in my own account for now).

There is another, updated version of the Anderson work. This repo is for the 1864 edition published during his lifetime. (The link at the top of the repo is the actual source used and a visual reference PDF.) If your paper reference is, or reflects, the edition published in 1917 by his family, with edits made after his death, please branch or start a new repository to reflect the difference.

=== USFM Spec Discussion

The latest USFM spec is 3.0, although I still use the USFM 2.4 manual and spec. I have no plans to support embedded tagging in the 3.0 spec, nor the updated 'fig' tag. Otherwise I've implemented the new tags in the 3.0 spec and you might see them appearing (especially the \sd tag.)

The link you sent is still showing 2.4. Because the transition to Paratext 8 is a major change, USFM 3 is also transitional and may take 2-3 years to be fully implemented. That is, Bible projects nearing print (those that have already completed initial translation and moved to back-translation and review) are encouraged to continue with Paratext 7, and NOT upgrade to Paratext 8. Therefore both USFM 2.4 and USFM 3 are "current" until Paratext 7 is deprecated, and no deprecation date is scheduled.

USFM 3 can be referenced at:

http://ubsicap.github.io/usfm/

I prefer the paper PDF, which is why I still use 2.4 for reference. The 3.0 spec has no PDF download link. :-(


In the 2.4 manual (with similar wording in 3.0), section 2.1 specifies the 3 letter abbreviations for the slot immediately after the \id tag inside the file. The spec doesn't mention using the 3 letters in the filename, although that is nearly universal by convention.
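As a minimal sketch of that convention, the Python below reads the book code from the slot after the \id tag and proposes a filename that keeps the existing name and appends the code. The function names and the "old name plus code" pattern are hypothetical illustrations, not something taken from the spec or from this repo.

```python
import re

# Matches a line starting with \id and captures the three-character
# book code that follows it (uppercase letters or digits, e.g. MAT, 1CO).
ID_LINE = re.compile(r"^\\id\s+([A-Z0-9]{3})\b", re.MULTILINE)


def book_code(usfm_text):
    """Return the book code following the first \\id marker, or None."""
    match = ID_LINE.search(usfm_text)
    return match.group(1) if match else None


def suggested_name(old_name, usfm_text):
    """Hypothetical naming rule: keep the old stem and append the book code."""
    code = book_code(usfm_text)
    if code is None:
        return old_name  # no \id line found, so leave the name unchanged
    stem, dot, ext = old_name.rpartition(".")
    if not dot:
        return f"{old_name}-{code}"  # filename has no extension
    return f"{stem}-{code}.{ext}"
```

Because the code is taken from inside the file, the existing numeric part of the filename is left alone and only the abbreviation is added.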

The "number" in that table IS spec'd for the filename, but includes the words 'may also include' making it a recommendation instead of a spec. The ID number in the spec is very commonly ignored, especially where alternate versification is part of the work.

I've written an organizational recommendation for file naming, but I haven't published or implemented it. I'll try to publish it soon.

Whenever I do tagging, I concatenate the text into one big file and apply the initial tags to all books at the same time with search and replace. This makes the tagging much more efficient and more uniformly applied. I then split the books with the unix command csplit. This results in filenames with numbers present, but not the 3 letter codes. The projects on github all follow this method.

Since csplit starts at 00 and increments by 1, that's the officially supported numbering right now. Since the big file is always ordered in published order, the numbers may vary from the spec.
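The split step can be sketched in Python as follows. This is only an approximation in the spirit of csplit, not a reproduction of its exact behavior: it assumes each book begins with a line starting with \id, and it numbers the pieces from 00 using csplit's default "xx" prefix. The function name is hypothetical.

```python
def split_books(big_text, prefix="xx"):
    """Split concatenated USFM text into one chunk per \\id line.

    Returns a list of (filename, text) pairs numbered from 00 in the
    order the books appear. Any text before the first \\id line becomes
    the first chunk.
    """
    chunks = []
    current = []
    for line in big_text.splitlines(keepends=True):
        # A new \id line closes the chunk collected so far.
        if line.startswith("\\id ") and current:
            chunks.append("".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("".join(current))
    return [(f"{prefix}{n:02d}", text) for n, text in enumerate(chunks)]
```

Each returned pair can then be written to disk with an ordinary open(name, "w") call. As with csplit, the numbers reflect the order of the big file rather than the numbers in the spec's table.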

I started using github to track changes, and it appears that filename changes lose the history. If filenames do need to be changed, they should be done as seldom as possible.

For now, the projects should follow these guidelines. I've started a Guidelines Repo and a wiki within that for specification links, clarifications, best methods, recommendations, etc. However, I don't speak wiki, so it needs lots of love.

Project names: EEE-BX-TTT1111-LLL-SSS2222

Top level folder names: EEE[BX]TTT1111[LLL]SSS2222

Where:

EEE - Language in 3 letter Ethnologue code (ISO 639-2/T). ISO 639-2/T takes priority; only if no entry is present there, fall back to ISO 639-3.

BX - Type of Scripture

I also plan to have many more works present, including multiple languages. I've looked at uploading some Spanish and Albanian works, but I haven't done so because I haven't figured out how to group projects on GitHub. I also don't see a 'sort' button. Any ideas?

DavidHaslam commented 7 years ago

Thanks for the response and the USFM discussion. I have the CHM file for USFM 2.40 as a handy reference on my Windows PC. I'm already aware of USFM 3.0, though few of my software tools support it.

My facsimile reprint paper copy is of Anderson's 1864 text "stereotyped and printed for the author" in 1866 by John P Morton & Co., Louisville, KY.

Though the reprint page size is 154mm x 234mm, the original was evidently a pocket edition with print area 67mm x 110mm, so there's a huge margin on every page.

I obtained it a few years ago, having seen that wonderful howler in Acts 16 while reading the SWORD module. I've done virtually nothing with the hard copy since I bought it. Anderson is very low on my priorities list.

btw. In my final year at Trinity College, Cambridge in 1968/69, I shared a room with an Old Etonian. :)

Aside: I've used ad hoc methods for systematic file renaming over the years. In Windows, I would typically enter dir /b >..\dir.txt then copy the text file into Excel, wherein I can use a formula to generate individual rename commands and then paste all the lines into a new Windows CMD file.
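The spreadsheet step can be sketched in Python as well: take the list of names from the directory listing, apply a naming rule, and emit one Windows ren command per file. The function name and the idea of passing the rule in as a callable are assumptions made for illustration.

```python
def rename_commands(filenames, new_name_for):
    """Build one Windows `ren` command per file whose name would change.

    `new_name_for` is a caller-supplied rule (hypothetical here) that maps
    an old filename to the desired new filename.
    """
    commands = []
    for old in filenames:
        new = new_name_for(old)
        if new != old:  # skip files the rule leaves unchanged
            commands.append(f'ren "{old}" "{new}"')
    return commands
```

The resulting lines can be pasted into a CMD file, in the same way as the formula output from Excel.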

I've been a TextPipe user since 2001, so it's worth noting that TextPipe can readily be used to change filenames using Perl search and replace patterns within a Restrict to filename filter.

btw. TextPipe can also split an input file into several output files, similar to your description of csplit.

Finally, I should think that I'm less of an expert in GitHub than yourself.

DavidHaslam commented 5 years ago

I have recently forked the repository and am in the process of making significant changes in my Editing branch.

NB. I have not renamed the SFM files, but I have moved them to a USFM directory.

DavidHaslam commented 5 years ago

Will not fix.

Merged pull request #9