STEPBible / BibleEngine

General purpose library for powering JavaScript Bible projects

Bible parsing error #209

Open seraphx2 opened 1 year ago

seraphx2 commented 1 year ago

The Bible file I am using is from https://ebible.org/Scriptures/details.php?id=eng-kjv2006 (specifically, the Crosswire Sword module entry). Here is my app. It's really small:

import {
  BeDatabaseCreator,
  V11nImporter,
  SwordImporter,
  OsisImporter,
} from "@bible-engine/importers";

const args = process.argv;
const importer = args[2];
const dataFile = args[3];

const creator = new BeDatabaseCreator({
  type: "mysql",
  host: "127.0.0.1",
  port: 3306,
  username: "bibleengine",
  password: "<password>",
  database: "bibleengine",
  dropSchema: true,
});

creator.addImporter(V11nImporter);

if (importer === "osis")
  creator.addImporter(OsisImporter, {
    sourcePath: `D:/bible-importer/osis/${dataFile}`,
  });

if (importer === "sword")
  creator.addImporter(SwordImporter, {
    sourcePath: `D:/bible-importer/sword/${dataFile}`,
    skip: {
      crossRefs: false,
      notes: true,
      strongs: false,
    },
    logLevel: "verbose",
  });

creator.createDatabase();

I am getting this error when trying to run an import on a sword file:

running importer: Versification Rules
ignored 1769 unsupported or invalid rules from source types: English+Greek,Greek2,Latin,Greek3,English+Latin2,Greek,GreekIntegrated,GreekUndivided,Hebrew+Latin,English,Latin2,English+Latin,Latin=,Latin+Bulgarian,Latin+Greek,English +Latin,Bulgarian (thereof 388 rules for non ap books from source types: Greek2,Latin,Greek) - set DEBUG=true to see details
running importer: SwordImporter
running importer: OSIS
version:  # Sword module configuration fil
SwordImporter failed OsisParseError: text outside of paragraph: "In the " in Gen 1:1 # Sword module configuration fil

container stack:
  root

    at OsisImporter.parseTextNode (D:\git\bible-importer\node_modules\@bible-engine\importers\lib\bible\osis\index.js:1097:23)  
    at xmlStream.ontext (D:\git\bible-importer\node_modules\@bible-engine\importers\lib\bible\osis\index.js:59:22)
    at emit (D:\git\bible-importer\node_modules\sax\lib\sax.js:624:35)
    at closeText (D:\git\bible-importer\node_modules\sax\lib\sax.js:634:26)
    at emitNode (D:\git\bible-importer\node_modules\sax\lib\sax.js:628:26)
    at newTag (D:\git\bible-importer\node_modules\sax\lib\sax.js:691:5)
    at SAXParser.write (D:\git\bible-importer\node_modules\sax\lib\sax.js:1276:13)
    at D:\git\bible-importer\node_modules\@bible-engine\importers\lib\bible\osis\index.js:83:23
    at new Promise (<anonymous>)
    at OsisImporter.getContextFromXml (D:\git\bible-importer\node_modules\@bible-engine\importers\lib\bible\osis\index.js:49:26)
node:internal/process/promises:288
            triggerUncaughtException(err, true /* fromPromise */);
            ^

OsisParseError: text outside of paragraph: "In the " in Gen 1:1 # Sword module configuration fil

container stack:
  root

    at OsisImporter.parseTextNode (D:\git\bible-importer\node_modules\@bible-engine\importers\lib\bible\osis\index.js:1097:23)
    at xmlStream.ontext (D:\git\bible-importer\node_modules\@bible-engine\importers\lib\bible\osis\index.js:59:22)
    at emit (D:\git\bible-importer\node_modules\sax\lib\sax.js:624:35)
    at closeText (D:\git\bible-importer\node_modules\sax\lib\sax.js:634:26)
    at emitNode (D:\git\bible-importer\node_modules\sax\lib\sax.js:628:26)
    at newTag (D:\git\bible-importer\node_modules\sax\lib\sax.js:691:5)
    at SAXParser.write (D:\git\bible-importer\node_modules\sax\lib\sax.js:1276:13)
    at D:\git\bible-importer\node_modules\@bible-engine\importers\lib\bible\osis\index.js:83:23
    at new Promise (<anonymous>)
    at OsisImporter.getContextFromXml (D:\git\bible-importer\node_modules\@bible-engine\importers\lib\bible\osis\index.js:49:26)

Node.js v18.16.1

danbenn commented 1 year ago

Hi @seraphx2, sorry for the late reply. You'll need to use the plaintext flag. We use this for translations like the KJV, which often don't have paragraphs or other page-level formatting. Let us know how it goes!

seraphx2 commented 1 year ago

Sorry if this should be obvious, but I'm not seeing where to specify that flag.

seraphx2 commented 1 year ago

I added versionMeta to the SwordImporter config and I'm still getting the same error (though I'm not sure whether it is even set up correctly, or how to tell which values to use):

creator.addImporter(SwordImporter, {
  versionMeta: {
    uid: "ENGKJV",
    title: "King James Version 2006",
    isPlaintext: true,
    hasStrongs: true,
  },
  sourcePath: `D:/bible-importer/sword/${dataFile}`,
  skip: {
    crossRefs: false,
    notes: true,
    strongs: false,
  },
  logLevel: "verbose",
});

ignored 1769 unsupported or invalid rules from source types: English+Greek,Greek2,Latin,Greek3,English+Latin2,Greek,GreekIntegrated,GreekUndivided,Hebrew+Latin,English,Latin2,English+Latin,Latin=,Latin+Bulgarian,Latin+Greek,English +Latin,Bulgarian (thereof 388 rules for non ap books from source types: Greek2,Latin,Greek) - set DEBUG=true to see details
running importer: SwordImporter
version:  ENGKJV
running importer: OSIS
version:  ENGKJV
SwordImporter failed OsisParseError: text outside of paragraph: "In the " in Gen 1:1 ENGKJV

container stack:
  root

    at OsisImporter.parseTextNode (D:\git\bible-importer\node_modules\@bible-engine\importers\lib\bible\osis\index.js:1097:23)
    at xmlStream.ontext (D:\git\bible-importer\node_modules\@bible-engine\importers\lib\bible\osis\index.js:59:22)
    at emit (D:\git\bible-importer\node_modules\sax\lib\sax.js:624:35)
    at closeText (D:\git\bible-importer\node_modules\sax\lib\sax.js:634:26)
    at emitNode (D:\git\bible-importer\node_modules\sax\lib\sax.js:628:26)
    at newTag (D:\git\bible-importer\node_modules\sax\lib\sax.js:691:5)
    at SAXParser.write (D:\git\bible-importer\node_modules\sax\lib\sax.js:1276:13)
    at D:\git\bible-importer\node_modules\@bible-engine\importers\lib\bible\osis\index.js:83:23
    at new Promise (<anonymous>)
    at OsisImporter.getContextFromXml (D:\git\bible-importer\node_modules\@bible-engine\importers\lib\bible\osis\index.js:49:26)
node:internal/process/promises:288
            triggerUncaughtException(err, true /* fromPromise */);
            ^

OsisParseError: text outside of paragraph: "In the " in Gen 1:1 ENGKJV

container stack:
  root

    at OsisImporter.parseTextNode (D:\git\bible-importer\node_modules\@bible-engine\importers\lib\bible\osis\index.js:1097:23)
    at xmlStream.ontext (D:\git\bible-importer\node_modules\@bible-engine\importers\lib\bible\osis\index.js:59:22)
    at emit (D:\git\bible-importer\node_modules\sax\lib\sax.js:624:35)
    at closeText (D:\git\bible-importer\node_modules\sax\lib\sax.js:634:26)
    at emitNode (D:\git\bible-importer\node_modules\sax\lib\sax.js:628:26)
    at newTag (D:\git\bible-importer\node_modules\sax\lib\sax.js:691:5)
    at SAXParser.write (D:\git\bible-importer\node_modules\sax\lib\sax.js:1276:13)
    at D:\git\bible-importer\node_modules\@bible-engine\importers\lib\bible\osis\index.js:83:23
    at new Promise (<anonymous>)
    at OsisImporter.getContextFromXml (D:\git\bible-importer\node_modules\@bible-engine\importers\lib\bible\osis\index.js:49:26)

Node.js v18.16.1

seraphx2 commented 1 year ago

@danbenn any ideas on what I'm doing wrong?

seraphx2 commented 1 year ago

Hello? Anyone?

danbenn commented 1 year ago

Hi @seraphx2, apologies for the late reply. Can you try adding this flag to your code?

autoGenMissingParagraphs: true,

So with your current setup, that would be:

creator.addImporter(SwordImporter, {
  versionMeta: {
    uid: "ENGKJV",
    title: "King James Version 2006",
    isPlaintext: true,
    hasStrongs: true,
  },
  sourcePath: `D:/bible-importer/sword/${dataFile}`,
  skip: {
    crossRefs: false,
    notes: true,
    strongs: false,
  },
  logLevel: "verbose",
  autoGenMissingParagraphs: true,
});

This is what I'm seeing in the source code that causes this error, bible/osis/index.ts:

        if (!stackHasParagraph(context, currentContainer)) {
            if (this.context.hasParagraphsInSourceText && !context.autoGenMissingParagraphs) {
                throw new OsisParseError(`text outside of paragraph: "${text}"`, context);
            }
            if (!this.context.hasParagraphsInSourceText || context.autoGenMissingParagraphs) {
                currentContainer = startNewParagraph(context);
            }
        }

If the importer believes the source has paragraphs and the autoGenMissingParagraphs flag isn't set, it treats text outside a paragraph as a sign that the source file is corrupted and throws. In this case, to the best of my knowledge, the KJV genuinely doesn't have paragraphs, so we want to insert them.
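
The branch quoted above can also be read as a small decision table. The following is only a minimal sketch of that logic as a standalone function; the name `decideParagraphAction` and its plain boolean parameters are hypothetical and are not part of the library:

```typescript
// Hypothetical restatement of the quoted branch; NOT the library's actual API.
type ParagraphAction = "keep-current" | "start-paragraph" | "throw";

function decideParagraphAction(
    stackHasParagraph: boolean,
    hasParagraphsInSourceText: boolean,
    autoGenMissingParagraphs: boolean
): ParagraphAction {
    // Text that is already inside a paragraph needs no special handling.
    if (stackHasParagraph) return "keep-current";
    // The source is believed to contain paragraphs, this text sits outside
    // one, and the caller did not opt in to auto-generation: report an error.
    if (hasParagraphsInSourceText && !autoGenMissingParagraphs) return "throw";
    // Otherwise a new paragraph is opened for the stray text.
    return "start-paragraph";
}

console.log(decideParagraphAction(false, true, false));  // prints "throw"
console.log(decideParagraphAction(false, true, true));   // prints "start-paragraph"
console.log(decideParagraphAction(false, false, false)); // prints "start-paragraph"
```

Read this way, the error in this thread corresponds to the first of the three printed cases, and setting the flag moves it to the second.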

Let us know how it goes!

seraphx2 commented 1 year ago

lol. Now this error:

SwordImporter failed OsisParseError: unclean container stack while closing "translationChange" group. Found "paragraph" in Josh 15:1 KJV

container stack:
  root
    translationChange

    at validateGroup (D:\git\bible-importer\node_modules\@bible-engine\importers\lib\bible\osis\functions\validators.functions.js:8:15)

chriswep commented 11 months ago

For whoever comes across this in the future: a lot of Bible source files out there (especially OSIS, which Sword is based on) are of very poor quality and contain many structural errors. Bible renderers often work around those issues, but our importer needs to be stricter in order to translate the source file into a well-defined format. I had to manually correct most of the OSIS files I came across in the wild. That's why the error message is very specific about the type of error and its location in the source file. However, editing the source doesn't work with the Sword format. So you could either use a Sword-to-OSIS converter and import the OSIS directly or, in the case of bible.org, download the USFM format, which is usually well defined. I have successfully used this USFM-to-OSIS converter multiple times to import sources from bible.org without issues: https://github.com/adyeths/u2o (download USFM, convert to OSIS, use the OSIS importer to import into BibleEngine)
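
To illustrate what a strict check of this kind looks like, here is a minimal sketch of a container stack that refuses to close a group while another container is still open on top of it. This is only one plausible way an "unclean container stack" error can arise. The class `ContainerStack` and its methods are hypothetical and are not the library's actual `validateGroup` implementation:

```typescript
// Hypothetical sketch; names and structure are illustrative only and do not
// mirror the importer's real internals.
class ContainerStack {
    private stack: string[] = ["root"];

    // Record that a container (e.g. a paragraph or a group) was opened.
    open(type: string): void {
        this.stack.push(type);
    }

    // Close a container; the innermost open container must be the same type.
    close(type: string): void {
        const top = this.stack[this.stack.length - 1];
        if (top !== type) {
            throw new Error(
                `unclean container stack while closing "${type}". Found "${top}"`
            );
        }
        this.stack.pop();
    }
}

const demo = new ContainerStack();
demo.open("translationChange");
demo.open("paragraph"); // opened inside the group and never closed
try {
    demo.close("translationChange");
} catch (err) {
    // The check reports which container is in the way.
    console.log(err instanceof Error ? err.message : String(err));
}
```

A lenient renderer could simply skip such a mismatch, whereas a strict check like this one stops and reports the location so the source can be corrected.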