Sefaria / Sefaria-Export

Structured Jewish texts and metadata exported from Sefaria's database.

Any plans to include BDB in the export? #45

Closed. julianwagle closed this issue 7 months ago

julianwagle commented 9 months ago

I noticed you guys have what seems to be the most complete rendering of the BDB on the internet (in many respects, better than the copyrighted one Biblesoft has licensed to many popular websites). After necromancing around the depths of the interwebs, I found a thread where your CPO (@EliezerIsrael) pointed to bdb_parse as the fruit of a team at UTexas digitizing the public domain BDB printing.

Their project failed to get a grant renewal before the rough edges were polished. Specifically, text in many foreign languages (Aramaic, Arabic, Ethiopic, etc.) was left in an encoded state. The Hebrew itself seemed to have an issue with the ordering and encoding of het and tsade.

After just glancing through your final results, I can see you seem to have fixed all of the above issues, while apparently removing details encoded in the original (i.e., the original language of a given word). I can tell a lot of work occurred between the final results of your public repository Sefaria-BDB and the content displayed on the site.

I also noticed that these final results were excluded from the large dump at Sefaria-Export. Given how much effort went into its production, I am unsure if this is intentional. That being said, I figured I'd give it a shot and politely request that it be included in the export, if possible.

Thanks a ton,

Julian Wagle