Closed: thehale closed this PR 2 years ago
Note: ePubs are essentially ZIP archives of (X)HTML documents, so their text could technically be extracted without any epub dependency.
UPDATE: This has been accomplished in the latest passing commits on this PR. More details are in the commit messages for any who are interested.
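For anyone curious what a dependency-free approach might look like, here is a rough sketch using only the Python standard library. It is not the code from this PR's commits: the names `epub_to_text` and `_TextCollector` are made up for illustration, and it assumes the content files are UTF-8 (X)HTML stored in the archive in reading order. A more careful version would follow the spine in the package's OPF file instead of relying on archive order.

```python
import zipfile
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Hypothetical helper: gathers text nodes, skipping <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def epub_to_text(path):
    """Return the visible text of every (X)HTML file in the epub at `path`.

    Assumption: files are visited in archive order, which is not guaranteed
    to match the book's reading order.
    """
    parts = []
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                collector = _TextCollector()
                markup = archive.read(name).decode("utf-8", errors="replace")
                collector.feed(markup)
                collector.close()
                parts.append(" ".join(collector.chunks))
    return "\n".join(parts)
```

Calling `epub_to_text("book.epub")` would return one line of text per content document. Joining text nodes with single spaces is a simplification, so spacing around inline markup may differ slightly from the rendered book.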
@deanmalmgren Just following up to make sure you saw this PR...
Thanks for this PR @jhale1805
EbookLib carries an AGPL license, which makes it incompatible with textract's MIT license.
This commit replaces EbookLib with a BSD-3 licensed library that parses ebook contents just as easily for us.
Note: ePubs are essentially ZIP archives of (X)HTML documents, so their text could technically be extracted without any special epub dependency.
Fixes deanmalmgren/textract#409