Open stichiboi opened 2 years ago
Just to be transparent: this idea originates from an error I keep getting when reading some epubs:
KeyError: "There is no item named 'styles/3.ttf' in the archive"
This error originates from the epub rather than from ebooklib: opening the file with Atom shows that there is indeed no styles/3.ttf (there is a fonts/3.ttf).
I don't want to throw away the whole epub just because its styles cannot be read, so ideally I could just skip reading them.
Skipping them should also make the process quicker.
But I'm no expert in EPUB, so maybe this is not a good idea 😓
Good point. Right now everything fails if the EPUB claims to have something that is actually missing from the archive. One option would be a setting on the EpubReader, something like failing silently. The other would be, as you suggested, a list of things to ignore/allow.
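To make the "fail silently" idea concrete, here is a minimal sketch using only the standard library's zipfile module. It is not ebooklib's code or API: the helper read_items and its ignore_missing flag are hypothetical names chosen for illustration. The only assumption is that the KeyError quoted above comes from looking up an entry that the archive does not contain, which is how zipfile reports a missing name.

```python
import io
import zipfile


def read_items(archive, names, ignore_missing=True):
    """Read each named entry; optionally skip entries missing from the archive.

    Hypothetical helper, not part of ebooklib.
    """
    loaded, missing = {}, []
    for name in names:
        try:
            loaded[name] = archive.read(name)
        except KeyError:
            # zipfile raises KeyError when the name is not in the archive
            if not ignore_missing:
                raise
            missing.append(name)
    return loaded, missing


# Build a tiny in-memory archive that mimics the broken EPUB:
# the manifest would list styles/3.ttf, but only fonts/3.ttf exists.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("text/ch1.xhtml", "<p>Hello</p>")
    zf.writestr("fonts/3.ttf", b"\x00")

with zipfile.ZipFile(buf) as zf:
    loaded, missing = read_items(zf, ["text/ch1.xhtml", "styles/3.ttf"])
```

Returning the list of missing names, rather than dropping them quietly, would let a caller log or inspect what was skipped.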
Hello, I'm trying to read data from epubs I downloaded from the web. I'm just interested in the text; I don't care about images or styles. Would it be possible to add a media_type_filter option and only load the specified types from the manifest? I imagine something along the lines of a small change in epub.EpubReader._load_manifest.
And the media_type_filter would just be a list I pass in as options.
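As a rough illustration of the media_type_filter idea, the sketch below filters manifest entries by media type before anything is read from the archive. It uses only the standard library and does not touch ebooklib: the function filter_manifest and the sample OPF document are made up for this example, and how such a filter would actually be wired into EpubReader is left open.

```python
import xml.etree.ElementTree as ET

# Namespace used by EPUB package (OPF) documents
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Made-up manifest for illustration only
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="ch1" href="text/ch1.xhtml" media-type="application/xhtml+xml"/>
    <item id="css" href="styles/main.css" media-type="text/css"/>
    <item id="font" href="styles/3.ttf" media-type="application/x-font-ttf"/>
  </manifest>
</package>"""


def filter_manifest(opf_xml, media_type_filter=None):
    """Return hrefs of manifest items whose media type is in the filter.

    With media_type_filter=None, every item is returned (no filtering).
    Hypothetical helper, not part of ebooklib.
    """
    root = ET.fromstring(opf_xml)
    hrefs = []
    for item in root.findall("opf:manifest/opf:item", OPF_NS):
        if media_type_filter is None or item.get("media-type") in media_type_filter:
            hrefs.append(item.get("href"))
    return hrefs


text_only = filter_manifest(SAMPLE_OPF, ["application/xhtml+xml"])
```

Because the font entry is never selected, a missing styles/3.ttf would not be looked up at all, which would also sidestep the KeyError described in this thread.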