Roznoshchik / Lurnby

A tool for active reading and personal knowledge management
https://www.lurnby.com
BSD 3-Clause "New" or "Revised" License

Finding Epub Images #9

Closed. Roznoshchik closed this issue 1 year ago.

Roznoshchik commented 2 years ago

EPUBs seem to have very little consistency in how they organize their internal file structure.

I haven't figured out a great way of finding the image folder.
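One layout-independent starting point: every EPUB has to contain `META-INF/container.xml`, and that file names the package (`.opf`) file. The package's manifest lists every image with an `image/*` media type, with hrefs relative to the `.opf` file. The sketch below assumes the epub has already been extracted to a folder (`epub_root`); the function names are hypothetical, not part of Lurnby.

```python
import os
import xml.etree.ElementTree as ET
from urllib.parse import unquote

# Namespaces fixed by the EPUB container (OCF) and package (OPF) specs.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def find_opf_path(epub_root):
    """Return the path of the package (.opf) file named in META-INF/container.xml."""
    container = ET.parse(os.path.join(epub_root, "META-INF", "container.xml"))
    rootfile = container.find(".//c:rootfile", CONTAINER_NS)
    # full-path is relative to the root of the extracted epub, not to META-INF.
    return os.path.join(epub_root, rootfile.get("full-path"))


def manifest_images(epub_root):
    """List the on-disk path of every image the package manifest declares."""
    opf_path = find_opf_path(epub_root)
    opf_dir = os.path.dirname(opf_path)
    package = ET.parse(opf_path)
    paths = []
    for item in package.findall(".//opf:manifest/opf:item", OPF_NS):
        if item.get("media-type", "").startswith("image/"):
            # Manifest hrefs are URLs relative to the .opf file's own folder,
            # so decode %20-style escapes before joining.
            href = unquote(item.get("href"))
            paths.append(os.path.normpath(os.path.join(opf_dir, href)))
    return paths
```

This never needs to know whether the book uses `OEBPS/`, `EPUB/media/`, or something else; the folder names come from the book itself.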

import os

# Excerpt: `soup` is the parsed chapter (BeautifulSoup) and `path` is assumed
# to be the folder the epub was extracted to.
for img in soup.find_all('img'):
    img["loading"] = "lazy"  # let the browser defer off-screen images
    src = img['src']

    # Layouts seen so far, tried in order; the first path that exists wins.
    candidates = [
        src.replace("../", path + "/"),
        f"{path}/{src}",
        f"{path}/EPUB/media/{src}",
        f"{path}/EPUB/images/{src}",
    ]
    # If none exist, fall back to the OEBPS layout without checking it.
    filename = next((c for c in candidates if os.path.exists(c)),
                    src.replace("../", path + "/OEBPS/"))

Whenever I encounter an epub whose images don't load, I have to open the epub, inspect its folder structure, and then manually add another branch for that path.
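Part of the branching may be avoidable: a relative `src` in an XHTML chapter is resolved against the location of that chapter file, so `../Images/x.png` inside `OEBPS/Text/ch1.xhtml` points to `OEBPS/Images/x.png` whatever the folders are called. A minimal sketch, assuming you know the path of the chapter file the soup was parsed from (`chapter_path` and `resolve_img_src` are hypothetical names):

```python
import os
from urllib.parse import unquote, urlsplit


def resolve_img_src(chapter_path, src):
    """Resolve an <img src> against the XHTML file that contains it.

    Returns None for remote or inline images (http:, https:, data:, ...),
    since there is nothing to find on disk for those.
    """
    parts = urlsplit(src)
    if parts.scheme:
        return None
    # Drop any query/fragment and decode %20-style escapes.
    rel = unquote(parts.path)
    chapter_dir = os.path.dirname(chapter_path)
    # normpath collapses the "../" segments against the chapter's folder.
    return os.path.normpath(os.path.join(chapter_dir, rel))
```

In the loop above this would replace the candidate list with `filename = resolve_img_src(chapter_path, img['src'])`.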

I'm sure there's a better way to search the epub for the image folder itself, one that would also work for file paths I haven't encountered yet.
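As a last-resort fallback for books whose references are simply wrong, one could also walk the whole extracted folder once and look images up by file name. This is a swapped-in heuristic, not anything the EPUB spec prescribes, and it breaks down when two folders hold a file with the same name (for example `cover.jpg`), so the sketch below returns `None` in that case rather than guessing. The function names are hypothetical.

```python
import os
import posixpath
from collections import defaultdict
from urllib.parse import unquote


def index_files_by_name(epub_root):
    """Map each file name to every path under epub_root that has it."""
    index = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(epub_root):
        for name in filenames:
            index[name].append(os.path.join(dirpath, name))
    return index


def find_image(index, src):
    """Return the unique on-disk match for src's file name, else None."""
    # src is a URL, so split on "/" and decode escapes before matching.
    name = posixpath.basename(unquote(src))
    matches = index.get(name, [])
    return matches[0] if len(matches) == 1 else None
```

Building the index once per book keeps each lookup cheap, and unknown folder layouts are covered automatically because nothing about the structure is hard-coded.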