Closed KarlG1965 closed 7 months ago
How many images are we talking here when you say 'a lot'?
In this particular instance, exactly 7336 images. All, apart from about 3, are png files about 3k in size. (It's a statistics book, hence the vast number of images of mathematical formulas)
Do all the images happen to be in one (or very few) xhtml file(s)? Does the epub behave sluggishly in Sigil before the attempt to restructure?
The epub is 48.7mb in size. There are a total of 29 xhtml files, 20 of which are chapters (which I assume will be where most of the images are linked).
If I open the epub it reacts normally. I can open different chapters and scroll through them no problem. If I restructure the HTML it takes about 10 seconds.
Saving the epub takes about 50secs - 1 minute on my system (just with reformatted HTML, not restructured)
I notice that just about all linked images have inline css
e.g.
<img alt="$$\frac {1}{2}(yield{11}+yield{10})$$" src="../images/519209_1_En_12_Chapter/519209_1_En_12_Chapter_TeX_IEq294.png" style="width:8.68em"/>
Don't know if that's maybe causing issues as well.
The one minute save time is very excessive, but that's something entirely different.
That alt property string is pretty terrible. Are they actually trying to put a mathml formula in there? That's very ill advised.
I don't know WHAT they are trying to do, to be honest! :-) The book is from Springer, and in general they are VERY badly coded.
This is one of the joys of Sigil. I can open the book and remove all the unnecessary garbage!
Could your machine be running out of memory? How much memory does it have?
Linux never seems to handle running out of memory in a reasonable way, normally killing processes willy-nilly, left and right. Is your machine set to use virtual memory/swap? Lots of distribution installers never bother to create swap partitions (or swap files) and never properly use swapon.
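For anyone wanting to check this quickly on Linux, here is a minimal sketch that reads `/proc/meminfo` and reports total RAM and whether any swap is configured. The function names `parse_meminfo` and `memory_summary` are made up for illustration and are not part of Sigil.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def memory_summary(text):
    """Summarise RAM and swap from /proc/meminfo-style text (sizes in GiB)."""
    info = parse_meminfo(text)
    swap_kib = info.get("SwapTotal", 0)
    return {
        "ram_gib": info.get("MemTotal", 0) / 1024**2,
        "swap_gib": swap_kib / 1024**2,
        "swap_enabled": swap_kib > 0,
    }


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # only present on Linux
        print(memory_summary(meminfo.read_text()))
```

If `swap_enabled` comes back false, the kernel has nowhere to page out to, and the OOM killer is the only fallback once RAM is exhausted.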
Will you please enable saving crash images and use gdb to get a backtrace from it (or alternatively run Sigil inside of gdb by tweaking the sigil launch script and generate a backtrace that way) and post it here?
There is also a Sigil plugin that will replace all the text of the epub with gibberish. You could try running it on a copy of that problem epub and then saving the result.
Then zip up that gibberish version and attach it to this issue (or post a link to it for us to grab).
The plugin is called the Borkify Plugin and it is available in Sigil's Plugin Index on our Mobileread forum.
FWIW, even worse, that alt string is an attempt to use TeX/AsciiMath, not actual MathML.
Does your book have any javascripts (.js) files or inline javascripts being used?
If so, restructuring it to Sigil norms is very ill-advised. Sigil can not update any links to resources inside a javascript and most javascripts are not designed to be relocated.
If so, that may be what is causing the crash. We may need to validate that no javascripts exist in the epub before allowing Restructure to Sigil norm to be run.
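As a rough idea of what such a pre-check could look like outside of Sigil, here is a small sketch that treats the epub as the zip archive it is and looks for `.js` files and `<script` tags in the xhtml. The function name `find_javascript` is hypothetical, and this is not how Sigil itself validates anything.

```python
import io
import re
import zipfile

# Matches the start of a <script> element, in any letter case.
SCRIPT_TAG = re.compile(rb"<script\b", re.IGNORECASE)


def find_javascript(epub):
    """Return (js_files, files_with_script_tags) for an epub path or file-like object."""
    js_files, inline = [], []
    with zipfile.ZipFile(epub) as zf:
        for name in zf.namelist():
            lower = name.lower()
            if lower.endswith(".js"):
                js_files.append(name)
            elif lower.endswith((".xhtml", ".html", ".htm")):
                # Read the raw bytes; no need to parse the markup for a simple scan.
                if SCRIPT_TAG.search(zf.read(name)):
                    inline.append(name)
    return js_files, inline
```

Note that this only catches script files and `<script>` elements; inline event-handler attributes such as `onclick` would need a separate check.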
My machine has 16GB of RAM. Not really something I considered, but I know what you mean about Linux and memory management.
The plugin is called the Borkify Plugin and it is available in Sigil's Plugin Index on our Mobileread forum.
Just tried to install it, but it's giving me an error about missing Python 3.4 / 2.7. I need to deal with a few other things at the moment but I'll get back to this and figure out where it's setting the Python version and change it to 3.11 (which I'm using now)
No need to change anything in that plugin. Those are minimum required Python versions. Python 3.11 is fine.
Instead have you installed the recommended python modules Sigil needs to run properly?
Try running the latest version of the testplugin that will check for all the required pieces being available.
Grab it from here:
https://github.com/Sigil-Ebook/Sigil/tree/master/docs
You want testplugin_v020.zip
I also have some self-contained Python AppImages that have everything a Sigil plugin might require already included. You're welcome to use one if it simplifies things. They're a bit large (since they're self-contained), but they should work out of the box. Just download the one for the Qt version your Sigil uses (Qt5 or Qt6); put it somewhere safe; make sure it's executable; and use the Sigil plugin preferences dialog to select it as the Python interpreter to use for plugins.
https://github.com/dougmassay/appimage-sigil-python/releases/tag/2023.11.2-1
It was even simpler than that! I didn't see the path variable at the top of the plugin window. The path was empty, which was why I was getting the 'no path to 3.4' error!
facepalm
No, no JavaScript
Looks like I'm even having issues uploading the split files. Any suggestions where I can upload this?
Unfortunately no. Most file sharing services are just excuses to place malware on your machine.
Are these 3 pieces all of it?
https://drive.proton.me /urls /279G4GQ4X8#dyOGojuypiF4
If you add these together, you can download from proton
Okay, I grabbed it from that link. I will test restructuring with it and get back to you.
Ok great, I'll remove the share then
Okay, my restructure is still running. It may eventually run out of memory on my machine too (I have 32 Gigs).
That said, this epub has 7330 png images, 4 jpeg images and 1 gif image which is absurd. They have images of each letter of the alphabet sometimes repeated in different folders which represent different chapters.
So my guess is this started out as each chapter representing its own book and no one bothered to remove redundant images, they just threw the thing together. That does not even consider all of the images that represent more than one character or symbol (ie. an equation).
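To put numbers on the redundancy, something along these lines could be run against the epub. It is only a sketch: it counts image files by extension and groups byte-identical images by SHA-256 digest. The name `image_report` and the extension list are assumptions for illustration.

```python
import hashlib
import io
import zipfile
from collections import Counter, defaultdict
from pathlib import PurePosixPath

# Extensions treated as images here; adjust as needed.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".svg"}


def image_report(epub):
    """Return (count per extension, groups of byte-identical image paths)."""
    counts = Counter()
    by_digest = defaultdict(list)
    with zipfile.ZipFile(epub) as zf:
        for name in zf.namelist():
            ext = PurePosixPath(name).suffix.lower()
            if ext in IMAGE_EXTS:
                counts[ext] += 1
                digest = hashlib.sha256(zf.read(name)).hexdigest()
                by_digest[digest].append(name)
    # Only digests shared by two or more files are duplicates.
    duplicates = [sorted(names) for names in by_digest.values() if len(names) > 1]
    return counts, duplicates
```

Each entry in `duplicates` is a set of paths that could in principle share one file, provided the links in the xhtml are updated to match.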
I tried running a Reports Tool on it just to get some counts and it took forever to process and that uses up to 20 worker threads.
I have no idea what is taking all of the time so far so I will have to randomly interrupt it just to look at the most common backtraces.
And after each move we relaunch python3lib code to clean up any issues with the OPF being rebuilt. At one point it actually starts recursing and that is what makes it run out of memory.
This one will be a doozy to fix and we will need some type of block "move" code that waits until all of the moves are done and then updates the opf. But that means that at some point the opf manifest will be incorrect (not match reality) and that will cause a big problem.
So this is not a crash but a recursive out-of-memory state caused by the extreme number of image files (over 7330 of them), which causes the opf to be updated 7330 times, which in turn causes all of the xhtml files to be updated 7330 times in order to search for and update any links.
This is not something we can address easily.
So I recommend not using Restructure to Sigil Norm on this particular epub until a more efficient kind of bulk update process can be figured out.
As a workaround, I tried to export all the images and then remove them with the idea of doing the restructure and then adding the images back, but sigil crashed on me whilst deleting all the images.
It certainly sounds as if it will be a lot of work, but I'm glad you know about the issue now.
Regarding saving after reformatting the html, I think you mentioned that that shouldn't take as long either.
Maybe a separate issue?
Crashed, or again just ran out of memory? I am able to delete all 7330 images in one go.
Not sure, tbh. Just know it didn't work, then I gave up :-)
Perhaps I'll add my two cents. I downloaded the sample file and no crash occurred for me. But I ran "Mend and Prettify" immediately after opening, and only then ran "Restructure Epub to Sigil Norm". Sigil stopped responding ("Not Responding" message in the title bar), but I left it alone and after a while the job was done.
My system: Windows 10 Pro, 32GB of memory, i7-11700 processor.
If I wait long enough it finishes on mine too. The problem is that with 7330 images we end up moving them one by one, which means we edit the opf 7330 times. Each time, it launches a python instance to parse and check the entire opf, and then it must parse the opf again to change a single line in the manifest and restructure it. Rinse and repeat.
So creating a bulk update to the opf makes more sense and should make it much more efficient.
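The difference between the two approaches can be shown with a toy model. This is not Sigil's code (Sigil is C++ and its real OPF handling is more involved); it is a minimal sketch using Python's `xml.etree.ElementTree`, with made-up function names, that counts how many times the manifest gets parsed when hrefs are moved one at a time versus in a single pass.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
ET.register_namespace("", OPF_NS)  # serialise OPF elements without a prefix
ITEM = f"{{{OPF_NS}}}item"


def build_opf(hrefs):
    """Build a bare-bones OPF document containing only a manifest of PNG items."""
    package = ET.Element(f"{{{OPF_NS}}}package")
    manifest = ET.SubElement(package, f"{{{OPF_NS}}}manifest")
    for i, href in enumerate(hrefs):
        ET.SubElement(manifest, ITEM,
                      {"id": f"img{i}", "href": href, "media-type": "image/png"})
    return ET.tostring(package, encoding="unicode")


def move_one_by_one(opf_text, moves):
    """Re-parse and re-serialise the whole OPF for every moved file."""
    parses = 0
    for old, new in moves.items():
        root = ET.fromstring(opf_text)
        parses += 1
        for item in root.iter(ITEM):
            if item.get("href") == old:
                item.set("href", new)
                break
        opf_text = ET.tostring(root, encoding="unicode")
    return opf_text, parses


def move_in_bulk(opf_text, moves):
    """Parse once, apply every move in a single pass, serialise once."""
    root = ET.fromstring(opf_text)
    for item in root.iter(ITEM):
        href = item.get("href")
        if href in moves:
            item.set("href", moves[href])
    return ET.tostring(root, encoding="unicode"), 1
```

With N moved files, the one-by-one version does N full parses and up to N manifest scans each, so the work grows roughly with N squared; the bulk version does one parse and one scan with dictionary lookups. Both produce the same manifest.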
Okay, I have been working all evening on a Bulk Resources Update for the OPF and I have something working that reduces the time for a Restructure To Sigil Norm on that book to be about 40 seconds or less.
I then checked the time to delete all selected images (all 7330 of them) and it again suffered from the one-by-one deletes causing repeated updates to the OPF. Luckily a BulkRemove for resources already existed and is used to speed up merging lots of files. I was able to change Sigil to use it any time more than 50 files are being deleted. It sped things up to about 30 seconds to select all 7330 images and then delete them (with a much lower memory footprint to boot).
I then checked the time to save the epub, and it does not take a minute on my machine at all. But I have all ssd drives and a fast machine. I do not think there is any viable way to copy almost 7500 files and over 48 meg (compressed) including compressing them all in any less time.
I will wait until things settle down and we are sure a follow-up 2.1.1 is not needed, and then I will commit the changes to speed things up (and greatly reduce memory consumption) for both Restructure to Sigil Norm and for deleting thousands of files at a time.
It is an interesting test case to be sure.
Okay, I did some polishing this morning and timed a full restructure to sigil norm at 12 seconds. Then timed a full save-as and it took 6 seconds.
Similar improvement in deleting thousands of images at a time. I consider this one done.
I will keep this open to remind me to push this to master when we are sure this new release is complete.
12 seconds? SOO LONG?? :-p
That sounds fantastic. Really looking forward to trying this out.
Thanks Kevin :-)
Since Sigil 2.1.0 appears quite stable with no showstoppers, we are reopening the tree for the next release. I have just now pushed fixes for your issue to Sigil master.
So I am now closing this as fixed.
Thank you for your bug report.
Bug Description
When I try to restructure epubs using Sigil's built-in function, if the epub has a lot of images Sigil either crashes or can take up to 30-40 minutes to complete.
Platform (OS)
Linux
OS Version / Specifics
Xubuntu 23.10
What version of Sigil are you using?
current from github
Any backtraces or crash reports