DrewThomasson / VoxNovel

VoxNovel: generate audiobooks giving each character a different voice actor.
MIT License

Support staged processing and/or continuing in GUI mode #26

Open EdouardVuillard opened 1 month ago

EdouardVuillard commented 1 month ago

At the moment, any interruption means starting all over again: the program does not seem to recover any of its work if it is interrupted. A related concern is that the program does not let the user control its stages.

The program appears to work in a number of self-contained stages. At the least, there appear to me to be the following:

(1) The user chooses an existing eBook.
(2) The eBook is converted to plain text.
(3) The text is separated into two CSV files, one containing dialog and one containing everything else.
(4) Each segment from the tables in the CSV files is separately converted to one or more temporary WAV files.
(5) The temporary WAV files for each segment are combined into a single WAV file for that segment, leading to a growing final collection of WAV files.
(6) The final collection of WAV files is combined into an audiobook file.

On a long book, every stage takes hours. Stage (5) is the longest, but having waited hours for (1) to (4) to complete, it is a shame to have to do them again if the program has already completed them for a book once.

I have run the GUI far enough to get part way through stage (5). My problem is that the GUI estimates it is going to take 152 days to complete. Some serious tuning of the settings appears necessary. For example, during installation (my second attempt at installation) I skipped the NVidia installation. I would like to go back and install that component, but if I do, I will lose all progress to date.

Also, given that for me the purpose is to produce an audiobook that I can listen to privately in my car, it would be nice to be able to produce an audiobook per chapter so that I can start listening while the program keeps working through the remaining chapters. I appreciate that I probably could have divided up the base eBook myself.
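One way the stage-level resume requested here could work is a small state file that records which stages have finished, so a restart skips them. The following is only a minimal sketch, not VoxNovel's actual code: the stage names, the JSON layout, and the functions `load_done`, `mark_done`, and `run_pipeline` are all hypothetical, chosen to mirror the six stages as described in this comment.

```python
import json
import os
import tempfile
from pathlib import Path

# Hypothetical stage names, loosely following the six stages described above.
STAGES = [
    "select_book",
    "to_text",
    "split_csv",
    "tts_chunks",
    "merge_segments",
    "build_audiobook",
]


def load_done(state_file: Path) -> set:
    """Return the set of stage names already recorded as complete."""
    if not state_file.exists():
        return set()
    return set(json.loads(state_file.read_text(encoding="utf-8"))["done"])


def mark_done(state_file: Path, done: set) -> None:
    """Write the state to a temp file, then rename it over the old one,
    so an interruption mid-write cannot leave a corrupt state file."""
    fd, tmp = tempfile.mkstemp(dir=state_file.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"done": sorted(done)}, fh)
    os.replace(tmp, state_file)


def run_pipeline(state_file: Path, handlers: dict) -> list:
    """Run the stages in order, skipping any already marked done.
    `handlers` maps each stage name to a zero-argument callable.
    Returns the names of the stages actually run this time."""
    done = load_done(state_file)
    ran = []
    for name in STAGES:
        if name in done:
            continue
        handlers[name]()  # may raise; the stage is then not marked done
        done.add(name)
        mark_done(state_file, done)
        ran.append(name)
    return ran
```

With something like this, a run interrupted during the fourth stage would restart at that stage rather than reconverting the eBook. It assumes each stage's output files are still on disk from the earlier run.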

DrewThomasson commented 1 month ago

Before I get started, a heads up:

- The loading bar estimates are always wonky and inaccurate at the beginning

I can promise you that if it didn't take 152 days to get to the final 5th combining stage, then it won't take that long to get through the 5th stage of combining all the audio files into one.

Edit: Combining the audio files should take the least amount of time

DrewThomasson commented 1 month ago

Thank you so much for analyzing my code. I've never seen anyone do this before, and I'm very impressed at how in depth you have gone into the inner workings of my program

1) You probably don't need the NVIDIA driver thing: the fact that the audiobook is generating in hours rather than days shows that your CUDA cores are being utilized

So 🤷

2) You're right that I should implement a save file function of some kind to keep track of where it is, so you can pick up from where you left off.

3) I probably won't be implementing a "generate by chapter" request, sadly. That might add unnecessary complications to the program that I, as a single repo maintainer, would have trouble maintaining. :/ Sorry

I'd LOVE to have anyone else helping out with the code for this project but so far nothing yet :/

Anyway, thank you for your feedback, and I hope that helps answer any of your questions.

EdouardVuillard commented 1 month ago

Thanks, Drew. I certainly appreciate the limits on your resources. You are right about the time estimate: after a few days of running, it has dropped back to a much more reasonable 40 days. I understand that stage (5) is by far the longest stage, and I expect stage (6) will be short. However, it would still be nice to be able to resume at any time if stage (5) is interrupted.

I have a possible temporary workaround for the problem of starting to listen before the complete book is ready. I copied out the finished WAV files and merged them together using ffmpeg. This produced a single 40-minute WAV that I can load onto my iPod Classic. Not pretty, but probably good enough to get started.
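For anyone without ffmpeg to hand, a similar merge can be sketched with Python's standard-library `wave` module. This is a substitute for the ffmpeg approach described above, not a reproduction of it. The function name `concat_wavs` is made up for illustration, and the sketch assumes every input WAV is uncompressed PCM with the same channel count, sample width, and sample rate.

```python
import wave
from pathlib import Path


def concat_wavs(inputs: list, output: Path) -> int:
    """Concatenate WAV files in the given order into `output`.
    Returns the total number of audio frames written."""
    if not inputs:
        raise ValueError("no input files")
    total = 0
    with wave.open(str(output), "wb") as out:
        fmt = None
        for path in inputs:
            with wave.open(str(path), "rb") as src:
                this_fmt = (
                    src.getnchannels(),
                    src.getsampwidth(),
                    src.getframerate(),
                )
                if fmt is None:
                    # The first file decides the output format.
                    fmt = this_fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif this_fmt != fmt:
                    raise ValueError(
                        f"{path} has a different format: {this_fmt} != {fmt}"
                    )
                # Reads one whole input file into memory at a time.
                out.writeframes(src.readframes(src.getnframes()))
                total += src.getnframes()
    return total
```

Passing the finished segment files in sorted order, for example `concat_wavs(sorted(Path("finished").glob("*.wav")), Path("partial_book.wav"))`, would produce one listenable file. The directory name here is a placeholder, and sorting by filename only gives the right order if the files are named so that they sort in reading order.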

DrewThomasson commented 1 month ago

Yeah honestly that's what I do when I'm impatient with it lol

Also, I now realize you're running it on a CPU and not an NVIDIA GPU, which is probably why it's taking so long for you :/

If you have an NVIDIA GPU with a minimum of 4 GB of VRAM, then you should be able to generate the audiobook at around real-time audiobook speed

Edit: I'll see about getting some kind of save file working tho

DrewThomasson commented 1 month ago

At the moment I'm designing a save system, so any suggestions would be helpful.
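One possible design, offered only as a sketch: treat the finished per-segment files themselves as the save state. Each segment is written under a temporary name and renamed only once it is complete, so on restart anything missing its final file simply gets regenerated. The file naming scheme, the helper names, and the `synthesize(text) -> bytes` callable below are all hypothetical placeholders, not part of VoxNovel.

```python
import os
from pathlib import Path


def segment_path(out_dir: Path, index: int) -> Path:
    """Final filename for segment `index` (hypothetical naming scheme)."""
    return out_dir / f"segment_{index:06d}.wav"


def pending_segments(out_dir: Path, total: int) -> list:
    """Indices whose finished file is not yet on disk."""
    return [i for i in range(total) if not segment_path(out_dir, i).exists()]


def generate_all(out_dir: Path, texts: list, synthesize) -> int:
    """Generate only the missing segments.
    `synthesize(text) -> bytes` stands in for the real TTS call.
    Returns how many segments were generated during this run."""
    out_dir.mkdir(parents=True, exist_ok=True)
    made = 0
    for i in pending_segments(out_dir, len(texts)):
        final = segment_path(out_dir, i)
        part = final.with_name(final.name + ".part")
        part.write_bytes(synthesize(texts[i]))
        # The rename happens only after the full write, so a file with
        # the final name is never a half-written segment.
        os.replace(part, final)
        made += 1
    return made
```

The appeal of this approach is that there is no separate save file to keep in sync with the audio on disk. The trade-off is that it only works if segment indices stay stable between runs, which means the CSV files from the earlier stages would need to be kept and reused rather than regenerated.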