Closed by MagosCawl 4 years ago
OK, at least it's a smaller novel this time. I'll get to it tomorrow morning.
This is really odd. The download_manager itself seems to download the whole novel correctly but when it is run through the app, it fails for some reason. This might take a while and I'm really busy atm, but I'll keep you posted :D
No problem, I'm very grateful for you fixing it when you have time.
Hi again, sorry for the spam. I just wanted to quickly update you on my comparison of .epub files of the same novels downloaded with 0.9.4 and again with 0.9.6. I know this isn't necessarily the right thread for it, but I didn't know where else to put it. The issue appears to be fixed, but it may be good to be aware of nonetheless.
I found that for the 20 novels I compared from boxnovel.com, all of them had a larger file size in the version downloaded via 0.9.6 than in the 0.9.4 version. The increase ranged from a few KB to a couple of MB, depending on the novel.
A short novel for comparison is "Husband, Be A Gentleman", which has a file size of 290 KB in 0.9.4 and 330 KB in 0.9.6.
I loaded them up in Calibre and ran a comparison and found something interesting. You fixing reading of HTML tags added the individual chapters as links and included a whole lot of additional html code. However, crucially, it also added chapters whose contents were previously missing.
The screenshot below compares chapter 14 of said story (chapter_19.xhtml, since they split up a couple of chapters). Left is the old, smaller version, and right is the new, larger version. As you can see, downloading with 0.9.6 added the contents of chapter 14, so the old version could not read chapter 14 at all. I verified this by manually checking the place of the chapter in the old epub, which was empty. A few other chapters had the same problem (chapter 18 part 2 and chapter 25 part 2, for example).
However, in 0.9.4 I did not run into any error message and the download had been indicated as being successfully completed. It could perhaps be an idea, if it's not implemented already, to run some kind of check to see that the part contains more text than "This novel was downloaded using NovelScraper. Support us or report issues by joining our discord: https://discord.gg/Wya4Dst"
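A rough sketch of that check, assuming each part is available as an HTML string before it is written to the epub. It strips the tags, removes the footer quoted above, and reports the part as empty if too little text remains. The function name has_real_content and the min_chars threshold are made up for illustration and are not part of NovelScraper.

```python
import html
import re

# Footer text quoted in the comment above; assumed to be appended to every part.
FOOTER = (
    "This novel was downloaded using NovelScraper. Support us or report "
    "issues by joining our discord: https://discord.gg/Wya4Dst"
)


def has_real_content(chapter_html, min_chars=20):
    """Return True if the part holds text beyond the footer.

    min_chars is an arbitrary illustrative threshold, not a value from the app.
    """
    text = re.sub(r"<[^>]+>", " ", chapter_html)  # drop tags, keep their text
    text = html.unescape(text)                    # decode entities such as &amp;
    text = " ".join(text.split())                 # collapse runs of whitespace
    text = text.replace(FOOTER, "")               # ignore the boilerplate footer
    return len(text.strip()) >= min_chars
```

A downloader could call this on every part and mark the download as failed, or retry the chapter, whenever it returns False, instead of reporting success.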
Thank you so much for looking into this. I had no idea there was such a huge bug. I have been meaning to get rid of download_manager and do the downloading directly through the app, but I guess I've delayed it long enough. download_manager is written in Python and the app uses JavaScript. JavaScript is way better at handling web scraping, so I will probably start working on that now, but it will take a while depending on how much free time I have. Oh, and I also wanted to ask how BoxNovel has these exclusive novels that even other pirate sites don't have. Do they license them, or what?
Oh and your proposed suggestion is also easy to implement and I would probably add it when I manage to fix the current bug.
Thanks! Sadly, no, I do not know how Boxnovel acquires them for sure, but I suspect they scrape from webnovel.com. Not sure where other sites get their content from either.
Update: I've realized that the way I've coded the app isn't scalable at all and is bound to have a lot of bugs so I'm in the process of rebuilding the whole project on a new framework. Luckily I can still use most of the code so it shouldn't be too hard. The good news is that once I'm done, everything will become native; meaning at least 40% speed boost and no more clunky download_manager and the overall size of the app should decrease by at least 10-20%. Fingers crossed this will be worth it in the end.
Thanks for the update! The 40% speed boost in particular sounds amazing. I'll be happy to do additional testing when you think it's ready.
Hey, I hope you are doing well in this crisis. I'm back with some good news; the UI for the app is done. Novelplanet and the library are almost fully implemented as well. If everything goes great, it should be done in about a week.
Thanks for the update and I look forward to giving the new version a spin. Yeah, I'm doing good (not lost my job yet, wohoo!). Hope you're doing good too. Can't wait for life to return to normal, whenever that is.
P.S. Always forget to say this, but well met, fellow Umaru-chan fan.
I wasn't as lucky with my job doe ;( Finally landed a good job, but now it's gone poof, but what can I say, their loss for losing a god coder xd.
P.S. I've never seen the anime, but this is one of the coolest pictures I've found on the internet.
Hey, the app is done and works amazingly well. The problem now is that when I compile it into a .exe file, it stops working for some reason and my brain hurts.
I don't know how long this will take to fix or if it's even fixable, but I've made a simple guide on how to build this app yourself from the source code. As far as I have tested it, it does everything exceptionally well and should download the novels you were having a problem with. https://github.com/dr-nyt/NovelScraper/blob/Version-1.0.0/README.md
I completely agree - it works amazingly well and compiling it was a breeze!
I was able to download the 2 books I initially flagged as failing, plus I compared the contents again in Calibre. It appears 0.9.6 had another bug so some content wasn't included, but this version has everything.
For example, here are file size comparisons between .epub files from 0.9.6 and 1.0.0:
Husband, Be A Gentleman: 331 KB -> 362 KB
Golden Time: 767 KB -> 806 KB
Dragoon: 410 KB -> 1202 KB*
*For Dragoon, 0.9.6 downloaded the first few paragraphs, while 1.0.0 added everything.
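For anyone who wants to repeat this kind of comparison without Calibre: an .epub file is a ZIP archive, so a short script can list which internal files changed size between two downloads. This is only a sketch. The function names are made up for illustration, and it compares uncompressed sizes rather than the actual text.

```python
import zipfile


def entry_sizes(epub_path):
    """Map each file inside the epub to its uncompressed size in bytes."""
    with zipfile.ZipFile(epub_path) as archive:
        return {info.filename: info.file_size for info in archive.infolist()}


def changed_entries(old_epub, new_epub):
    """Return {name: (old_size, new_size)} for entries whose size differs.

    A size of None means the entry is missing from that epub.
    """
    old, new = entry_sizes(old_epub), entry_sizes(new_epub)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }
```

A chapter file that jumps from a few hundred bytes to several kilobytes is a good candidate for a chapter that was empty in the older download.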
I did a couple of spot checks vs the online text and everything appears to be there.
I'll do additional tests over the weekend, but it seems to work as intended and I'm incredibly grateful for all of your hard effort. Thank you so much!
Also, another file called chapters.json is generated alongside the epub file. It holds an array of all the chapters downloaded, so if you have a novel with 4000 chapters downloaded and it later gets 1 new chapter, on your next download the 4000 chapters will be loaded instantly from chapters.json and only the new chapter will be downloaded. My point is, don't delete that file ;3
Your library file is probably somewhere here: C:/User/MyUsername/Downloads/NovelScraper-Library/library.json. Don't delete that either. You can move it to the same location on another computer and it should work normally. Deleting the app itself or moving it around is fine, because the library is stored in the downloads folder and won't be affected.
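The chapters.json behaviour described above can be sketched roughly like this. The thread only says the file holds an array of downloaded chapters, so the "url" and "content" keys, the update_chapters name, and the fetch_chapter callback are all assumptions for illustration. The real app is written in JavaScript and may store chapters differently.

```python
import json
from pathlib import Path


def update_chapters(cache_path, chapter_urls, fetch_chapter):
    """Reuse chapters already stored in chapters.json and fetch only new ones.

    cache_path    -- path to chapters.json (layout assumed: a list of objects
                     with "url" and "content" keys)
    chapter_urls  -- every chapter URL currently listed for the novel, in order
    fetch_chapter -- caller-supplied function that downloads one chapter's text
    """
    cache_path = Path(cache_path)
    cached = []
    if cache_path.exists():
        cached = json.loads(cache_path.read_text(encoding="utf-8"))
    by_url = {chapter["url"]: chapter for chapter in cached}

    chapters = []
    for url in chapter_urls:
        if url not in by_url:  # only chapters missing from the cache are fetched
            by_url[url] = {"url": url, "content": fetch_chapter(url)}
        chapters.append(by_url[url])

    cache_path.write_text(json.dumps(chapters, ensure_ascii=False), encoding="utf-8")
    return chapters
```

With a cache like this, deleting chapters.json simply forces every chapter to be downloaded again on the next run, which is why keeping the file matters.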
And the main reason all of this got better is that previously I was using Python to download content from a browser, but now I'm using a browser to download from a browser ;3
Oh and feel free to run multiple downloads at the same time. And let me know if there are other formats that I can work with like .pdf etc. I'm thinking if .epub file generator keeps giving me problems, I'll move to something else.
That's neat! I've tested it out for a couple of days now and it's smooth sailing! It's very convenient that the chapter text is stored in the .json files as well. Potentially, you could offer more formats at a later time and even do batch conversion jobs into a certain format. .epub is the standard format for mobile devices, so it works well across different layouts, but there are those who would like to read novels as .pdf (no idea why, it's a clunky and awful format imo) or as .mobi.
A couple of suggestions for the UI:
Since the issue is resolved I'm closing the ticket. Thank you so much for going through so much effort! I'll be sure to check out the .exe installer when it's ready.
Could you open a new issue as a feature request for the UI requests? Just so I don't end up forgetting this.
Describe the bug: Upon downloading the two titles "Bringing the Nation's Husband Home" and "Would You Mind If I Play?" from Boxnovel.com, the download fails partway through. It's interesting because I've previously downloaded them successfully in version 0.9.4 (I was messing around with a fresh install to compare .epub files across versions).
"Would You Mind If I Play?" appears to fail at part 28 (the novel is 100 chapters, although I'm not sure how that translates into percentages here). Console output:
Would You Mind If I Play? Status: 26%
mainController.js:429 Would You Mind If I Play? Status: 27%
libraryController.js:171 Saving Library
mainController.js:317 null
libraryController.js:189 Library Saved!
mainController.js:370 1
To Reproduce Steps to reproduce the behavior:
Expected behavior For the download to complete