Flameish / Novel-Grabber

Novel-Grabber can download novels from pretty much any webnovel and lightnovel site.
MIT License
480 stars, 62 forks

About & TOC After Cover #9

Closed by RuthlessRuler 5 years ago

RuthlessRuler commented 5 years ago

It would be nice if you could add an About section and a Table of Contents right after the cover image in the EPUB file.

At the end of the book, there could be a credits section where the author and translator names are listed, along with the source of the EPUB file's DDL.

Also, characters like "&" get dropped from the name when downloading novels like Tales of Demons & Gods.

Flameish commented 5 years ago

The cover and table of contents pages are already implemented and will be in the next release, which should be done tomorrow or in the next few days. The invalid character naming is also fixed. (Thanks for pointing it out, I forgot about it!)

I'm not too sure about adding an "About" segment, at least not having it done automatically. Many sites do not have clean, easy-to-scrape novel descriptions, which would make this really messy. On the other hand, I don't quite see the point of having a credit page with a link to the novel site on top. At least not for my own personal copy. I try to set both the author and the translator as the epub "author", though.
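For readers wondering what a fix for the "&" problem could look like: the thread does not show the actual change, so the following is only a minimal sketch under assumptions. The class name `FileNameSanitizer` and the exact set of stripped characters are hypothetical; the idea is to remove only the characters that Windows rejects in file names and leave "&" alone.

```java
import java.util.regex.Pattern;

final class FileNameSanitizer {
    // Characters that Windows does not allow in file names.
    // '&' is deliberately NOT part of this set, so it survives.
    private static final Pattern ILLEGAL = Pattern.compile("[\\\\/:*?\"<>|]");

    // Hypothetical helper: turns a novel title into a safe file name stem.
    static String sanitize(String title) {
        String cleaned = ILLEGAL.matcher(title).replaceAll("");
        // Collapse whitespace runs left behind by removed characters.
        cleaned = cleaned.replaceAll("\\s+", " ").trim();
        return cleaned.isEmpty() ? "untitled" : cleaned;
    }

    public static void main(String[] args) {
        System.out.println(sanitize("Tales of Demons & Gods") + ".epub");
    }
}
```

With this approach the title from the report keeps its ampersand, while characters such as `:` or `?` are dropped.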

RuthlessRuler commented 5 years ago

Okay. Having only the TOC is also good. Also, you could look into restructuring the ebook and adding options that help the user customize the EPUBs, like whether the HR tag should be included, etc.

Anyways, Thanks for such a great piece of software!

Flameish commented 5 years ago

I hope I understood you correctly with the HR tag thing you mentioned. Novel tags like "Action, Adventure, Sci-Fi" etc. from Wuxiaworld/RoyalRoad (and other sites which list them) are now added to EPUBs.

RuthlessRuler commented 5 years ago

In the file called toc.ncx in the EPUB, if possible, add the location of coverPages.html too. And by HR I meant the hr tag that is placed at the end of each chapter.

Flameish commented 5 years ago

I can add the coverPage to the toc.ncx, that's no problem. But I will probably remove the Table of Contents page from the .ncx since it's right after the cover. It's a bit cleaner that way.
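As an illustration of what a cover entry in toc.ncx looks like, here is a small sketch that builds a single `navPoint` element as defined by the EPUB 2 NCX format. The class name `NcxNavPoint`, the id scheme and the file name `coverPage.html` are assumptions made for the example, not taken from Novel-Grabber's code.

```java
final class NcxNavPoint {
    // Builds one <navPoint> entry; playOrder defines the reading order in the NCX.
    static String navPoint(int playOrder, String label, String src) {
        return "<navPoint id=\"navPoint-" + playOrder + "\" playOrder=\"" + playOrder + "\">\n"
             + "  <navLabel><text>" + escape(label) + "</text></navLabel>\n"
             + "  <content src=\"" + escape(src) + "\"/>\n"
             + "</navPoint>";
    }

    // Minimal XML escaping so titles such as "Demons & Gods" stay well-formed.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        // "coverPage.html" is an assumed file name.
        System.out.println(navPoint(1, "Cover", "coverPage.html"));
    }
}
```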

You want the option to have an <hr> at the end of each chapter? Sure, I can add that as well.

RuthlessRuler commented 5 years ago

It's already there. I want the option to remove it. That's all.

Flameish commented 5 years ago

Now I'm confused. There is no hr tag/line at the end of the chapters, at least none which I added. Can you give me some info on which novel this is happening or is it everywhere?

Edit: I checked Tales of Demons & Gods and the hr tag comes from the chapter itself. You can add the hr tag to the blacklisted tags and download it manually. Here is an image of my settings
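To make the effect of blacklisting the hr tag concrete, here is a rough sketch of stripping a void tag from chapter HTML. It uses a plain regular expression to stay self-contained; how Novel-Grabber applies its blacklist internally is not shown in this thread, so the class `TagStripper` and its method are purely illustrative, and a real HTML parser would be the more robust choice.

```java
import java.util.regex.Pattern;

final class TagStripper {
    // Removes every occurrence of a void tag such as <hr>, <hr/> or <HR class="x">.
    // Only suitable for tags without content or closing tag (hr, br, img, ...).
    static String removeVoidTag(String html, String tagName) {
        Pattern p = Pattern.compile("(?i)<" + Pattern.quote(tagName) + "\\b[^>]*>");
        return p.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String chapter = "<p>Last line of the chapter.</p><hr/>";
        System.out.println(removeVoidTag(chapter, "hr"));
    }
}
```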

RuthlessRuler commented 5 years ago

Oh. Thanks for the help! I thought it was a built-in function, but it doesn't seem so.

In the future, do you have any plans to add Webnovel.com support natively?

Flameish commented 5 years ago

No. The ToC page is loaded dynamically after a user clicks on the table of contents tab, so it's "invisible" to my HTML parser. It might be possible to get the chapters manually with chapter-to-chapter, but a lot of the chapters are behind a paywall anyway.

RuthlessRuler commented 5 years ago

Yeah. It's the same for me. But there are many scripts on GitHub that are able to download from Webnovel.com. Try to look into them and see if you can implement them!

RuthlessRuler commented 5 years ago

Also, is it possible to add native support for MTL websites? Babelnovel.com can be implemented easily, but others like comrademao/lnmtl have Chinese words on the same page too (when using Reader Mode).

Flameish commented 5 years ago

I don't know how I feel about bypassing a paywall, or whether I want to implement that. So webnovel.com is a no for now. Babelnovel is not a static website either (as far as I saw). Maybe you can elaborate on how I might be able to add the site easily without having to implement full support for dynamic websites? I might take a look at that in the future, though.

I'm not familiar with the term "MTL". Are you talking about Modern Taiwanese Language? I think it would be best if you open a new issue labeled "supported website request" or something like that, where you specify which websites you have in mind, and I'll see what I can do. Generally, I only add native support for websites which have more than 10 active novels on them; manual grabbing was intended for everything else.

RuthlessRuler commented 5 years ago

I'm not asking to scrape premium novels. Free novels do exist on Webnovel which can be scraped. Also, MTL refers to Machine Translated Novels, e.g. https://lnmtl.com/ & https://comrademao.com/. IDK what dynamic websites are, but as you can see, babelnovel has an easy URL for chapters, like babelnovel.com/novel-name/chapter-xxx/. So users can define the chapters AND THE INFORMATION CAN BE SCRAPED FROM BABELNOVEL'S WEBSITE!

Flameish commented 5 years ago

I did notice the URL of babel and thought of implementing a third manual grabbing method which works directly with URLs. However, it is still not possible at the moment without big adjustments (like I said, maybe in the future), because each chapter on Babel is loaded dynamically as well, meaning the chapter content is fetched by a script after the page has loaded. Static websites, e.g. Wuxiaworld, have their chapters fully "inserted" in the HTML on the very first page load, and I'm able to easily work with that. Until I implement a miniature browser/AJAX scraper or whatever I really need, dynamic sites won't be possible.

lnmtl.com and comrademao.com will not work "automatically"; both will have to use the manual "chapter-to-chapter" method. I'll think about implementing a hybrid of automatic and manual grabbing for sites like these, where you have to input the novel URL, the first/starting chapter URL and the last/stopping chapter URL (because I don't know how many chapters there are) on the automatic tab.
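The URL-based grabbing idea mentioned above can be sketched as follows. This only builds the list of chapter URLs from the pattern quoted earlier in the thread (babelnovel.com/novel-name/chapter-xxx/); it does not fetch anything and does not solve the dynamic-loading problem. The class `ChapterUrls`, the `https://` scheme and the assumption that "xxx" is a plain running number are all illustrative guesses.

```java
import java.util.ArrayList;
import java.util.List;

final class ChapterUrls {
    // Builds chapter URLs for an inclusive chapter range.
    // Assumes chapter numbers are plain, unpadded integers.
    static List<String> build(String novelSlug, int first, int last) {
        if (first < 1 || last < first) {
            throw new IllegalArgumentException("invalid chapter range: " + first + ".." + last);
        }
        List<String> urls = new ArrayList<>();
        for (int n = first; n <= last; n++) {
            urls.add("https://babelnovel.com/" + novelSlug + "/chapter-" + n + "/");
        }
        return urls;
    }

    public static void main(String[] args) {
        // "novel-name" is the placeholder slug used in the thread.
        build("novel-name", 1, 3).forEach(System.out::println);
    }
}
```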

RuthlessRuler commented 5 years ago

Okay! Thanks a lot!

Flameish commented 5 years ago

You can download from Webnovel.com now (non vip chapters)

RuthlessRuler commented 5 years ago

The cover image that is downloaded from Webnovel is of very low quality. If the book is https://www.webnovel.com/book/8094015805004305/Tales-of-Demons-and-Gods then the cover image that is downloaded is https://img.webnovel.com/bookcover/8094015805004305/150/150.jpg but covers like https://img.webnovel.com/bookcover/8094015805004305/600/600.jpg and https://img.webnovel.com/bookcover/8094015805004305/300/300.jpg exist.

Also, the close button (X) doesn't work in Edit Blacklisted Tags (the hamburger menu). The dialog box only closes when clicking OK.
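Based on the three URLs above, swapping the size segment could look roughly like this. It is a sketch that assumes the cover URL always ends in `/<size>/<size>.jpg`; only the sizes 150, 300 and 600 are confirmed in this thread, and the class name `CoverUrl` is made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CoverUrl {
    // Matches a trailing "/150/150.jpg"-style segment where both numbers are equal.
    private static final Pattern SIZE = Pattern.compile("/(\\d+)/\\1\\.jpg$");

    // Returns the URL with the requested size, or the input unchanged
    // if it does not follow the assumed pattern.
    static String withSize(String coverUrl, int size) {
        Matcher m = SIZE.matcher(coverUrl);
        if (!m.find()) {
            return coverUrl;
        }
        return m.replaceFirst("/" + size + "/" + size + ".jpg");
    }

    public static void main(String[] args) {
        String small = "https://img.webnovel.com/bookcover/8094015805004305/150/150.jpg";
        System.out.println(withSize(small, 600));
    }
}
```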

RuthlessRuler commented 5 years ago

Some more stuff: there are actually 2 identical photos (the book cover) with different names, cover.jpg and 150.jpg (the name of the cover image as downloaded), in the EPUB.

The chapter name is also missing at the start of each chapter (possibly Webnovel-specific), and there is no numbering of chapters in the TOC either.

Also, the " ' " without space is shown as ’s in the EPUB. Like :

image

Flameish commented 5 years ago

Thanks for your report! I'm sorry for the late response. Check out the latest version, everything should work correctly now. (Except the encoding issue, I'm still working on it and just put a crude hotfix in place for it)
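The thread does not say what caused the encoding issue. A common source of garbled apostrophes in scraped text is UTF-8 bytes being decoded as Windows-1252, and the sketch below only demonstrates that general effect and its reversal; whether this was the actual cause here is an assumption, and the class `MojibakeDemo` is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Simulates the bug: UTF-8 bytes interpreted with the wrong charset.
    static String misdecode(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), CP1252);
    }

    // Reverses the damage by re-encoding with the wrong charset and decoding as UTF-8.
    // Not universal: bytes that Windows-1252 leaves undefined cannot be recovered.
    static String repair(String garbled) {
        return new String(garbled.getBytes(CP1252), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Nie Li\u2019s";          // typographic apostrophe U+2019
        String garbled = misdecode(original);
        System.out.println(garbled);
        System.out.println(repair(garbled));
    }
}
```

Decoding the response with the correct charset in the first place avoids the need for any repair step.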

RuthlessRuler commented 5 years ago

Thanks for the update. The issue of the chapter title missing at the start of each chapter is still there in WebNovel downloads. Also, the cover can be downloaded in a higher size (600.jpg) from Webnovel.

Flameish commented 5 years ago

I've updated the 2.1.4 jar.