Flameish / Novel-Grabber

Novel-Grabber can download novels from pretty much any webnovel and lightnovel site.
MIT License

Two Issues with Manual selection #62

Closed by DPalmz 3 years ago

DPalmz commented 4 years ago

I have been unable to download using chapter-to-chapter selection, even after checking my inputs against one of the previous issues here. I get this error:

[ERROR]Cannot invoke "org.jsoup.nodes.Element.absUrl(String)" because the return value of "org.jsoup.select.Elements.first()" is null

The second thing is less of a concern but slightly annoying: after trying chapter-to-chapter, even if I delete the selection, I am unable to do a manual table of contents download. I get the same error, as if it still thinks I am doing chapter-to-chapter.

Flameish commented 4 years ago

I'm currently redoing parts of NG, so I can't really fix this for you right now. Hopefully the new release will have these issues fixed already.

EDIT: Can you link the novel you're trying to download? I'll take a look at it as everything is working as intended when I try on 3.1.1

DPalmz commented 3 years ago

Thank you for replying. It makes sense that you wouldn't bugfix if you're working on a new version. I hope the new one works for me too.

I've tried it with http://www.talesofmu.com/ and https://web.archive.org/web/20200216171325/http://www.addergoole.com/9/ (though I think I only managed to learn how to successfully do it with the first one)

Also, let me say, I'm really glad your program works with web-archived stories, because my god, sometimes I go to read one of these only to find it only exists there now.

Flameish commented 3 years ago

Tales Of Mu has EPUBs available, please think about supporting the author and getting their version! :)

Never thought about using the Wayback Machine myself, but it's great that it works out of the box. I took a look at Addergoole and these options seem to work just fine (only for the first 40 chapters or so; not everything got archived after that point):
First chapter: https://web.archive.org/web/20160510181758/http://www.addergoole.com/9/2012/09/chapter-1-wylie
Last chapter: (I'm not sure when exactly it stopped working)
Button: a[rel=next]

Flameish commented 3 years ago

Is your problem still persistent in the new version?

DPalmz commented 3 years ago

Yes, my issue is fixed, thanks. The new version is a lot smoother too.

DPalmz commented 3 years ago

How would I go about manually selecting the chapter container though? I've run across some sites where the autodetect isn't grabbing the chapter container.

Flameish commented 3 years ago

The chapter container selection uses CSS selectors to pick the correct HTML element. It is pretty easy to understand even without any prior HTML knowledge:

Just right-click in the chapter body and select Inspect Element. If right-clicking is disabled on the website, open the inspector tool manually (Firefox: F12, not sure about Chrome) or via the menu -> dev tools -> inspector.

Screenshot from 2020-11-19 11-16-44

Next you have to find the container which holds the chapter text. It's probably the longest one and/or contains many <p> (paragraph) elements.

Screenshot from 2020-11-19 11-18-15

It already displays the CSS selector at the top: div.chapter-inner.chapter-content

Or you could copy it directly if you right click on the container and select Copy -> CSS selector.

Screenshot from 2020-11-19 11-19-35

Don't forget that you can test your selection via the Preview Chapter function based on your input.

Screenshot from 2020-11-19 11-42-15

If you need even more specific control over your selection you can take a look at the Jsoup selector syntax page. Jsoup also has a live testing page which is pretty useful to find the correct selector.

You can also use CSS selectors to remove content from the chapter via the Edit Blacklist tags window.

Screenshot from 2020-11-19 12-06-17

All <p> elements and all elements which have the ads class will be removed.

Blacklisted tags are also reflected in the preview window, as you can (not) see.

Screenshot from 2020-11-19 12-07-00
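
For readers who want to see what a blacklist like "p" plus ".ads" does, here is a rough sketch in Python using only the standard library. It is not Novel-Grabber's code (which is Java and uses Jsoup), and it only understands simple selectors of the form tag, .class or tag.class1.class2. The names matches_simple, BlacklistTextExtractor and strip_blacklisted are made up for this illustration, and the sketch assumes every non-void element has a closing tag.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def matches_simple(selector, tag, classes):
    """Match 'tag', '.class' or 'tag.class1.class2' against one element."""
    parts = selector.split(".")
    wanted_tag, wanted_classes = parts[0], parts[1:]
    if wanted_tag and wanted_tag != tag:
        return False
    # Every class named in the selector must be present on the element.
    return set(wanted_classes) <= set(classes)


class BlacklistTextExtractor(HTMLParser):
    """Collects text, skipping everything inside blacklisted elements."""

    def __init__(self, blacklist):
        super().__init__()
        self.blacklist = blacklist
        self.skip_depth = 0  # > 0 while inside a blacklisted element
        self.text = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth += 1
        elif any(matches_simple(s, tag, classes) for s in self.blacklist):
            if tag not in VOID_TAGS:
                self.skip_depth = 1

    def handle_endtag(self, tag):
        if self.skip_depth and tag not in VOID_TAGS:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.text.append(data)


def strip_blacklisted(html, blacklist):
    """Return the text of html with blacklisted elements removed."""
    parser = BlacklistTextExtractor(blacklist)
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.text)
```

The same matching idea applies to the chapter container: div.chapter-inner.chapter-content means a div element that carries both of those classes, regardless of any extra classes it may also have.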

DPalmz commented 3 years ago

So I'm getting the error (Cannot invoke "org.jsoup.nodes.Element.absUrl(String)" because the return value of "org.jsoup.select.Elements.first()" is null). It is picking up the text, but with chapter-to-chapter selection, when it hits this error the program for some reason keeps cycling to the next chapter, even if that chapter doesn't exist and is past the last chapter given. I tested this with https://www.eviscerati.org/fiction/arbsl/2013/10/rake-starlight-chapter-01/ and https://caelum-lex.com/. I got the same error with both, but did manage to get an EPUB out of Caelum Lex since it has a table of contents.
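
As an aside on the "keeps cycling" behaviour, a chapter-to-chapter download can be pictured as a loop that follows the next button until it has a reason to stop. The following Python sketch is only an illustration of sensible stop conditions, not how Novel-Grabber is actually implemented; crawl_chapters, get_next_url and max_chapters are hypothetical names, and get_next_url stands in for "load the page and read the next-button link", returning None when there is no link.

```python
def crawl_chapters(first_url, last_url, get_next_url, max_chapters=5000):
    """Follow 'next' links from first_url and return the URLs visited.

    Stops when the last chapter is reached, when no next link is found,
    when a URL repeats (a loop), or after max_chapters pages.
    """
    visited = []
    seen = set()
    url = first_url
    while url is not None and url not in seen and len(visited) < max_chapters:
        visited.append(url)
        seen.add(url)
        if url == last_url:
            break  # reached the chapter the user gave as the last one
        url = get_next_url(url)  # None means "no next button on this page"
    return visited
```

With a plain dictionary standing in for the website, crawl_chapters("c1", "c3", {"c1": "c2", "c2": "c3", "c3": "c4"}.get) visits c1, c2 and c3 and then stops, even though c3 still has a next link.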

Flameish commented 3 years ago

Both worked perfectly fine for me with .nav-next a as the next button. I didn't pay attention on Caelum Lex and got stuck in a loop, though, but Rake (which also has a table of contents from what I've seen) finished just fine.

That message sounds like it couldn't get the correct href (or any). Can you post what you've entered?
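
To illustrate what "couldn't get the correct href" means: the error appears when the next-button selector matches nothing, so there is no element to read a link from. The sketch below is Python with only the standard library, not the program's real Java/Jsoup code. NextLinkFinder, next_chapter_url and the example.com URLs are hypothetical. It roughly mimics the selector .nav-next a (the first link inside an element with the nav-next class), assumes well-formed markup, and returns None instead of failing when nothing matches.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Elements that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class NextLinkFinder(HTMLParser):
    """Finds the href of the first <a> nested inside an element with a given class."""

    def __init__(self, container_class):
        super().__init__()
        self.container_class = container_class
        self.depth_inside = 0  # > 0 while inside the container element
        self.href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth_inside:
            if tag not in VOID_TAGS:
                self.depth_inside += 1
            if tag == "a" and self.href is None:
                self.href = attrs.get("href")
        elif self.container_class in (attrs.get("class") or "").split():
            if tag not in VOID_TAGS:
                self.depth_inside = 1

    def handle_endtag(self, tag):
        if self.depth_inside and tag not in VOID_TAGS:
            self.depth_inside -= 1


def next_chapter_url(html, page_url, container_class="nav-next"):
    """Return the absolute URL of the next chapter, or None if no link matched."""
    finder = NextLinkFinder(container_class)
    finder.feed(html)
    finder.close()
    if finder.href is None:
        return None  # the selector matched nothing: stop instead of crashing
    return urljoin(page_url, finder.href)
```

A long, page-specific selector (for example one containing a post ID) behaves like the None case on every page except the one it was copied from, which is one plausible way to end up with this error.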

DPalmz commented 3 years ago

So I've been using Opera for this. Maybe a different browser would be better, as I never got anything like .nav-next a out of the CSS selector. I tried a few different combinations of things, both with automatic chapter container selection and with manual. For Rake by Starlight (yeah, it does look like it has a table of contents, though I didn't know if it would be detected so I didn't try) I've tried

#post-6593 > footer > nav > div > div > a

and

body.post-template-default.single.single-post.postid-6593.single-format-standard.custom-background.wp-custom-logo.wp-embed-responsive.author-hidden:nth-child(2) div.hfeed.site:nth-child(1) div.site-content.container.clearfix section.content-area main.site-main article.post-6593.post.type-post.status-publish.format-standard.has-post-thumbnail.hentry.category-arbsl footer.entry-footer nav.navigation.post-navigation div.nav-links div.nav-next > a:nth-child(1)

This long one is what came out of a CSS addon I got. I did also try Rake by Starlight with .nav-next a and was able to download it. So yes, it does seem to be a problem with my inputs.

Flameish commented 3 years ago

Never tried with Opera, so it's good to see that it works there! That selector looks horrible lol, they should never be that long. It looks like a unique one too. You might want to take a look at the different sources for examples. Search for select inside the files.

DPalmz commented 3 years ago

Thank you for your help

asrind11 commented 1 year ago

A request to the author of Novel-Grabber: please add support for the site https://ranobe-novels.ru. None of the bots that I found on the Internet and was able to run can download files from ranobe-novels.ru.