[Enhancement] Some more parsers

typhoon71 commented 8 years ago

[ ] https://www.epub.pub/
[ ] https://tachibanachinatsu.wixsite.com/tenshitranslations/copy-of-obc-toc
[x] http://www.69shu.com/
[x] unlimitednovelfailures.mangamatters.com
[x] https://creativenovels.com/
[x] dreamsleeper87.blogspot.sg/p/world-customize-creator-web-novel.html
[x] https://fantasy-books.live/
[x] frostfire10.wordpress.com
[x] gravitytales.com
[x] hellping.org
[x] isekaicyborg.wordpress.com
[x] japtem.com
[x] krytykal.org
[x] http://liberspark.com
[x] lightnovelbastion.com
[x] lnmtl.com
[x] moonbunnycafe.com
[x] nanodesutranslations.wordpress.com / www.*thetranslation.wordpress.com
[x] http://www.novelall.com
[x] http://novelfull.com/
[x] http://novelonlinefree.com
[x] https://www.noveluniverse.com/
[x] www.pathoftranslation.com
[x] www.radianttranslations.com
[x] shalvationtranslations.wordpress.com
[x] shikkakutranslations.org
[x] skythewood.blogspot.it
[x] www.sousetsuka.com
[x] www.translationnations.com
[x] volaretranslations.com
[x] https://www.webnovel.com/ aka https://en.qidian.com/
[x] https://wnmtl.com
[x] http://www.wuxiaworld.co/
[x] *.wikipedia.org
[x] www.xianxiaworld.net
[x] yoraikun.wordpress.com
[x] http://zenithnovels.com

Since I can't do them myself, I'll shamelessly ask for some new parsers to be added.

moonbunnycafe.com -> This is a site where novel translations are hosted with the TLs' "authorization", and there are a bunch of nice ones; it would make a good addition.

nanodesutranslations.wordpress.com, or better www.*thetranslation.wordpress.com (note the "*") -> TL, I think this was already asked about, but I'll put it here since there's a lot of content there.

krytykal.org -> TL, good stuff

unlimitednovelfailures.mangamatters.com -> TL, good stuff (but slow updates)

There are a lot of other sites I'd like to ask for, but I'll refrain since most of them host a single TL project each, and it would be a lot of work for little gain.

/me reloads shame concept /me blushes

AzSiAz commented 8 years ago

nanodesutranslations > Might be tricky since it's not the same person who posts each time, but a classic WordPress parser should be able to handle it chapter by chapter, except for some weird cases. I don't remember them doing full text, no?

krytykal.org > It seems easy for full text: every chapter is inside a div with su-posts su-posts-single-post as its CSS class. Chapter by chapter will be the same as nanodesutranslations, with a classic WordPress parser.

unlimitednovelfailures.mangamatters.com > Some weird HTML here: each new chapter title is inside the previous chapter's tag, but the chapter content is in the new <p> tag. Should be somewhat easy.

moonbunnycafe.com > Same parser from nanodesutranslations should do it chapter by chapter

I remember seeing something that parses provided chapter URLs and makes a volume, but I don't see it. Did I dream about this?

dteviot commented 8 years ago

@AzSiAz

I remember seeing something that parses provided chapter URLs and makes a volume, but I don't see it. Did I dream about this?

You're not far off. I'm currently trying to make that logic more general. Have a look at the ZirusMusing and WuxiaWorld parsers for the basic idea.

dteviot commented 8 years ago

@typhoon71

There are a lot of other sites I'd like to ask for, but I'll refrain since most of them host a single TL project each, and it would be a lot of work for little gain.

Please list them. If I'm lucky, it may be possible to generalize and get many of them with a single parser.
I'm currently thinking about a general "wordpress" parser.

typhoon71 commented 8 years ago

Nice to know you're making the addon more powerful. I added some sites to the list in my first post; it's mostly WordPress-based stuff, but there's Blogspot too. I took the liberty of reordering the list alphabetically.

AzSiAz commented 8 years ago

@dteviot Oh right, I will take a closer look at those parsers, thanks :) I think it would be useful for a general WordPress parser, since most of these sites don't have a full text page.

dteviot commented 8 years ago

@typhoon71 @AzSiAz Added some new parsers. On the ExperimentalTabMode branch. Note, I've only done a quick test on one or two stories per site.

frostfire10.wordpress.com: need to go to web page with list of chapters for it to work. (https://frostfire10.wordpress.com/chapters/)
isekaicyborg.wordpress.com: need to go to page with list of chapters, then select the chapters to pack.
shalvationtranslations.wordpress.com: need to go to page with list of chapters, then select the chapters to pack.
hellping.org: parser shows links for all chapters of all stories, so you need to pick the chapters to pack carefully.
krytykal.org: parser shows links for all chapters of all stories, so you need to pick the chapters to pack carefully.

typhoon71 commented 8 years ago

1) frostfire10.wordpress.com -> OK, works

2) isekaicyborg.wordpress.com -> OK, works, but:
- spurious link (http://cont.lanove.kodansha.co.jp)
- needs reloading after getting some chapters (like, after getting v1 to get v2) if OS is suspended.

3) shalvationtranslations.wordpress.com -> OK, works, but:
- spurious links (http://www.mediafire.com, https://drive.google.com)
- I can't find the option to remove duplicates, the images in the gallery contain some that are later shown in the novel
- can't choose the cover image from the gallery
- default cover image is not resized/centered (well, ok, it's small, but still)

4) hellping.org -> OK, works

5) krytykal.org -> OK, works, but:
- I can't find the option to remove duplicates, the images in the gallery contain some that are later shown in the novel
- can't choose the cover image from the gallery
- default cover image is not resized/centered (well, ok, it's small, but still)

Regarding "parser shows links for all chapters of all stories" for 4) & 5) , it's easy enough to check the right ones, the link suggests what is what; seems clear and easy enough for me. I did not test ALL of the novels, mostly the ones I like.

dteviot commented 8 years ago

@typhoon71

2) isekaicyborg.wordpress.com -> OK, works, but:
- spurious link (http://cont.lanove.kodansha.co.jp)
- needs reloading after getting some chapters (like, after getting v1 to get v2) if OS is suspended.

3) shalvationtranslations.wordpress.com -> OK, works, but:
- spurious links (http://www.mediafire.com, https://drive.google.com)

That's kind of by design. Currently, it's just grabbing all hyperlinks, and I leave it to the user to remove any that are not wanted.
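If automatic pruning were ever wanted, one possible filter is sketched here. It is only an illustration, not what the extension does: `keepSameHostLinks` is a made-up name, and the rule "keep links on the same host as the chapter list" is an assumption that would wrongly drop chapters hosted on another domain.

```javascript
// Hypothetical helper: keep only links that live on the same host
// as the page the chapter list was taken from.
function keepSameHostLinks(pageUrl, links) {
    const pageHost = new URL(pageUrl).hostname;
    return links.filter(link => {
        try {
            // Relative links are resolved against the page URL.
            return new URL(link, pageUrl).hostname === pageHost;
        } catch (e) {
            return false; // not a parseable URL, drop it
        }
    });
}

const kept = keepSameHostLinks(
    "https://isekaicyborg.wordpress.com/toc/",
    [
        "https://isekaicyborg.wordpress.com/chapter-1/",
        "http://cont.lanove.kodansha.co.jp",
        "https://drive.google.com"
    ]
);
console.log(kept);
```

Leaving the final choice to the user, as the extension currently does, avoids that false-negative risk.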

I can't find the option to remove duplicates, the images in the gallery contain some that are later shown in the novel

There isn't one. It's currently only supported for Baka-Tsuki, because I can identify the image gallery there. I can't do that in the general case. That said, might be possible to make an educated guess, if a page contains mostly images.

can't choose the cover image from the gallery

You should be able to supply the URL of the image you want used as cover.

default cover image is not resized/centered (well, ok, it's small, but still)

Odd. It should be. I've wrapped it in <svg> and <image> tags.

typhoon71 commented 8 years ago

With "can't choose the cover image from the gallery" I meant that the images don't get shown like with Baka-Tsuki, where you just need to select "use as cover". That said, I am indeed able to supply the URL of the image I want used as cover.

About duplicated images (in Dungeon Defense): actually the images are shown 2 times, but the epub contains just one image. Does the packer deduplicate them, or are they just overwritten?

About the default cover, I was wrong, sorry: it works as intended. The thing is, I also open the epubs with Sumatra to check whether the image link option works, and with it the cover image looks like I said; it's Sumatra's fault, I suppose it's stripping formatting / style... Calibre shows the image correctly.

dteviot commented 8 years ago

@typhoon71

With "can't choose the cover image from the gallery" I meant that the images don't get shown like with Baka-Tsuki, where you just need to select "use as cover".

That's because with Baka-Tsuki, the whole story including images is on a single page, so the images can be shown at the start. With all the others, the images are known only after loading all the pages.

This has implications for removing duplicate images from gallery. Usually the gallery is at the start, but the images that are duplicated elsewhere are not known until all the rest of the pages are loaded.
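To make the ordering problem concrete, here is a minimal sketch of a de-duplication pass that can only run once every page has been loaded. The page shape (`isGallery`, `images`) and the function name are assumptions for illustration, not the extension's real data model.

```javascript
// pages: [{ isGallery: boolean, images: string[] }] in reading order.
// This shape is hypothetical; it only serves to show the two-pass idea.
function removeGalleryDuplicates(pages) {
    // Pass 1: collect every image that appears in a non-gallery page.
    const usedInChapters = new Set();
    for (const page of pages) {
        if (!page.isGallery) {
            page.images.forEach(src => usedInChapters.add(src));
        }
    }
    // Pass 2: drop those images from gallery pages, leaving chapters untouched.
    return pages.map(page => page.isGallery
        ? { ...page, images: page.images.filter(src => !usedInChapters.has(src)) }
        : page);
}

const cleaned = removeGalleryDuplicates([
    { isGallery: true, images: ["a.jpg", "b.jpg", "c.jpg"] },
    { isGallery: false, images: ["b.jpg"] }
]);
console.log(cleaned[0].images);
```

The first pass is the part that cannot be done while only the gallery page is known.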

I'm thinking that a solution to this problem is to move the galleries to the end of the book, rather than have them at the start. Of course, if this is done, then there may be no need to remove the duplicates, as the risk of spoilers is reduced.

What do you think?

typhoon71 commented 8 years ago

mmm, kind of hard to answer.

I like the gallery being at the start, without duplicates, but not because of spoilers: it's just because it's more similar to the actual "book". Also, most EPUBs of novels have the gallery at the start. My choice would be to put the gallery at the start, even with duplicates if it's not possible to remove them.

But I do recognize having the gallery at the end to reduce the risk of spoilers is a valid point.

So... An option for the user between those two? Like "put the gallery at the end if it's not bakatsuki"?

dteviot commented 8 years ago

@typhoon71

Added basic support for

GravityTales
moonbunnycafe.com
nanodesutranslations.wordpress.com (Note, go to one of the www.*thetranslation.wordpress.com pages and run extension.)
japtem.com

toshiya44 You suggested Nanodesu and hellping, so you might like to give them a go.

typhoon71 commented 8 years ago

GravityTales Tested with "The Experimental Log of the Crazy Lich": the title disappears from the chapter text.

moonbunnycafe.com Tested with "The Guild’s Cheat Receptionist" and "Lazy Dungeon Master", seems fine.

japtem.com Tested with "Magi's Grandson" Volume 2: the image gallery is messed up (missing images, duplicate/wrong images); some spurious text before the chapter ("this chapter was brought to you by....")

Nanodesu Tested with "Fire Girl" and "Sasami-san Gambaranai": spurious "next page" before the chapter start and before and after the notes for "Fire Girl"; triple separators for both, sometimes. Here's a screenshot of a page where both happen (Fire Girl):

[screenshot: fgsample]

Did not test Nanodesu much, because Nanodesu provides pdf/epub after the volumes are done and edited (it takes time, but I'm not in a hurry).

dteviot commented 8 years ago

@typhoon71

GravityTales Tested with "The Experimental Log of the Crazy Lich": the title disappears from the chapter text.

Fixed.

japtem.com Tested with "Magi's Grandson" Volume 2: the image gallery is messed up (missing images, duplicate/wrong images);

I think the missing/duplicate image is an issue with the page itself. http://japtem.com/projects/magis-grandson-toc/v2_illustrations/ The list of all images is missing image 5 from the slide show, and instead duplicates the 4th image in the show. Note, at the current time, the parser doesn't visit the slideshow on imgur. It uses the images in the "All Images" section.

some spurious text before the chapter ("this chapter was brought to you by....")

I'll look into that.

Nanodesu Will investigate.

Have added skythewood.blogspot.it to ExperimentalTabMode

typhoon71 commented 8 years ago

GravityTales The chapter title is back, but now there's some unwanted text at the start and at the end of the chapter, like "[Previous Chapter][Table of Content][Next Page]"...

japtem.com Yeah, I noticed the addon was using the gallery (which has wrong images) instead of navigating imgur (where there are the correct images); I suppose they should fix the webpage, but it wouldn't hurt to consider the case of galleries both on local and imgur.

skythewood.blogspot.it Tested with "Altina the sword princess": I noticed that the font (size) of the prologue text and the one of ch1 are different while I'd expect them to be the same; the web pages are like that, so I don't think it's the addon doing that. Would it be possible to have the same font size?

dteviot commented 8 years ago

@typhoon71

japtem.com some spurious text before the chapter ("this chapter was brought to you by....")

That's part of the story text, it's not marked in any way.

GravityTales The chapter title is back, but now there's some unwanted text at the start and at the end of the chapter, like "[Previous Chapter][Table of Content][Next Page]"...

The problem here is that the links are using different URLs than given in the ToC. So parser doesn't know that they're next/previous. e.g.

In ToC, the first chapter is http://gravitytales.com/the-experimental-log-of-the-crazy-lich/elcl-chapter-1/,
but on Chapter 2, the link to chapter 1 is http://gravitytales.com/elcl-chapter-1/

Note, not all chapters are like this, only some at start. Later on they're consistent and the parser is able to remove them. See chapter 3 and 4.
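One way the mismatch could be tolerated is to compare only the last path segment (the slug) of each URL instead of the whole URL. This is a sketch under that assumption, with hypothetical function names; it is not how the GravityTales parser works, and it would give false matches if two stories ever shared a slug.

```javascript
// Last non-empty path segment of a URL, e.g. "elcl-chapter-1".
function lastPathSegment(url) {
    const segments = new URL(url).pathname.split("/").filter(s => s.length > 0);
    return segments.length > 0 ? segments[segments.length - 1] : "";
}

// True if href points at a chapter listed in the ToC, comparing slugs only.
function isChapterLink(href, tocUrls) {
    const slugs = new Set(tocUrls.map(lastPathSegment));
    const slug = lastPathSegment(href);
    return slug !== "" && slugs.has(slug);
}

const toc = [
    "http://gravitytales.com/the-experimental-log-of-the-crazy-lich/elcl-chapter-1/"
];
console.log(isChapterLink("http://gravitytales.com/elcl-chapter-1/", toc));
```

With a check like this, the short-form link on Chapter 2 would be recognised as pointing at a ToC entry and could be removed with the other navigation links.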

dteviot commented 8 years ago

@typhoon71 shikkakutranslations.org should now be working if you want to give it a try.

typhoon71 commented 8 years ago

shikkakutranslations.org Tested with Kamigoroshi, seems to work perfectly. Thanks.

Since both of the novels translated on this site appear in the chapter list, it takes a lot to remove the unwanted ones (100+): would it be possible to implement something like intervals of chapters to select/unselect? Should I add this request to #73?
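As an illustration of what "intervals of chapters" could mean, here is a small sketch that selects or unselects every chapter between two positions. The chapter objects and the `selected` flag are hypothetical, not the extension's actual structures.

```javascript
// Return a copy of chapters with indexes from..to (inclusive, either order)
// set to the requested selected state. Out-of-range bounds are clamped.
function setRangeSelected(chapters, from, to, selected) {
    const lo = Math.max(0, Math.min(from, to));
    const hi = Math.min(chapters.length - 1, Math.max(from, to));
    return chapters.map((chapter, index) =>
        (index >= lo && index <= hi) ? { ...chapter, selected } : chapter);
}

const chapters = ["c1", "c2", "c3", "c4", "c5"].map(title => ({ title, selected: true }));
const pruned = setRangeSelected(chapters, 1, 3, false);
console.log(pruned.map(chapter => chapter.selected));
```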

dteviot commented 8 years ago

@typhoon71

Should I add this request to #73?

Yes please. I'm currently thinking of opening a multi-line text editor, with a hyperlink entry on each line. The idea is someone can then add, delete, edit, re-arrange, etc the links. i.e. There's a button like "edit chapter links". This replaces the table of links with the text editor. User can then re-arrange the links as I've described, then press the "links as table" to convert back to a table.
This is easiest for me to implement and most flexible for a user, but does require the user to have a minimal knowledge of hyperlinks. Also, there's the possibility that they could mistype something i.e. Not correctly escape a special character. What do you think?
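A minimal sketch of the round trip between the table of links and the text editor, assuming one `<a href="...">title</a>` entry per line. The line format and all function names are assumptions for illustration; the escaping step is the part a user could get wrong when typing by hand.

```javascript
function escapeHtml(text) {
    return text.replace(/&/g, "&amp;").replace(/</g, "&lt;")
        .replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

function unescapeHtml(text) {
    // "&amp;" is handled last so that "&amp;lt;" becomes "&lt;", not "<".
    return text.replace(/&quot;/g, '"').replace(/&gt;/g, ">")
        .replace(/&lt;/g, "<").replace(/&amp;/g, "&");
}

// Table of links -> editable text, one hyperlink per line.
function linksToText(links) {
    return links
        .map(link => `<a href="${escapeHtml(link.url)}">${escapeHtml(link.title)}</a>`)
        .join("\n");
}

// Editable text -> table of links. Blank lines are skipped;
// a line that is not a well-formed hyperlink raises an error.
function textToLinks(text) {
    const pattern = /^<a href="([^"]*)">(.*)<\/a>$/;
    const links = [];
    for (const line of text.split("\n")) {
        const trimmed = line.trim();
        if (trimmed === "") continue;
        const match = pattern.exec(trimmed);
        if (match === null) throw new Error(`Not a hyperlink: ${trimmed}`);
        links.push({ url: unescapeHtml(match[1]), title: unescapeHtml(match[2]) });
    }
    return links;
}

const sample = [{ url: "http://example.com/?a=1&b=2", title: "Chapter 1 & 2" }];
console.log(linksToText(sample));
```

Rejecting malformed lines when converting back to the table is one way to surface a mistyped entry instead of silently dropping it.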

typhoon71 commented 8 years ago

Seems good to me, flexible and not too complicated. Btw, adding suggestion to #73.

dteviot commented 8 years ago

@typhoon71

skythewood.blogspot.it Tested with "Altina the sword princess": I noticed that the font (size) of the prologue text and the one of ch1 are different while I'd expect them to be the same; the web pages are like that, so I don't think it's the addon doing that. Would it be possible to have the same font size?

Have raised new issue. https://github.com/dteviot/WebToEpub/issues/75

typhoon71 commented 8 years ago

"http://www.readlightnovel.com/modern-weapons-cheat-in-another-world": the title disappears from the chapter text (same thing that was happening on GravityTales).

dteviot commented 8 years ago

@typhoon71

"http://www.readlightnovel.com/modern-weapons-cheat-in-another-world": the title disappears from the chapter text (same thing that was happening on GravityTales).

I'm not too worried about this just now. Not all stories on readlightnovel are formatted the same. (For example, AccelWorld is encrypted/scrambled to make it difficult to parse.)

Have added a "default parser" to the ExperimentalTabMode branch. It is used when the plug-in can't identify which parser to use. The parser is very crude. It looks for the element you tell it to look for: either <body>, <div> or <article>. You can specify the <div> and <article> tags slightly better. Options are:

First Element of that type.
Element with specified class or id.
Element with class or id that starts with specified string.

It defaults to grabbing the entire <body> element.
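The three options can be pictured as a small matching rule. This is only a sketch: elements are modelled as plain objects with `tagName`, `id` and `className`, the rule shape is invented, and testing each class name separately (rather than the whole class attribute) is an assumption about how "starts with" is meant.

```javascript
// rule: { tagName: "div", mode: "first" | "is" | "startsWith", value: string }
function matchesContentRule(element, rule) {
    if (element.tagName.toLowerCase() !== rule.tagName) return false;
    // Candidate names are each class name plus the id, if any.
    const names = element.className.split(/\s+/).filter(name => name !== "");
    if (element.id) names.push(element.id);
    switch (rule.mode) {
        case "first": return true;
        case "is": return names.includes(rule.value);
        case "startsWith": return names.some(name => name.startsWith(rule.value));
        default: throw new Error(`Unknown mode: ${rule.mode}`);
    }
}

// Returns the first matching element, or null when nothing matches.
function findContentElement(elements, rule) {
    const found = elements.find(element => matchesContentRule(element, rule));
    return found === undefined ? null : found;
}

const elements = [
    { tagName: "DIV", id: "", className: "header" },
    { tagName: "DIV", id: "", className: "post entry-content" }
];
console.log(findContentElement(elements, { tagName: "div", mode: "startsWith", value: "entry" }));
```

Returning `null` for a missing element is one way to make the "content element not present" case explicit for the caller.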

Please take it for a spin. Caution: I'm not sure how good the error handling is if you tell it to look for a content element that is not present.

It works for unlimitednovelfailures.mangamatters.com if you set the element to look for to <div> "class starts with entry" Note, this site has entire stories as single page. The default parser will not split a page into chapters.

dteviot commented 8 years ago

@typhoon71

I'm not too worried about this just now. Not all stories on readlightnovel are formatted the same. (For example, AccelWorld is encrypted/scrambled to make it difficult to parse.)

Well, this is embarrassing. Looks like I never added the chapter titles because I didn't think they added anything. They were just "EPUB Name" - Volume X - Chapter Y. However, on the basis you believe this is useful, title is now added to each chapter.

toshiya44 commented 8 years ago

Issues in nanodesu parser: Sometimes the full size image is not downloaded, for example https://bibliathetranslation.wordpress.com/volume-2/chapter-1/ In the epub I saw,

    <img src="../Images/0004_00004.jpg" alt=""/>
    <!--  https://bibliathetranslation.files.wordpress.com/2016/01/00004.jpg?w=774  -->

Same for the image at https://bibliathetranslation.wordpress.com/volume-2/ However the images at https://bibliathetranslation.wordpress.com/volume-2/color-art/ were downloaded at full size.

Also, the TN notes are broken, but I guess there's no easy solution for that...

hellping.org parser seems to be working fine.

typhoon71 commented 8 years ago

However, on the basis you believe this is useful, title is now added to each chapter.

Thanks, it works. I do find them helpful, even if I have to edit them a bit (sometimes, np).

unlimitednovelfailures.mangamatters.com This site loads a list of links, but doesn't work and doesn't pack, complaining about a html when it expected an image (or more like I don't get how to set the stuff you talked about). The site "works" like this: one link is a full volume (complete or ongoing), so for example "http://unlimitednovelfailures.mangamatters.com/risou-no-himo-seikatsu/risou-no-himo-seikatsu-volume-03/" contains the whole volume 3 with pics. The posts just point to the correct position inside the html. Also, in that html there's a TOC in the form of links at the start. I don't think it's standard?

Will have to test more the general parser later on.

dteviot commented 8 years ago

@toshiya44

Sometimes the full size image is not downloaded, for example https://bibliathetranslation.wordpress.com/volume-2/chapter-1/

That's because of the information in the file. For the above file, the only image information is this:

<img class="alignnone size-full wp-image-481" src="https://bibliathetranslation.files.wordpress.com/2016/01/00004.jpg?w=774" alt="00004">

For the full size images, the image information is

<a href="https://bibliathetranslation.files.wordpress.com/2016/01/00001.jpg" rel="attachment wp-att-479">
    <img class="alignnone size-full wp-image-479" src="https://bibliathetranslation.files.wordpress.com/2016/01/00001.jpg?w=774&amp;h=1106" alt="00001" width="774" height="1106">
</a>

Note that in this case the <img> is enclosed in a <a>. When the parser sees this sort of thing, it will try to fetch the <a> as the full size image. What I should probably do is remove the "w" and "h" query values from the url. This is on my todo list. Refer https://github.com/dteviot/WebToEpub/issues/74
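Removing the "w" and "h" query values could look roughly like this. It is a sketch using the standard `URL` API; the function name is made up, and whether the site always serves the full-size image once those values are gone is an assumption.

```javascript
// Strip the WordPress-style size hints ("w" and "h") from an image URL,
// leaving any other query values in place.
function stripSizeParams(src) {
    const url = new URL(src);
    url.searchParams.delete("w");
    url.searchParams.delete("h");
    return url.toString();
}

console.log(stripSizeParams(
    "https://bibliathetranslation.files.wordpress.com/2016/01/00004.jpg?w=774"
));
```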

Also, the TN notes are broken, but I guess there's no easy solution for that...

Can you tell me what you mean in more detail please? Looking at https://bibliathetranslation.wordpress.com/volume-1/chapter-3-vol-1/ there are no hyperlinks between the footnotes and the text they refer to.

@typhoon71

complaining about a html when it expected an image

OK, that's related to the issue I described above to toshiya, when an image is wrapped in a hyperlink, the parser assumes the link is to the full size image. In this case, it's not a link to the full size image, but to the main web site. Note that when this happens, it will not (or it should not) prevent the page being packed into an EPUB.
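A possible guard against this case is to check whether the wrapping link even looks like an image before treating it as the full-size version. This is a heuristic sketch with invented names and an assumed list of extensions; it is not the extension's current behaviour, and it would miss images served from URLs without a file extension.

```javascript
// Assumed list of extensions; not exhaustive.
const IMAGE_EXTENSIONS = [".jpg", ".jpeg", ".png", ".gif", ".webp", ".svg"];

// True when the link's path ends in a known image extension.
function linkLooksLikeImage(href) {
    let pathname;
    try {
        pathname = new URL(href).pathname.toLowerCase();
    } catch (e) {
        return false; // not an absolute URL, so not usable as an image source here
    }
    return IMAGE_EXTENSIONS.some(extension => pathname.endsWith(extension));
}

// Prefer the wrapping link only when it looks like an image; otherwise
// fall back to the <img> src.
function pickImageUrl(linkHref, imgSrc) {
    return linkLooksLikeImage(linkHref) ? linkHref : imgSrc;
}

console.log(pickImageUrl(
    "http://unlimitednovelfailures.mangamatters.com/",
    "http://unlimitednovelfailures.mangamatters.com/wp-content/uploads/2013/06/011.gif"
));
```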

I don't get how to set the stuff you talked about

On the popup dialog, 6th line from top, just above the “pack epub” button and below the “Filename” control. There's a line starting with “Element with Chapter Content”, then a drop down with <body>, <article>, <div>, then a drop down with “First Found”, etc. then a text box.

You use these to specify the element that holds the content for a page. For each page in the links, the parser will fetch the page, then extract the element you specify with the above settings. As you didn't change the settings, it will have grabbed the <body> element. i.e. pretty much the entire page. (Side note, obviously these settings are not sufficiently obvious. I guess I'll need to have them on a separate dialog as part of the open sequence.)

This is probably why you got the warning about the image. There is an image that links to the main site on the page, but it's outside the story content element. i.e. this

<a href="http://unlimitednovelfailures.mangamatters.com/">
    <img src="http://unlimitednovelfailures.mangamatters.com/wp-content/uploads/2013/06/011.gif" alt="Unlimited Novel Failures" id="logo"/>
</a>

If you'd extracted the story content element, you would have skipped that link.

but doesn't work and doesn't pack,

Odd, it works for me. My steps.

In chrome browse to http://unlimitednovelfailures.mangamatters.com/risou-no-himo-seikatsu/risou-no-himo-seikatsu-volume-03/
Click on WebToEpub to open the tab.
Click OK on the warning.
Set first drop down to <div>
Set second drop down to “Class starts with”
Set text to “entry” (ignore the quotes)
Set “Cover Image URL:” to http://unlimitednovelfailures.mangamatters.com/wp-content/uploads/2015/05/img004b.jpg
Unselect all links except for http://unlimitednovelfailures.mangamatters.com/risou-no-himo-seikatsu/risou-no-himo-seikatsu-volume-03/
Click “Pack epub”

The site "works" like this: one link is a full volume (complete or ongoing), so for example "http://unlimitednovelfailures.mangamatters.com/risou-no-himo-seikatsu/risou-no-himo-seikatsu-volume-03/" contains the whole volume 3 with pics.

Yes, I realize that.

The posts just point to the correct position inside the html. Also, in that html there's a TOC in the form of links at the start. I don't think it's standard?

OK, I think I see at least part of the confusion. The default parser is very simple, it just grabs each link it's given, extracts the specified content element and writes it as a “chapter” to the EPUB. So

You need to prune the set of links you give so that there's only 1 per HTML page you want to fetch. i.e. you need to ignore the links that are to the actual chapters on the page
The parser will fetch the entire volume as a single chapter. You will need to split it (and remove anything you don't want.) However, it will fetch the images and adjust the links.

typhoon71 commented 8 years ago

OK, I've seen the light! I had to clean the cache to have the dropdown show itself: now I do get it... and I can pack the epub(s), yay! It seems to be working fine here too. BTW, could you add a checkbox to disable the cover image custom link usage? Easier than deleting. ;)

toshiya44 commented 8 years ago

On this page https://bibliathetranslation.wordpress.com/volume-2/prologue-vol-2/ , there's a reference link, but it's formatted weirdly. And it appears like this in the epub:

[screenshot: ref]

dteviot commented 8 years ago

@toshiya44

Sometimes the full size image is not downloaded, for example https://bibliathetranslation.wordpress.com/volume-2/chapter-1/

Fix has been checked into Experimental Tab branch.

Also, the TN notes are broken, but I guess there's no easy solution for that...

Interesting. Looks like there's a link from the text to the footnote, but no link from the footnote back to the text. I've at least fixed the link to the footnote by changing the link href to be just the fragment of the URL. (Again on the Experimental Tab branch.) e.g.

<a href="#PTL1"><sup>1</sup></a>

This was another known issue. Item 3 on https://github.com/dteviot/WebToEpub/issues/64
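The href rewrite described above might be sketched as follows. The function name is hypothetical, and the rule "only rewrite links that point back into the same page" is an assumption about the fix rather than a description of the checked-in code.

```javascript
// If href points at an anchor inside pageUrl itself, return only the
// fragment (e.g. "#PTL1"); otherwise return href unchanged.
function toFragmentOnlyHref(href, pageUrl) {
    const target = new URL(href, pageUrl);
    const page = new URL(pageUrl);
    const samePage = target.origin === page.origin
        && target.pathname === page.pathname
        && target.search === page.search;
    return (samePage && target.hash !== "") ? target.hash : href;
}

const page = "https://bibliathetranslation.wordpress.com/volume-2/prologue-vol-2/";
console.log(toFragmentOnlyHref(page + "#PTL1", page));
```

Inside an EPUB the original absolute URL no longer refers to the packed chapter file, which is why a fragment-only href keeps the footnote jump working.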

IcePhantom22 commented 8 years ago

Would it be possible to add http://www.sousetsuka.com/ and https://yoraikun.wordpress.com

dteviot commented 8 years ago

@IcePhantom22

Would it be possible to add http://www.sousetsuka.com/ and https://yoraikun.wordpress.com

Yoraikun can already be done using the Default parser. See detailed notes below.

sousetsuka can't be done with the Default parser in the plug-in version that's currently in the Chrome store, but can be done with the development branch https://github.com/dteviot/WebToEpub/tree/ExperimentalTabMode.
If you don't want to wait for the Chrome store version to be updated, you'll need to download the development branch and install from source. Instructions on how to do this can be found at the above link.

More detailed notes.

https://yoraikun.wordpress.com

This site has multiple stories. The table of contents for each story is easy to locate. Sampling the first and last chapter: the content of each chapter is in a <div> element, with a className of "entry-content"

Steps

Browse to web page with list of chapters. e.g. https://yoraikun.wordpress.com/tlk-chapters/
Click on WebToEpub
You will get a warning message that Default parser will be used. Click OK and WebToEpub will open.
In the list of chapters to fetch, uncheck the chapters you don't want. (The default parser will show every hyperlink it finds on the page.)
Above the "Pack EPUB" button, there's a drop down that shows "<body>"; change this to <div>. In the drop down to the right, change "First Found" to "Class is", then in the edit control on that line add "entry-content". (Don't include the quotes.)
Correct the "Cover Image URL" value.
Click "Pack EPUB" button

That said, yoraikun.wordpress.com seems to be using the WordPress format. So, you could add a mapping for it to WordpressBaseParser.js

parserFactory.register("yoraikun.wordpress.com", function() { return new WordpressBaseParser() });

http://www.sousetsuka.com/

This site seems to have a single story "Death March kara Hajimaru Isekai Kyousoukyoku". The table of contents appears to be at http://www.sousetsuka.com/p/blog-page_11.html Sampling the first and last chapter: the content of each chapter is in a <div> element, with a className starting with "post-body"

Therefore, the steps to use the Default parser from the experimental branch are:

Browse to web page with list of chapters. e.g. http://www.sousetsuka.com/p/blog-page_11.html
Click on WebToEpub
You will get a warning message that Default parser will be used. Click OK and WebToEpub will open.
In the list of chapters to fetch, uncheck the chapters you don't want. (The default parser will show every hyperlink it finds on the page. Due to the order of the links, you will need to click the "Edit Chapter URLs" button to edit the list of links you want used.)
Above the "Pack EPUB" button, there's a drop down that shows "<body>"; change this to <div>. In the drop down to the right, change "First Found" to "Class starts with", then in the edit control on that line add "post-body". (Don't include the quotes.)
Click "Pack EPUB" button

That said, sousetsuka seems to be using the Blogspot format. So, you could add a mapping for it to BlogspotParser.js

parserFactory.register("sousetsuka.com", function() { return new BlogspotParser() });
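For readers unfamiliar with the pattern, a host-name-to-parser registry of this kind can be sketched as follows. `SketchParserRegistry`, its `lookup` method and the stripping of a leading "www." are all assumptions for illustration; the real parserFactory may work differently, and the two parser classes here are empty stand-ins.

```javascript
// Empty stand-ins for the real parser classes.
class WordpressBaseParser {}
class BlogspotParser {}

class SketchParserRegistry {
    constructor() {
        this.factories = new Map(); // host name -> function returning a parser
    }

    register(hostName, factory) {
        this.factories.set(hostName.toLowerCase(), factory);
    }

    // Returns a new parser for the URL's host, or null if none is registered.
    lookup(url) {
        const host = new URL(url).hostname.replace(/^www\./, "");
        const factory = this.factories.get(host);
        return factory === undefined ? null : factory();
    }
}

const registry = new SketchParserRegistry();
registry.register("yoraikun.wordpress.com", function() { return new WordpressBaseParser() });
registry.register("sousetsuka.com", function() { return new BlogspotParser() });
console.log(registry.lookup("http://www.sousetsuka.com/p/blog-page_11.html"));
```

A `null` result corresponds to the "No parser found for this URL" case, where the Default parser takes over.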

typhoon71 commented 8 years ago

I am trying to save as epub from "https://translationchicken.com/2016/09/21/rezero-web-novel-fan-translation-table-of-contents/", but it fails on Firefox 49.0.1 and the last available addon build (0.0.0.021). It seems to be working fine on Chrome with the ExperimentalTabMode branch. Is some stuff not merged yet? (btw, I just started using Firefox and webtoepub again)

dteviot commented 8 years ago

@typhoon71

Is some stuff not merged yet?

Chrome store version is currently 0.0.0.23. Experimental branch is 0.0.0.24. So, yes. (See https://github.com/dteviot/WebToEpub/commits/ExperimentalTabMode for history.) That said, 0.0.0.21 on Firefox 48.02 works for me, at least for the first two chapters.
Note,

You do get at least one warning message.
The "Save to File" dialog takes a while to come up after the progress bar finishes, as it takes a while to fetch all the images.

Please describe the problem in more detail.

Exactly what steps did you do.
What happened, compared to what you expected to happen.

Quaturn commented 8 years ago

OK I will put it out there :-) would it be possible to create parser for http://lnmtl.com/ ?

typhoon71 commented 8 years ago

Here:

I go to "https://translationchicken.com/2016/09/21/rezero-web-novel-fan-translation-table-of-contents/"
I click on the webtoepub toolbar button
I get "No parser found for this URL. Default parser will be used. You will need to specify how to obtain content for each chapter." [which is OK]
I hit OK, so the general parser starts grabbing.
I select just 2 links, hit "Pack epub".
I instantly get the "NetworkError when attempting to fetch resource." message.
I click OK.
Back to the main tab of the addon, the "Pack epub" button is greyed out.
I see a really small epub is saved (4kb).
I don't see traffic on the network.

I will wait for Firefox to get a newer version and check again.

dteviot commented 8 years ago

@typhoon71

I will wait for Firefox to get a newer version and check again.

v0.0.0.23 is now available in the Firefox store. Side note, their approval process has gotten a lot faster. (Or the auditor now recognises my plug-in.)

dteviot commented 8 years ago

@typhoon71

I instantly get the "NetworkError when attempting to fetch resource." message.

Strange. Error message comes from Firefox, not liking the URL it has been asked to fetch before even trying to get it. I'll need you to get me more information

Set up WebToEpub to fetch the two chapters again, but before you click "Pack EPUB"

Right click on the WebToEpub window and select "Inspect Element" on the menu that appears. (Don't select "Inspect Element with Firebug")
On stuff that appears at bottom of screen, click on the "Console" menu item
Click on "Pack EPUB" button
Send me contents of the "Console" window.

Also, you might like to select the "Network" tab on the "Network" menu item, and click "Pack EPUB" again. This will show you the network calls that are made. (You can also do this with normal EPUB creation, you'll see progress of every file request, it's pretty awesome.) Unfortunately, this is cleared when the "Download" dialog appears. So it may be helpful to log to a file. https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging. (Note, if nothing appears in network, and it may not if Firefox is rejecting the attempt before it even tries to fetch it, there's no need to get this log.)

dteviot commented 8 years ago

@Quaturn

would it be possible to create parser for http://lnmtl.com/ ?

I'll see if it can be done. From a quick inspection, looks like there's complications.

typhoon71 commented 8 years ago

I tried to get the output of the console, but it's empty. I do get the error but there's nothing there (btw, I don't have firebug).

I may have found something though: if I whitelist the addon window in uBlock Origin [whitelist entry is "46fb073d-e454-4740-8e95-64f33addd34a.moz-extension-scheme"] the epub is created fine; this is strange, because in Chrome I didn't need to whitelist WebToEpub.

Also, when packing the epub with the addon whitelisted, the console comes to life, but the output disappears as soon as the epub is saved. So I think uBlock Origin is blocking the network requests in Firefox; but it doesn't do this in Chrome (even if I have uBlock Origin websocket installed there).

So, should I file a bug on Ublock origin github? What kind of bug should it be?

dteviot commented 8 years ago

@typhoon71
Sorry for delay in responding. Hard Drive on my home PC failed. Have to write this from work. Will probably be 3 to 4 weeks before I get new machine and dev resumes.

So, should I file a bug on Ublock origin github

Please do.

What kind of bug should it be?

I don't know what the alternatives are. I personally would go to https://github.com/gorhill/uBlock/issues, click on the "issue" button and fill it in using the following guidelines. http://www.drmaciver.com/2013/09/how-to-submit-a-decent-bug-report/

(Hint, you can probably copy/paste most of your previous posts.)

typhoon71 commented 8 years ago

OK, I know of ublock origin github, I already used it. If there's nothing specific to add then no problem, I'll do it (almost) now.

Sorry to hear your PC has failed, I know what it means... We'll wait eagerly for your return. XD

dteviot commented 8 years ago

@Quaturn

would it be possible to create parser for http://lnmtl.com/ ?

Basic support has been added to this branch. https://github.com/dteviot/WebToEpub/tree/ExperimentalTabMode You'll need to download and install from source. (Instructions are on the readme at the above URL.) Note, at the current time, it will only find the 50 chapters that appear on the index page. I haven't yet figured out all the REST calls that need to be made to fetch a list of all chapters.

dteviot commented 8 years ago

Hmmm. A whole bunch of parsers. https://github.com/JimmXinu/FanFicFare/tree/master/fanficfare/adapters And all I'll need is a JavaScript interpreter for Python.

Quaturn commented 8 years ago

@dteviot First, thank you very much for LNMTL, it is working great. As for adding more links, I was able to work around it by copying the generated links to Notepad++ and then using the column editor.

There is one thing however: would it be possible to add the chapter title from the web page to the chapter body?

The reason is that when generating an epub with too many xhtml files, ereaders have problems loading the epub, and when merging the files (in my case every fifty chapters) there is no chance of recognizing which chapter I am reading, as many pages do not have the chapter number in the body of the text.

dteviot commented 8 years ago

@Quaturn

would it be possible to add chapter title from web page to the chapter body?

I'll see what I can do.
I haven't finished working on the LNMTL parser.

dteviot commented 7 years ago

@Quaturn Have updated LNMTL on ExperimentalTabMode branch to add chapter titles. It also makes REST calls to retrieve the full set of chapters. (This can take 10 seconds or more.) Finally, it makes a best guess at image (if any) to use for cover. Please let me know how it works for you.

Quaturn commented 7 years ago

@dteviot Maan you've done it again it works beautifully/perfectly. Thank you very very much :-D

belldandu commented 7 years ago

@dteviot

Hmmm. A whole bunch of parsers. https://github.com/JimmXinu/FanFicFare/tree/master/fanficfare/adapters And all I'll need is a JavaScript interpreter for Python.

;) you have one right here

typhoon71 commented 7 years ago

Just a little bump for the "unlimitednovelfailures.mangamatters.com" parser, the last of 22 entries. ;)

In all honesty, I noticed it wasn't done... because I was "cleaning up" my LN backlog, and found myself thinking "why not make epubs of the novels from UNF myself?".

Side note: each novel is on a single html page, with links to sections. I suppose the additional work would be for the illustrations, which are on another page.

They just changed theme btw.

EDIT: why, why is the "close (issue)" button just where I'd expect the "comment" one, why? XD

dteviot / WebToEpub

[Enhancement] Some more parsers #71