JimmXinu / FanFicFare

FanFicFare is a tool for making eBooks from stories on fanfiction and other web sites.

Problems with proper parsing/rendering of I Shall Seal the Heavens on wuxiaworld. #211

Closed GravityHug closed 6 years ago

GravityHug commented 6 years ago

(re-posting from the forum)

Just wanted to report two possible bugs of Fanficfare, both occurring during story-download from wuxiaworld.com (wuxiaworld.com/issth-index/, in this case).

  • The compiled document for the fully downloaded story “I Shall Seal the Heavens” seems to be formatted incorrectly: when you open the file in a browser (Firefox, Chrome), only the table of contents is displayed, and the chapters themselves are inaccessible. One way to access the chapters is to open the file in MS Word instead of a browser, but that’s more of a workaround.

  • Footnotes and hyperlinks inside story chapters seem to render incorrectly:

    • The footnote indicators inside paragraphs (¹, ², ³, etc.) are replaced by line breaks, while the footnotes themselves are just placed at the end of the chapter.
    • Hyperlinks and the hyperlinked text are removed entirely.

These hold true for the latest test version (<2017-07-30).

JimmXinu commented 6 years ago

Going to wuxiaworld.com/issth-index/ (in browser) I don't see any chapters at all. It looks to me like that story on the site is broken. So it's not a surprise that it doesn't download properly. I'm not sure what you mean by opening in MS Word, etc. I cannot download chapters for that story at all because they do not appear.

Can you provide a link to a different story that shows footnote/link issues?

I'm not sure about footnotes, but it looks like @gcomyn used a brute-force technique to remove prev/next chapter links, namely, removing all links.
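
A narrower alternative to removing every link can be sketched as follows. This is a minimal illustration, not FanFicFare's adapter code: the function name `strip_nav_links` and the assumption that navigation links read "Previous Chapter" or "Next Chapter" are hypothetical, and a real adapter would more likely use an HTML parser than regular expressions.

```python
import re

# Assumption: navigation anchors are identified by their visible text.
# The exact wording on the site is not confirmed by this thread.
NAV_TEXT = re.compile(r"^\s*(previous|next)\s+chapter\s*$", re.IGNORECASE)
ANCHOR = re.compile(r"<a\b[^>]*>(.*?)</a>", re.IGNORECASE | re.DOTALL)
TAGS = re.compile(r"<[^>]+>")


def strip_nav_links(html: str) -> str:
    """Drop anchors whose visible text is a prev/next label; keep all others."""
    def replace(match):
        # Strip nested tags so only the visible link text is compared.
        visible = TAGS.sub("", match.group(1))
        return "" if NAV_TEXT.match(visible) else match.group(0)

    return ANCHOR.sub(replace, html)
```

With this approach, an in-story link such as "click here for soundtrack" is left in place, while the chapter navigation links are dropped.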

JimmXinu commented 6 years ago

Today I am able to see the chapters for that story in the browser. I am able to download it (1600+ chapters?!) and it looks fine to me.

Regarding your second point, I do see "click here for soundtrack" links being removed--I'll change the code mentioned earlier. But I'll need to know a couple chapters with footnotes to see how they behave.

GravityHug commented 6 years ago

Sorry for the somewhat delayed reply!

> it looks fine to me

Strange, in my version of the composed HTML file only the chapter ToC is available. The chapters themselves are also in the HTML file, but something must be wrong with the HTML structure, because they are not rendered in browsers. Here’s an example of such a faulty rendering for the first 10 chapters downloaded. Clicking the chapter titles also does nothing.

> I'm not sure what you mean by opening in MS Word

Just choosing in MS Word: File→Open→[...\Calibre...\I Shall Seal the Heavens (_) (Ch 1-10)-Deathblade-....html]. I reckon that since Word is not a full-scale HTML viewer, whatever is wrong with the HTML code that prevents the chapter contents from being seen in browsers isn’t a problem in Word.

> Can you provide a link to different story that shows footnote/link issues?

> But I'll need to know a couple chapters with footnotes to see how they behave.

Chapters 1, 5, and 10 have such footnotes and links, for example. By the way, is there a way to make Calibre convert these footnotes into more proper HTML footnote formatting (e.g. id="footnote", backlinks, etc.), so that it would be easier to bulk-convert them later into DOCX, EPUB, or FB2 footnotes?
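
As a rough illustration of the footnote formatting asked about here (anchor ids plus backlinks), the sketch below builds such markup from a paragraph and a list of notes. Everything in it is an assumption made for the example: the `{1}`-style markers, the `fn`/`fnref` id scheme, and the function name `render_footnotes` are hypothetical and are not features of Calibre or FanFicFare.

```python
import html


def render_footnotes(paragraph: str, notes: list[str]) -> str:
    """Turn {1}, {2}, ... markers into linked references with backlinks."""
    body = html.escape(paragraph)
    items = []
    for number, note in enumerate(notes, start=1):
        # In-text reference that links down to the note.
        ref = f'<sup><a id="fnref{number}" href="#fn{number}">{number}</a></sup>'
        body = body.replace("{%d}" % number, ref)
        # The note itself, with a backlink up to the reference.
        items.append(
            f'<li id="fn{number}">{html.escape(note)} '
            f'<a href="#fnref{number}">back</a></li>'
        )
    return (
        f'<p>{body}</p>\n<ol class="footnotes">\n'
        + "\n".join(items)
        + "\n</ol>"
    )
```

Markup of this shape gives each note a stable id and a backlink, which is the structure conversion tools generally need in order to recognize footnotes.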

> 1600+ chapters?!

Haha, yes. Many Asian webnovels are like that, spanning 1,000–3,000 chapters of more “zoomed-in” storytelling. When I was testing the bugs before posting about them, all of them were noticeable even when only the first 10 chapters were downloaded (http://www.wuxiaworld.com/issth-index/[-10]), so downloading the whole story each time should not be needed.

JimmXinu commented 6 years ago

You are apparently downloading to HTML format. FYI, I assume everyone downloads to EPUB unless explicitly stated otherwise. As a general rule, I recommend that all users download to EPUB (for updates) and convert to other formats.

I'm not seeing any problems with the downloaded HTML in calibre's built-in viewer, nor with a downloaded EPUB converted to HTML, nor with a direct HTML download in the CLI.

But I can confirm that when downloaded to HTML directly in Calibre, Firefox doesn't correctly render the HTML. It appears to be an issue with the iframe-wrapped videos included in the story source. Unless you have other examples, I assume this is specific to this story, not a general problem.
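
One possible mechanism, which this thread does not confirm, is a self-closing `<iframe ... />` tag: HTML parsers ignore the trailing slash on an iframe and treat everything up to the next `</iframe>` as the frame's content, which would hide the text that follows. Under that assumption, a workaround on an already-downloaded file could look like the sketch below; the function name `expand_self_closed_iframes` is hypothetical.

```python
import re

# Matches <iframe ... /> but not a normally closed <iframe ...></iframe>.
SELF_CLOSED_IFRAME = re.compile(r"<iframe\b([^>]*?)\s*/>", re.IGNORECASE)


def expand_self_closed_iframes(html_text: str) -> str:
    """Rewrite self-closing iframes with an explicit closing tag."""
    return SELF_CLOSED_IFRAME.sub(r"<iframe\1></iframe>", html_text)
```

If the downloaded file does not contain self-closing iframes, this function leaves it unchanged and the cause lies elsewhere.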

So my suggestion is the same as usual: download to epub so you can update with new chapters, convert to other formats as needed.

Links and footnotes in chapters are now working as far as I can see. I don't have any interest in making them more complex.