Malware alert - file was not opened
@xypha lucky you...
@gamebeaker At this point we can be fairly confident that zawa999 is violating GitHub's Terms of Service - before deleting comments in the future, it's probably worth reporting their content for a chance at a full IP ban. I'd do it, but you seem to find & delete them before I see them xD
As for the mentioned issue, I've actually been thinking of something similar for all Xenforo forums, mostly from the perspective of bulk threadmark download through "Reader Mode"; however, it would work similarly for this case as well. One issue: it runs into a few faults, mostly with WebToEpub's indexing logic - e.g. each chapter link is pre-defined before generation begins, which would be impossible under this paging structure. That can theoretically be worked around, but even if it can, it won't work exactly the same as other sites.
I'll look at a potential solution for this, but if one is possible, it will likely require configuration in [WebToEpub > Advanced Options > Manually Select Parser] to differentiate it from the standard parser, unless someone has a better idea for handling this case.
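If it does end up as a separate parser, the wiring might look roughly like this - untested sketch with placeholder names; I'm assuming the extension's usual Parser base class and a factory method for manual-only registration, which may not be exactly what's there:

```js
// Placeholder names only; the registerManualSelect() call is my assumption
// of how a manual-only parser would be wired in, not a confirmed API.
class XenforoBatchPostParser extends Parser {
    // overrides (getChapterUrls, findContent, findChapterTitle, ...) go here
}

// Registered by name instead of hostname, so it never overrides the
// standard Xenforo parser unless the user picks it explicitly.
parserFactory.registerManualSelect("Xenforo Batch Post",
    () => new XenforoBatchPostParser());
```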
@Kiradien Some notes
I don't understand Scenario 3.
As regards Scenario 2, am I missing something? WebToEpub could be made to detect that there are multiple ToC pages and fetch them. The URL for each page seems to be of the form: https://forum.questionablequesting.com/threads/with-this-ring-young-justice-si-story-only.8961/threadmarks?per_page=25&page=4
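Roughly, something like this could work (untested sketch; the selector for threadmark entries is a guess at the XenForo 2 markup, and real code would go through WebToEpub's HttpClient rather than bare fetch):

```js
// Untested sketch: walk the paged threadmarks index by incrementing &page=
// until a page comes back with no threadmark links. The selector for
// threadmark entries is a guess at the XenForo 2 markup.
async function fetchAllThreadmarkPages(threadUrl, perPage = 25) {
    let chapters = [];
    for (let page = 1; ; ++page) {
        let url = `${threadUrl}/threadmarks?per_page=${perPage}&page=${page}`;
        let html = await (await fetch(url)).text();
        let dom = new DOMParser().parseFromString(html, "text/html");
        let links = [...dom.querySelectorAll(".structItem-title a")];
        if (links.length === 0) {
            break;  // ran out of threadmark pages
        }
        chapters.push(...links.map(a => ({
            sourceUrl: new URL(a.getAttribute("href"), url).href,
            title: a.textContent.trim(),
        })));
    }
    return chapters;
}

// e.g. fetchAllThreadmarkPages("https://forum.questionablequesting.com/threads/with-this-ring-young-justice-si-story-only.8961")
```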
Yeah, this enhancement is entirely edge cases; I understand why you're confused, and it's also why I won't add these fixes to the main parser. A number of things are happening here, but it's mostly just that the author didn't threadmark his chapters. This is not a failing of WebToEpub's current design for Xenforo, but a general workaround that is actually useful in other cases.
The UI is also different for this archive page; normally paging isn't really needed for threadmarks... it's a really odd edge case.
Some notes of my own: I wouldn't normally consider this type of enhancement; it's only because "Reader Mode" allows retrieval of multiple chapters simultaneously that I'm working on it... It can be handy to download these books a bit quicker with less strain on the server side. It's also a fair bit of fun to dig into elements I don't usually touch.
@Kiradien To clarify further on Scenario 3: my intention was to suggest exporting non-chapter posts and comments... sometimes, reading non-threadmark posts (i.e., user comments, speculation/theory crafting and author's responses) is helpful or just plain fun. An option to export all posts in a thread to epub for easy reading would be nice.
@xypha No worries, that is actually what I'm working on. Just taking time since I'm poking around elements I don't usually touch in my free time. It might end up being a bit buggy on chapter titles (Since the title is usually pulled from the 'threadmark'), but the goal should be feasible... Just a bit slower to release than most patches I work on.
My comments about 'Reader Mode' are simply because that's what I will personally use it to export with; there's no intent to make it exclusive to that.
Exporting through FicHub (also on GitHub) solves Scenarios 1 and 2, but it fails to export images (which is a deal breaker). For Scenario 3, the problem persists: non-threadmark posts cannot be exported.
Hi. I made a CLI tool for adding images to FicHub here. You'll have to install Python to use it, though.
Sorry for the delay on this; I was working on it on and off and was a little too intent on a 'perfect' solution. PR uploaded with a working solution - it's not the perfect solution I wanted (all posts on each QQ 'page' are correlated to a single chapter), but it does the job.
I'll push the PR through once the issues are resolved
Trying to make each post a chapter with the current setup of web2epub is a bit too much of a nightmare.
In order to use the new parser, you need to open up advanced options and select the "Xenforo Batch Post Parser" under manual parsers.
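To give a rough idea of what "all posts on each page become one chapter" means in practice, the shape is something like this (untested sketch; the selectors are assumptions about XenForo 2 markup, not code from the actual PR):

```js
// Untested sketch: treat one thread page as one "chapter" by concatenating
// every post's message body into a single container. Selectors assume
// XenForo 2 markup and are NOT taken from the actual PR.
function collectPostsAsChapter(pageDom) {
    let chapter = pageDom.createElement("div");
    for (let post of pageDom.querySelectorAll("article.message")) {
        let body = post.querySelector(".message-body .bbWrapper");
        if (body !== null) {
            chapter.appendChild(body.cloneNode(true));
            chapter.appendChild(pageDom.createElement("hr"));
        }
    }
    return chapter;
}
```

With this shape, the chapter title has nothing threadmark-like to latch onto and would presumably fall back to something generic like the page number, which lines up with the earlier note about titles being a bit buggy.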
Test versions for Firefox and Chrome have been uploaded to https://github.com/dteviot/WebToEpub/releases/tag/developer-build. Pick the one suitable for you, follow the "How to install from Source (for people who are not developers)" instructions at https://github.com/dteviot/WebToEpub/tree/ExperimentalTabMode#user-content-how-to-install-from-source-for-people-who-are-not-developers and let me know how it goes.
@Kiradien This works for With This Ring.
Thank you!
I saw a bunch of errors - mostly about fetching images, but also others.
No complaints though. THANK YOU! This is what I wanted.
Just going to share the errors here in case they might be relevant.
Several other errors appeared once the epub was downloaded -- see the attached text file (too long to post directly in the comment).
403 errors (3 in total, for different domains) that I had to click Skip on to complete the epub download.
Example:
WARNING: Site '1.bp.blogspot.com' has sent an Access Denied (403) error.
You may need to logon to site, or browse site normally
until you get a Cloudflare "Are you a human" page or satisfy some other CAPTCHA
before WebToEpub can continue.
Fetch of image 'http://1.bp.blogspot.com/_M7D1hE_0cz0/S9GqWbJ-0pI/AAAAAAAADLk/AUuEqBBzDCE/s1600/GL4602.jpg' for page 'https://forum.questionablequesting.com/threads/with-this-ring-young-justice-si-story-only.8961/page-53' failed with network error 403. This is an intermittent error. If you retry in a few minutes, it may succeed. promptUserForRetry@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/HttpClient.js:57:19
onResponseError@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/HttpClient.js:48:25
checkResponseAndGetData@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/HttpClient.js:207:45
wrapFetchImpl@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/HttpClient.js:197:31
async*retryFetch@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/HttpClient.js:77:27
async*onResponseError@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/HttpClient.js:40:25
checkResponseAndGetData@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/HttpClient.js:207:45
wrapFetchImpl@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/HttpClient.js:197:31
async*wrapFetch@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/HttpClient.js:157:27
fetchImage@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/ImageCollector.js:335:40
fetchImages@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/ImageCollector.js:108:28
async*fetchImagesUsedInDocument/<@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/Parser.js:545:44
promise callback*fetchImagesUsedInDocument@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/Parser.js:543:14
fetchWebPageContent/<@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/Parser.js:528:31
promise callback*fetchWebPageContent@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/Parser.js:518:59
async*fetchWebPages/<@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/Parser.js:491:69
fetchWebPages@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/Parser.js:491:41
async*fetchContent@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/Parser.js:463:21
fetchContentAndPackEpub@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/main.js:153:16
EventHandlerNonNull*addEventHandlers@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/main.js:464:9
window.onload@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/main.js:584:13
EventHandlerNonNull*main<@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/main.js:579:5
@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/main.js:598:3
@xypha
I had a quick skim through them. All I saw was WebToEpub reporting it was unable to retrieve an image. (So you know it won't be in the epub, and it's not WebToEpub's fault.)
e.g. http://static.comicvine.com seems to be down/gone, the 404 errors speak for themselves, etc.
@xypha
The updated version (1.0.1.0) has been submitted to the Firefox and Chrome stores. The Firefox version is available now. The Chrome version might take anywhere from a few hours (typical) to 21 days.
My thanks again to @Kiradien for his hard work
Thank you!
Problem
Non-threadmark posts cannot be exported.

Steps to replicate:
Scenario 1: WebToEpub from toolbar icon → only 1 chapter is loaded. None of the threadmarks on the page are seen.
Scenario 2: WebToEpub from toolbar icon → only 25 chapters are loaded.
Scenario 3: WebToEpub from toolbar icon → all 148 chapters are loaded, but non-threadmark posts cannot be exported.

Describe the solution you'd like
Possible solution to Scenario 1 and 2: In the WebToEpub popup tab, add warning text (maybe above the Chapters Count) telling users to ensure all threadmarks are loaded before export.
Possible solution to Scenario 3:

Describe alternatives you've considered
Exporting through FicHub (also on GitHub) solves Scenarios 1 and 2, but it fails to export images (which is a deal breaker). For Scenario 3, the problem persists: non-threadmark posts cannot be exported.

Additional context
Current version: 0.0.0.167
Browser: Firefox 129.0.2 (64-bit)
OS: Windows 11 23H2
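For the warning-text idea under "Possible solution to Scenario 1 and 2", even something this small might do (sketch only; the element id is hypothetical and the real popup markup may differ):

```js
// Sketch only: inject a one-line warning above the chapter list in the popup.
// "chapterUrlsTable" is a hypothetical element id; the real popup markup may differ.
function showThreadmarkWarning(doc) {
    let warning = doc.createElement("p");
    warning.textContent = "Note: make sure all threadmark pages have loaded " +
        "before packing the EPUB, or chapters may be missing.";
    warning.style.color = "darkorange";
    let anchor = doc.getElementById("chapterUrlsTable");
    if (anchor !== null) {
        anchor.parentNode.insertBefore(warning, anchor);
    }
}
```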