dteviot / WebToEpub

A simple Chrome (and Firefox) Extension that converts Web Novels (and other web pages) into an EPUB.

Kakaoparser doesn't work in PG 15 novels. #648

Closed liony0 closed 2 years ago

liony0 commented 2 years ago

Describe the bug

I tried to extract an EPUB file from 'https://pagestage.kakao.com/' using the version that added KakaoParser, but the extension throws an error on the PG 15 novels on that site.

To Reproduce

  1. Go to 'https://pagestage.kakao.com/novels/95593310'

  2. In order to read PG 15 novels, you must log in to the site's Adult Verified ID. So I made a temporary ID for you.

  3. Launch WebToEpub.

  4. Error.

Error: Fetch of URL 'https://api-pagestage.kakao.com/novels/16625636' failed with network error 403.
    at FetchErrorHandler.onResponseError (chrome-extension://knkobmaemfjhaigiabijmncandjdblli/js/HttpClient.js:32:25)
    at Function.checkResponseAndGetData (chrome-extension://knkobmaemfjhaigiabijmncandjdblli/js/HttpClient.js:166:45)
    at chrome-extension://knkobmaemfjhaigiabijmncandjdblli/js/HttpClient.js:160:31
    at async KakaoParser.getChapterUrls (chrome-extension://knkobmaemfjhaigiabijmncandjdblli/js/parsers/KakaoParser.js:82:21)

The same error occurs at 'https://pagestage.kakao.com/novels/16625636', so I am guessing that the error appears in all PG 15 novels.

dteviot commented 2 years ago

@eeeonoo Sorry, I can't seem to access the site (screenshot attached).

liony0 commented 2 years ago

@dteviot I think that login is possible only in South Korea. Can you try after setting the IP to South Korea using vpn?

dteviot commented 2 years ago

@eeeonoo Sorry, I don't have a VPN. What we could try is capturing the network activity when fetching chapters from a non-PG-15 story and from a PG-15 story, to see if there's any obvious difference.

Basic steps.

  1. Go to the Table of Contents for a non-PG-15 story

  2. Open WebToEpub

  3. Open Developer tools

  4. Go back to WebToEpub, select 2 chapters and pack to an epub file.

  5. Go back to Developer tools, go to the network tab, and then save all as HAR file with content.

  6. Then go to a PG-15 story and repeat the steps above

  7. Zip up HAR files and put them somewhere I can get them. More details: https://www.inflectra.com/support/knowledgebase/kb254.aspx
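
Once both HAR files exist, the comparison suggested above can be partly automated. The following is a minimal sketch, not part of WebToEpub: it assumes the standard HAR 1.2 layout (`log.entries[].request.headers`, `log.entries[].response.status`), and the function names `requestHeaderNames`, `headersOnlyInFirst` and `failedRequests` are hypothetical helpers invented for this illustration.

```javascript
// Collect the lowercase request-header names sent to one host in a HAR log.
// Assumes HAR 1.2: har.log.entries[i].request.{url, headers: [{name, value}]}
function requestHeaderNames(har, host) {
    const names = new Set();
    for (const entry of har.log.entries) {
        if (new URL(entry.request.url).host !== host) {
            continue; // ignore traffic to other hosts
        }
        for (const header of entry.request.headers) {
            names.add(header.name.toLowerCase());
        }
    }
    return names;
}

// Header names that appear in the first capture but not in the second.
function headersOnlyInFirst(firstHar, secondHar, host) {
    const first = requestHeaderNames(firstHar, host);
    const second = requestHeaderNames(secondHar, host);
    return [...first].filter(name => !second.has(name)).sort();
}

// URL and status of every response from the host with status >= 400.
function failedRequests(har, host) {
    return har.log.entries
        .filter(e => new URL(e.request.url).host === host && e.response.status >= 400)
        .map(e => ({ url: e.request.url, status: e.response.status }));
}
```

Each HAR file is plain JSON, so it can be loaded with `JSON.parse` before being passed to these helpers, using `"api-pagestage.kakao.com"` as the host. A header that shows up in one capture only is a candidate for the "obvious difference".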

Note: I will be unavailable for the next few days. So, don't expect an immediate response.

liony0 commented 2 years ago

@dteviot

https://we.tl/t-O6TXv4jQiz

I obtained 2 HAR files by following the process above. However, for the PG-15 story, the chapter selection page is not accessible. Pressing the WebToEpub icon immediately leads to the error page shown in the picture below.

(Screenshot: screen capture 2021-11-30 184123)

Synteresis commented 2 years ago

@eeeonoo

I have been busy for the past few days, and will probably stay busy until the end of the month.

I can try to work on it, but when I tested the login, it said the account was not verified.

Also, it is probably better not to post the login publicly if it is connected to your Korean ID. Maybe email it or create a private GitHub gist.


@dteviot

I can access the site and login after the identity verification. I will try to work on it in the coming days.

A wild guess, since I don't particularly remember, the fetchJson might not include the cookies. I'm not sure.

Never mind. Line 152 in HttpClient includes credentials.

I did have to click a few additional buttons when on the webpage, but if it calls the api all the same, it should be a quick fix.

I'll look through previous commits when I work on it, but how do you name commits that fix an existing parser?


Notes

ageLimit query in browser set to 15. Soft or Hard?

Can I include it in the api call to bypass adult check entirely? Does this count as bypassing paywall...?

Was not needed. Passed fetch with additional authorization token.

Login requires reCaptcha check.
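
The distinction in the notes above, cookies being sent versus an authorization token being sent, can be illustrated as follows. This is only a sketch under assumptions: `withAuthorization` is a hypothetical helper, not a function in WebToEpub's HttpClient, and the thread does not say whether the API expects the raw token or a prefixed value, so the token is passed through unchanged.

```javascript
// Return a copy of a fetch() init object with an Authorization header added.
// `credentials: "include"` only controls cookies; it never adds this header.
function withAuthorization(init, accessToken) {
    if (!accessToken) {
        return init; // no token available: leave the request unchanged
    }
    return {
        ...init,
        headers: { ...(init.headers || {}), "Authorization": accessToken },
    };
}

// A cookie-only request, similar in spirit to one that includes credentials.
const cookieOnly = {
    credentials: "include",
    headers: { "Accept": "application/vnd.stage.v1+json" },
};

// "example-token" is a placeholder, not a real token.
const authorized = withAuthorization(cookieOnly, "example-token");
console.log(Object.keys(authorized.headers));
```
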

liony0 commented 2 years ago

@Synteresis I changed my account password and lifted the login country restriction. If you need this account, please let me know your email.

dteviot commented 2 years ago

@Synteresis

> how do you name commits that fix an existing parser?

Usually in terms of the site. e.g. Add support for PG-15 chapters from Kakao.com

Note, I haven't always followed my own rules. But I try to learn and improve. In early days, I referred to parsers. But I think site is better.

e.g.

Add KakaoParser

vs

Add site https://Kakao.com

Which makes even more sense if site was added by modifying a different parser (e.g. one of the Generic parsers.)

Synteresis commented 2 years ago

@eeeonoo

You can reach me at ssngithub@hotmail.com

liony0 commented 2 years ago

@Synteresis I sent you e-mail.

Synteresis commented 2 years ago

@eeeonoo

Thanks. I think I've figured out how to fix it.

Work During Development

-----

Known Bugs on Current Solution

Starting from a page other than the first page of the table of contents breaks the loop over json.content, because the page query is appended to the URL.

-----

Notes on How to Fix

When getting chapter URLs, if the first fetch fails, send a POST request to login for an accessToken and append it to the DOM. If the accessToken doesn't exist, do not append anything to the DOM.

When fetching chapters, check whether this element exists. If it is null, don't include authorization; otherwise include the accessToken as authorization.

-----

Note to Self

Include the authorization token in the request headers.

There is an XHR request to https://api-pagestage.kakao.com/users/login which returns an accessToken in a dictionary ({"accessToken":"..."}). This is equivalent to the authorization token needed in the request header.

```
await fetch("https://api-pagestage.kakao.com/users/login", {
    "credentials": "include",
    "headers": {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:94.0) Gecko/20100101 Firefox/94.0",
        "Accept": "application/vnd.stage.v1+json",
        "Accept-Language": "en-US,en;q=0.5",
        "Sec-Fetch-Dest": "empty",
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Site": "same-site"
    },
    "referrer": "https://pagestage.kakao.com/",
    "method": "POST",
    "mode": "cors"
});
```

Message when not logged in.

```
Response { type: "cors", url: "https://api-pagestage.kakao.com/users/login", redirected: false, status: 400, ok: false, statusText: "Bad Request", headers: Headers, body: ReadableStream, bodyUsed: false }
  body: ReadableStream { locked: false }
  bodyUsed: false
  headers: Headers { }
  ok: false
  redirected: false
  status: 400
  statusText: "Bad Request"
  type: "cors"
  url: "https://api-pagestage.kakao.com/users/login"
  <prototype>: ResponsePrototype { clone: clone(), arrayBuffer: arrayBuffer(), blob: blob(), … }
```

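
The fix notes above can be sketched roughly as follows. This is not the code that was merged into KakaoParser; it is a simplified illustration under assumptions. The names `requestAccessToken` and `fetchJsonWithLoginFallback` are hypothetical, the token is kept in a local variable rather than appended to the DOM, and the token is sent as the raw `Authorization` value because the thread does not specify any prefix. The fetch function is passed in as a parameter so the flow can be exercised without network access.

```javascript
const API_LOGIN_URL = "https://api-pagestage.kakao.com/users/login";
const ACCEPT = "application/vnd.stage.v1+json";

// Ask the login endpoint for an access token. Returns null when the user is
// not logged in (the thread shows a 400 "Bad Request" response in that case).
async function requestAccessToken(fetchFn) {
    const response = await fetchFn(API_LOGIN_URL, {
        method: "POST",
        credentials: "include",
        headers: { "Accept": ACCEPT },
    });
    if (!response.ok) {
        return null;
    }
    const json = await response.json();
    return json.accessToken || null;
}

// Fetch JSON from the API. If the plain request is rejected (e.g. 403 for a
// PG-15 novel), obtain a token and retry once with an Authorization header.
async function fetchJsonWithLoginFallback(url, fetchFn = globalThis.fetch) {
    const init = { credentials: "include", headers: { "Accept": ACCEPT } };
    let response = await fetchFn(url, init);
    if (!response.ok) {
        const token = await requestAccessToken(fetchFn);
        if (token === null) {
            throw new Error(`Fetch of URL '${url}' failed with status ${response.status}`);
        }
        init.headers["Authorization"] = token;
        response = await fetchFn(url, init);
        if (!response.ok) {
            throw new Error(`Fetch of URL '${url}' failed with status ${response.status}`);
        }
    }
    return response.json();
}
```

Trying the unauthenticated request first keeps the behaviour for non-PG-15 novels unchanged; the login call only happens after a failure.
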
Synteresis commented 2 years ago

@eeeonoo

I believe it is in working order, although I do not recommend using my fork. It is a couple of commits behind because I messed up my git.

liony0 commented 2 years ago

@Synteresis It works successfully. Thank you so much!

dteviot commented 2 years ago

@eeeonoo @Synteresis Have merged Synteresis's code (thank you very much) and built new test version.

Test versions for Firefox and Chrome have been uploaded to https://drive.google.com/drive/folders/1B_X2WcsaI_eg9yA-5bHJb8VeTZGKExl8?usp=sharing. Pick the one suitable for you, follow the "How to install from Source (for people who are not developers)" instructions at https://github.com/dteviot/WebToEpub/tree/ExperimentalTabMode#user-content-how-to-install-from-source-for-people-who-are-not-developers and let me know how it goes.

liony0 commented 2 years ago

@dteviot It also works successfully. Thanks.

dteviot commented 2 years ago

@eeeonoo Reopening, so I know to notify you when version in Chrome and Firefox stores is updated.

dteviot commented 2 years ago

@eeeonoo, @Synteresis

Updated version (0.0.0.145) has been submitted to Firefox and Chrome stores. Firefox version is available now. Chrome might be available in 1 to 3 weeks.