dipu-bd / lightnovel-crawler

Generate and download e-books from online sources.
https://pypi.org/project/lightnovel-crawler/
GNU General Public License v3.0

Fix this source - https://www.wuxiaworld.com/novel/overgeared #1816

Closed avggeek closed 6 months ago

avggeek commented 1 year ago

Let us know

Novel URL: https://www.wuxiaworld.com/novel/overgeared
App Location: PIP
App Version: 3.2.0

Describe this issue

As per #1708, Wuxiaworld is now working, but lncrawl still hangs on this novel at about the 8% mark until I kill it manually.

I'm using a Wuxiaworld login that has "Golden Karma" and it seems like the format of the Bearer token is different compared to what is shown in #1360

$ lncrawl --login Bearer hunter2 -s https://www.wuxiaworld.com/novel/overgeared --format epub --filename "Overgeared - rainbowturtle" --filename-only --output . --single
================================================================================
╭╮╱╱╱╱╱╱╭╮╱╭╮╱╱╱╱╱╱╱╱╱╱╱╱╭╮╱╭━━━╮╱╱╱╱╱╱╱╱╱╭╮
┃┃╱╱╱╱╱╱┃┃╭╯╰╮╱╱╱╱╱╱╱╱╱╱╱┃┃╱┃╭━╮┃╱╱╱╱╱╱╱╱╱┃┃
┃┃╱╱╭┳━━┫╰┻╮╭╋━╮╭━━┳╮╭┳━━┫┃╱┃┃╱╰╋━┳━━┳╮╭╮╭┫┃╭━━┳━╮
┃┃╱╭╋┫╭╮┃╭╮┃┃┃╭╮┫╭╮┃╰╯┃┃━┫┃╱┃┃╱╭┫╭┫╭╮┃╰╯╰╯┃┃┃┃━┫╭╯
┃╰━╯┃┃╰╯┃┃┃┃╰┫┃┃┃╰╯┣╮╭┫┃━┫╰╮┃╰━╯┃┃┃╭╮┣╮╭╮╭┫╰┫┃━┫┃
╰━━━┻┻━╮┣╯╰┻━┻╯╰┻━━╯╰╯╰━━┻━╯╰━━━┻╯╰╯╰╯╰╯╰╯╰━┻━━┻╯
╱╱╱╱╱╭━╯┃ v3.2.0
╱╱╱╱╱╰━━╯ 🔗 https://github.com/dipu-bd/lightnovel-crawler
--------------------------------------------------------------------------------
Module load failed: /home/avggeek/.lncrawl/sources/en/l/lightnovelworld.com.py | No module named 'lncrawl.templates.novelpub'
Module load failed: /home/avggeek/.lncrawl/sources/en/l/lightnovelpub.py | No module named 'lncrawl.templates.novelpub'
Module load failed: /home/avggeek/.lncrawl/sources/en/n/novelpub.py | No module named 'lncrawl.templates.novelpub'
Module load failed: /home/avggeek/.lncrawl/sources/en/w/webnovelpub.py | No module named 'lncrawl.templates.novelpub'

➡ Press  Ctrl + C  to exit

Retrieving novel info...
https://www.wuxiaworld.com/novel/overgeared
Volumes: 100%|| 35/35 [00:26<00:00,  1.33vol/s]
TITLE: Overgeared
35 volumes and 1740 chapters found
? What to do with existing folder? Remove old folder and start fresh
? Which chapters to download? Everything! (1740 chapters)
? 1740 chapters selected Continue
Chapters:   0%| | 0/1740 [00:00<?, ?item/s]
Chapters:   8%|█| 138/1740 [02:58<22:19,  1.20item/s]
Chapters:   8%|█| 138/1740 [09:00<1:44:37,  3.92s/item]

dipu-bd commented 1 year ago

Can you try with v3.2.1?

dipu-bd commented 1 year ago

What is the format of the Bearer token with golden karma? I see that you are using hunter2 as the token. This is not a valid JWT string and should not work as a token.
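For readers unfamiliar with the term, a signed JWT in its compact form is three base64url segments separated by dots, and the first two decode to JSON objects. The sketch below is only a rough shape check under that assumption. The function name `looks_like_jwt` is hypothetical, it does not verify any signature, and it says nothing about which token formats wuxiaworld.com or lncrawl actually accept.

```python
import base64
import json


def looks_like_jwt(token: str) -> bool:
    """Return True if `token` has the three-part shape of a compact signed JWT.

    This checks structure only: header and payload must be base64url-encoded
    JSON objects. The signature segment is not decoded or verified.
    """
    parts = token.split(".")
    if len(parts) != 3:
        return False
    try:
        for part in parts[:2]:
            # base64url segments in a JWT omit padding, so restore it first.
            padded = part + "=" * (-len(part) % 4)
            decoded = json.loads(base64.urlsafe_b64decode(padded))
            if not isinstance(decoded, dict):
                return False
    except ValueError:
        # Covers bad base64, bad UTF-8, and invalid JSON.
        return False
    return True
```

A string such as `hunter2` has no dots at all, so it fails the very first check.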

avggeek commented 1 year ago

Can you try with v3.2.1?

I've upgraded to v3.2.2 now but I'm still seeing the same issue and in fact the "hang" happens even earlier than before (Edit: added time output):

$ time lncrawl --login Bearer 9XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX9-1 -s https://www.wuxiaworld.com/novel/overgeared --format epub --filename "Overgeared - rainbowturtle" --filename-only --output . --single
================================================================================
╭╮╱╱╱╱╱╱╭╮╱╭╮╱╱╱╱╱╱╱╱╱╱╱╱╭╮╱╭━━━╮╱╱╱╱╱╱╱╱╱╭╮
┃┃╱╱╱╱╱╱┃┃╭╯╰╮╱╱╱╱╱╱╱╱╱╱╱┃┃╱┃╭━╮┃╱╱╱╱╱╱╱╱╱┃┃
┃┃╱╱╭┳━━┫╰┻╮╭╋━╮╭━━┳╮╭┳━━┫┃╱┃┃╱╰╋━┳━━┳╮╭╮╭┫┃╭━━┳━╮
┃┃╱╭╋┫╭╮┃╭╮┃┃┃╭╮┫╭╮┃╰╯┃┃━┫┃╱┃┃╱╭┫╭┫╭╮┃╰╯╰╯┃┃┃┃━┫╭╯
┃╰━╯┃┃╰╯┃┃┃┃╰┫┃┃┃╰╯┣╮╭┫┃━┫╰╮┃╰━╯┃┃┃╭╮┣╮╭╮╭┫╰┫┃━┫┃
╰━━━┻┻━╮┣╯╰┻━┻╯╰┻━━╯╰╯╰━━┻━╯╰━━━┻╯╰╯╰╯╰╯╰╯╰━┻━━┻╯
╱╱╱╱╱╭━╯┃ v3.2.2
╱╱╱╱╱╰━━╯ 🔗 https://github.com/dipu-bd/lightnovel-crawler
--------------------------------------------------------------------------------

➡ Press  Ctrl + C  to exit

Retrieving novel info...
https://www.wuxiaworld.com/novel/overgeared
Volumes: 100%|█| 35/35 [00:26<00:00,  1.33vol/s]
TITLE: Overgeared
35 volumes and 1743 chapters found
? What to do with existing folder? Remove old folder and start fresh
? Which chapters to download? Everything! (1743 chapters)
? 1743 chapters selected Continue
Chapters:   0%| | 0/1743 [00:00<?, ?item/s]
Chapters:   3%|█| 46/1743 [00:48<27:43,  1.02item/s]
Failed to get chapter: Message: no such element: Unable to locate element: {"method":"css selector","selector":".chapter-content"}
  (Session info: headless chrome=108.0.5359.98)
Stacktrace:
#0 0x562f990b82a3 <unknown>
#1 0x562f98e76f77 <unknown>
#2 0x562f98eb380c <unknown>
#3 0x562f98eb3a71 <unknown>
#4 0x562f98eed734 <unknown>
#5 0x562f98ed3b5d <unknown>
#6 0x562f98eeb47c <unknown>
#7 0x562f98ed3903 <unknown>
#8 0x562f98ea6ece <unknown>
#9 0x562f98ea7fde <unknown>
#10 0x562f9910863e <unknown>
#11 0x562f9910bb79 <unknown>
#12 0x562f990ee89e <unknown>
#13 0x562f9910ca83 <unknown>
#14 0x562f990e1505 <unknown>
#15 0x562f9912dca8 <unknown>
#16 0x562f9912de36 <unknown>
#17 0x562f99149333 <unknown>
#18 0x7f3183d6eea7 start_thread

Chapters:   6%|█| 112/1743 [10:34<2:33:58,  5.66s/item]
Traceback (most recent call last):
  File "/usr/lib/python3.9/http/client.py", line 268, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.9/socket.py", line 704, in readinto
    return self._sock.recv_into(b)
KeyboardInterrupt

--------------------------------------------------------------------------------
 🔗  https://github.com/dipu-bd/lightnovel-crawler/issues
================================================================================

real    11m35.282s
user    0m21.201s
sys     0m5.892s

What is the format of the Bearer token with golden karma? I see that you are using hunter2 as the token. This is not a valid JWT string and should not work as a token.

Oh boy, nothing worse than a joke that's so old in Internet Meme terms folks don't get it anymore 🤣 . Here's an explanation of hunter2

To answer the question, the bearer token format that I'm seeing is Bearer 9XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX9-1

dipu-bd commented 1 year ago

It seems that the app cannot handle too many browser instances.

Oh boy, nothing worse than a joke that's so old in Internet Meme terms folks don't get it anymore 🤣 . Here's an explanation of hunter2

Wow! I did not know about this meme. Poor AzureDiamond!

avggeek commented 1 year ago

It seems that the app cannot handle too many browser instances.

Are there any parameters in lncrawl today that I can use to limit the number of parallel browser instances that get created?
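The thread does not name such a parameter, so the following is not lncrawl's code or an lncrawl option. It is a minimal sketch of the general technique being asked about: capping how many browser-backed fetches run at once with a semaphore, even when the worker pool is larger. `MAX_BROWSERS`, `fetch_chapter`, `download_all`, and the `fetch` callable are all hypothetical names used for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cap on simultaneous browser instances.
MAX_BROWSERS = 2
_browser_slots = threading.BoundedSemaphore(MAX_BROWSERS)


def fetch_chapter(url, fetch):
    """Run `fetch(url)` while holding one of the limited browser slots."""
    with _browser_slots:
        return fetch(url)


def download_all(urls, fetch, workers=8):
    """Fetch every URL with a thread pool; results keep the input order.

    The pool may have more threads than MAX_BROWSERS, but the semaphore
    ensures at most MAX_BROWSERS calls to `fetch` are in flight at a time.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda u: fetch_chapter(u, fetch), urls))
```

Here `fetch` stands in for whatever function opens a headless browser and returns a chapter body. Keeping the cap separate from the pool size lets non-browser work still run on the remaining threads.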

alzamer2 commented 8 months ago

Hello, a new update was issued for wuxiaworld.com. Update your sources and try scraping again.