dipu-bd / lightnovel-crawler

Generate and download e-books from online sources.
https://pypi.org/project/lightnovel-crawler/
GNU General Public License v3.0
1.46k stars 288 forks

www.wuxiaworld.com, not working in google colab #1701

Closed budikesuma closed 2 years ago

budikesuma commented 2 years ago

Let us know

Novel URL: https://www.wuxiaworld.com/novel/a-will-eternal

App Location: Google Colab

App Version: 3.0.1

Describe this issue

Still can't find the chapters.

Screenshot_20221010-020349_Browser

Screenshot_20221010-020405_Browser

Screenshot_20221010-020422_Browser

dastrdly6585 commented 2 years ago

With a different app type (EXE, version 3.0.1), the update mostly works. Issues encountered are as follows:

avggeek commented 2 years ago

I commented on the older issue (#1579), but I realized there's a new issue, so I'm reposting my comment here instead. The error I get is slightly different after upgrading to 3.0.1.

I tried to download (this novel) using the following command:

lncrawl --login Bearer ey.. -s https://www.wuxiaworld.com/novel/overgeared --format epub --filename "Overgeared - rainbowturtle" --filename-only --output . --single

The download errors out with the following message:

Failed to get chapter: Message: no such element: Unable to locate element: {"method":"css selector","selector":".chapter-content"}
  (Session info: headless chrome=106.0.5249.103)
Stacktrace:
#0 0x56271decf2c3 <unknown>
#1 0x56271dcd883a <unknown>
#2 0x56271dd11985 <unknown>
#3 0x56271dd11b61 <unknown>
#4 0x56271dd49d14 <unknown>
#5 0x56271dd2ff6d <unknown>
#6 0x56271dd47a50 <unknown>
#7 0x56271dd2fd63 <unknown>
#8 0x56271dd047e3 <unknown>
#9 0x56271dd05a21 <unknown>
#10 0x56271df1d18e <unknown>
#11 0x56271df20622 <unknown>
#12 0x56271df03aae <unknown>
#13 0x56271df212a3 <unknown>
#14 0x56271def7ecf <unknown>
#15 0x56271df41588 <unknown>
#16 0x56271df41706 <unknown>
#17 0x56271df5b8b2 <unknown>
#18 0x7f9da4bc5ea7 <unknown>

Chapters:   0%|                           | 2/1705 [01:38<19:20:13, 40.88s/item]
Failed to get chapter: Message: no such element: Unable to locate element: {"method":"css selector","selector":".chapter-content"}
  (Session info: headless chrome=106.0.5249.103)
Stacktrace:
#0 0x56271decf2c3 <unknown>
#1 0x56271dcd883a <unknown>
#2 0x56271dd11985 <unknown>
#3 0x56271dd11b61 <unknown>
#4 0x56271dd49d14 <unknown>
#5 0x56271dd2ff6d <unknown>
#6 0x56271dd47a50 <unknown>
#7 0x56271dd2fd63 <unknown>
#8 0x56271dd047e3 <unknown>
#9 0x56271dd05a21 <unknown>
#10 0x56271df1d18e <unknown>
#11 0x56271df20622 <unknown>
#12 0x56271df03aae <unknown>
#13 0x56271df212a3 <unknown>
#14 0x56271def7ecf <unknown>
#15 0x56271df41588 <unknown>
#16 0x56271df41706 <unknown>
#17 0x56271df5b8b2 <unknown>
#18 0x7f9da4bc5ea7 <unknown>

Chapters:   0%|                           | 3/1705 [03:09<30:13:48, 63.94s/item]
Chapters:   0%|                           | 3/1705 [04:21<41:10:25, 87.09s/item]
Traceback (most recent call last):
  File "/home/avggeek/.local/bin/lncrawl", line 8, in <module>
    sys.exit(main())
  File "/home/avggeek/.local/lib/python3.9/site-packages/lncrawl/__init__.py", line 14, in main
    start_app()
  File "/home/avggeek/.local/lib/python3.9/site-packages/lncrawl/core/__init__.py", line 68, in start_app
    run_bot(bot)
  File "/home/avggeek/.local/lib/python3.9/site-packages/lncrawl/bots/__init__.py", line 16, in run_bot
    ConsoleBot().start()
  File "/home/avggeek/.local/lib/python3.9/site-packages/lncrawl/bots/console/integration.py", line 92, in start
    self.app.start_download()
  File "/home/avggeek/.local/lib/python3.9/site-packages/lncrawl/core/app.py", line 155, in start_download
    fetch_chapter_body(self)
  File "/home/avggeek/.local/lib/python3.9/site-packages/lncrawl/core/downloader.py", line 88, in fetch_chapter_body
    for progress in app.crawler.download_chapters(app.chapters):
  File "/home/avggeek/.local/lib/python3.9/site-packages/lncrawl/templates/browser/basic.py", line 111, in download_chapters
    chapter.body = self.download_chapter_body_in_browser(chapter)
  File "/home/avggeek/.lncrawl/sources/en/w/wuxiacom.py", line 221, in download_chapter_body_in_browser
    content = self.browser.find("chapter-content", By.CLASS_NAME).as_tag()
  File "/home/avggeek/.local/lib/python3.9/site-packages/lncrawl/core/browser.py", line 162, in find
    return self._driver.find_element(by, selector)
  File "/home/avggeek/.local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 856, in find_element
    return self.execute(Command.FIND_ELEMENT, {
  File "/home/avggeek/.local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 427, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/home/avggeek/.local/lib/python3.9/site-packages/selenium/webdriver/remote/remote_connection.py", line 344, in execute
    return self._request(command_info[0], url, body=data)
  File "/home/avggeek/.local/lib/python3.9/site-packages/selenium/webdriver/remote/remote_connection.py", line 366, in _request
    response = self._conn.request(method, url, body=body, headers=headers)
  File "/usr/lib/python3/dist-packages/urllib3/request.py", line 78, in request
    return self.request_encode_body(
  File "/usr/lib/python3/dist-packages/urllib3/request.py", line 170, in request_encode_body
    return self.urlopen(method, url, **extra_kw)
  File "/usr/lib/python3/dist-packages/urllib3/poolmanager.py", line 375, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.9/http/client.py", line 1347, in getresponse
    response.begin()
  File "/usr/lib/python3.9/http/client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.9/http/client.py", line 268, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.9/socket.py", line 704, in readinto
    return self._sock.recv_into(b)

idMysteries commented 2 years ago

@dipu-bd Does Google Colab not support Chrome?

dipu-bd commented 2 years ago

No, Google Colab does not have WebDriver support. I do not have much experience with Google Colab, so I can't tell whether it is possible to give it WebDriver support.

However, passing a public Selenium Grid URL using --selenium-grid should work.
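
For anyone trying this from a notebook, here is a minimal sketch of how such a command could be assembled in Python. It only uses flags that appear in this thread (-s, --format, --selenium-grid). The helper name build_lncrawl_command and the grid address are hypothetical placeholders, and it assumes --selenium-grid takes the grid URL as its next argument.

```python
import shlex


def build_lncrawl_command(novel_url, grid_url, output_format="epub"):
    """Return the argument list for an lncrawl run that uses a remote Selenium Grid.

    Hypothetical helper for illustration; it only assembles the command
    and does not run anything.
    """
    return [
        "lncrawl",
        "-s", novel_url,              # novel page to crawl
        "--format", output_format,    # output e-book format
        "--selenium-grid", grid_url,  # assumed: the grid URL follows the flag
    ]


if __name__ == "__main__":
    cmd = build_lncrawl_command(
        "https://www.wuxiaworld.com/novel/a-will-eternal",
        "http://grid.example.invalid:4444",  # placeholder, not a real grid
    )
    # Print a shell-quoted version that can be pasted into a terminal or notebook cell.
    print(shlex.join(cmd))
```

To actually run it, the list can be handed to subprocess.run(cmd, check=True) once the placeholder is replaced with a grid you have access to.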

Closing this issue, since there are no plans to add WebDriver support in Google Colab.