ViciousPotato / safaribooks

Convert safaribooksonline ebook to epub and Kindle mobi format

CentOS 7 Spider not found: SafariBooks #36

Open anutator opened 6 years ago

anutator commented 6 years ago
2018-02-24 05:19:34 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: scrapybot)
2018-02-24 05:19:34 [scrapy.utils.log] INFO: Versions: lxml 4.1.1.0, libxml2 2.9.7, cssselect 1.0.3, parsel 1.4.0, w3lib 1.19.0, Twisted 16.4.1, Python 2.7.5 (default, Aug  4 2017, 00:39:18) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-16)], pyOpenSSL 17.5.0 (OpenSSL 1.1.0g  2 Nov 2017), cryptography 2.1.4, Platform Linux-3.10.0-693.11.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Traceback (most recent call last):
  File "/usr/bin/safaribooks", line 9, in <module>
    load_entry_point('safaribooks==0.1.0', 'console_scripts', 'safaribooks')()
  File "/usr/lib/python2.7/site-packages/safaribooks/__main__.py", line 121, in main
    args.func(args)
  File "/usr/lib/python2.7/site-packages/safaribooks/__main__.py", line 28, in download_epub
    output_directory=args.output_directory
  File "/usr/lib64/python2.7/site-packages/scrapy/crawler.py", line 170, in crawl
    crawler = self.create_crawler(crawler_or_spidercls)
  File "/usr/lib64/python2.7/site-packages/scrapy/crawler.py", line 198, in create_crawler
    return self._create_crawler(crawler_or_spidercls)
  File "/usr/lib64/python2.7/site-packages/scrapy/crawler.py", line 202, in _create_crawler
    spidercls = self.spider_loader.load(spidercls)
  File "/usr/lib64/python2.7/site-packages/scrapy/spiderloader.py", line 71, in load
    raise KeyError("Spider not found: {}".format(spider_name))
KeyError: 'Spider not found: SafariBooks'

I installed Python 2.7.14 as an alternative installation (without removing the system Python 2.7.5), following https://tecadmin.net/install-python-2-7-on-centos-rhel/. Do I need to add anything to the configuration?

anutator commented 6 years ago

The answer is to run the command from inside the safaribooks directory. However, the resulting EPUB files are very small. I have hidden the real book ID number in the log below.
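Why running from inside the directory helps, as far as I understand Scrapy: it finds a project's settings, including `SPIDER_MODULES` (the modules its spider loader searches for a name like `SafariBooks`), through the nearest `scrapy.cfg` at or above the current working directory, unless `SCRAPY_SETTINGS_MODULE` is already set in the environment. Assuming safaribooks builds its crawler from those project settings (I have not confirmed this in its source), launching it from anywhere else leaves the spider list empty, which matches the `KeyError` above. The sketch below is a standalone re-implementation of that upward search, not an import from Scrapy, so you can check a directory before running the tool.

```python
import os


def closest_scrapy_cfg(path="."):
    """Return the nearest scrapy.cfg at or above `path`, or "" if there is none.

    Stand-alone re-implementation of the upward search Scrapy performs;
    it does not import Scrapy.
    """
    path = os.path.abspath(path)
    while True:
        candidate = os.path.join(path, "scrapy.cfg")
        if os.path.exists(candidate):
            return candidate
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root without a match
            return ""
        path = parent


if __name__ == "__main__":
    cfg = closest_scrapy_cfg()
    if os.environ.get("SCRAPY_SETTINGS_MODULE"):
        # Scrapy reads this variable first and skips the scrapy.cfg search.
        print("SCRAPY_SETTINGS_MODULE is set; Scrapy would use it instead of scrapy.cfg")
    elif cfg:
        print("Scrapy project config found: " + cfg)
    else:
        print("No scrapy.cfg found from here; 'Spider not found' is likely")
```

Run it from the directory you plan to launch `safaribooks` in. If it finds no `scrapy.cfg`, `cd` into the checkout that presumably contains one (the workaround reported in this thread). The code is written to run on Python 2.7 as well as Python 3.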

2018-02-24 05:37:38 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://www.safaribooksonline.com//api/v1/book/book-id-number/chapter/ch01s04.html>: HTTP status code is not handled or not allowed
2018-02-24 05:37:39 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://www.safaribooksonline.com//api/v1/book/book_id/chapter/ch01s03.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=book_id_number)
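
Both chapter requests above came back as HTTP 401 (Unauthorized), so the chapter content was presumably never fetched; that would be consistent with the very small EPUB files, though I have not verified it. As a diagnostic, the hypothetical helper below scans a captured Scrapy log for the two line shapes shown here ("Crawled (NNN) <GET url>" and "Ignoring response <NNN url>") and counts status codes per chapter URL. The regexes are assumptions fitted to these two lines only, and the `/chapter/` filter is taken from the URLs in this log.

```python
import re
from collections import Counter

# Fitted to the two log lines shown in this thread:
#   "... DEBUG: Crawled (401) <GET https://...> (referer: ...)"
#   "... INFO: Ignoring response <401 https://...>: HTTP status code ..."
CRAWLED_RE = re.compile(r"Crawled \((\d{3})\) <GET ([^\s>]+)>")
IGNORED_RE = re.compile(r"Ignoring response <(\d{3}) ([^\s>]+)>")


def chapter_statuses(log_lines):
    """Return {chapter_url: status_code} for each chapter request in the log.

    Scrapy can log the same URL twice (once as crawled, once as ignored);
    keying by URL keeps one entry per chapter.
    """
    statuses = {}
    for line in log_lines:
        match = CRAWLED_RE.search(line) or IGNORED_RE.search(line)
        if match and "/chapter/" in match.group(2):
            statuses[match.group(2)] = int(match.group(1))
    return statuses


def summarize(log_lines):
    """Count chapter URLs per HTTP status code, e.g. {401: 2}."""
    return dict(Counter(chapter_statuses(log_lines).values()))


if __name__ == "__main__":
    import sys
    # Hypothetical usage: pass a file holding the captured output
    # (Scrapy logs to stderr by default).
    with open(sys.argv[1]) as fh:
        print(summarize(fh))
```

If every chapter URL reports 401, the problem is most likely authentication rather than EPUB packaging.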

yuankunzhang commented 6 years ago

I ran into this problem too; thank you @bestann for the solution. This behavior is really confusing and should be improved.