algolia / docsearch-scraper

DocSearch - Scraper
https://docsearch.algolia.com/

An error was encountered while executing `./docsearch bootstrap`. #547

Closed Yue-plus closed 3 years ago

Yue-plus commented 3 years ago

An error was encountered while executing ./docsearch bootstrap.


While following the Run your own > Create a new configuration documentation, an error occurred when trying to generate the configuration file:

Yue_plus@DESKTOP-IAMSSP8 MINGW64 ~/code/docsearch-scraper (master)
$ ./docsearch bootstrap
C:\Users\Yue_plus\code\docsearch-scraper\cli\src\commands\run_tests.py:22: SyntaxWarning: "is" with a literal. Did you mean "=="?
  if args[1] is "no_browser":
start url: https://note.yueplus.ink/
Traceback (most recent call last):
  File "C:\Users\Yue_plus\code\docsearch-scraper\docsearch", line 5, in <module>
    run()
  File "C:\Users\Yue_plus\code\docsearch-scraper\cli\src\index.py", line 161, in run
    exit(command.run(sys.argv[2:]))
  File "C:\Users\Yue_plus\code\docsearch-scraper\cli\src\commands\bootstrap_config.py", line 19, in run
    config = create_config()
  File "C:\Users\Yue_plus\code\docsearch-scraper\cli\..\deployer\src\config_creator.py", line 412, in create_config
    u).subdomain if tldextract.extract(
  File "C:\Users\Yue_plus\.virtualenvs\docsearch-scraper-Vgo08AAo\lib\site-packages\tldextract\tldextract.py", line 358, in extract
    return TLD_EXTRACTOR(url)
  File "C:\Users\Yue_plus\.virtualenvs\docsearch-scraper-Vgo08AAo\lib\site-packages\tldextract\tldextract.py", line 238, in __call__
    suffix_index = self._get_tld_extractor().suffix_index(translations)
  File "C:\Users\Yue_plus\.virtualenvs\docsearch-scraper-Vgo08AAo\lib\site-packages\tldextract\tldextract.py", line 278, in _get_tld_extractor
    raw_suffix_list_data = find_first_response(self.suffix_list_urls)
  File "C:\Users\Yue_plus\.virtualenvs\docsearch-scraper-Vgo08AAo\lib\site-packages\tldextract\remote.py", line 37, in find_first_response
    text = session.get(url).text
  File "C:\Users\Yue_plus\.virtualenvs\docsearch-scraper-Vgo08AAo\lib\site-packages\requests\sessions.py", line 555, in get
    return self.request('GET', url, **kwargs)
  File "C:\Users\Yue_plus\.virtualenvs\docsearch-scraper-Vgo08AAo\lib\site-packages\requests\sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\Yue_plus\.virtualenvs\docsearch-scraper-Vgo08AAo\lib\site-packages\requests\sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\Yue_plus\.virtualenvs\docsearch-scraper-Vgo08AAo\lib\site-packages\requests\adapters.py", line 439, in send
    resp = conn.urlopen(
  File "C:\Users\Yue_plus\.virtualenvs\docsearch-scraper-Vgo08AAo\lib\site-packages\urllib3\connectionpool.py", line 696, in urlopen
    self._prepare_proxy(conn)
  File "C:\Users\Yue_plus\.virtualenvs\docsearch-scraper-Vgo08AAo\lib\site-packages\urllib3\connectionpool.py", line 964, in _prepare_proxy
    conn.connect()
  File "C:\Users\Yue_plus\.virtualenvs\docsearch-scraper-Vgo08AAo\lib\site-packages\urllib3\connection.py", line 359, in connect
    conn = self._connect_tls_proxy(hostname, conn)
  File "C:\Users\Yue_plus\.virtualenvs\docsearch-scraper-Vgo08AAo\lib\site-packages\urllib3\connection.py", line 500, in _connect_tls_proxy
    return ssl_wrap_socket(
  File "C:\Users\Yue_plus\.virtualenvs\docsearch-scraper-Vgo08AAo\lib\site-packages\urllib3\util\ssl_.py", line 432, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(sock, context, tls_in_tls)
  File "C:\Users\Yue_plus\.virtualenvs\docsearch-scraper-Vgo08AAo\lib\site-packages\urllib3\util\ssl_.py", line 474, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock)
  File "c:\users\yue_plus\appdata\local\programs\python\python39\lib\ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "c:\users\yue_plus\appdata\local\programs\python\python39\lib\ssl.py", line 997, in _create
    raise ValueError("check_hostname requires server_hostname")
ValueError: check_hostname requires server_hostname
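For context, the ValueError at the bottom of the trace is raised by CPython's ssl module itself, before any network traffic: a context with check_hostname enabled (the default) refuses to wrap a socket when no server_hostname is supplied, which is what urllib3's HTTPS-proxy (TLS-in-TLS) code path did here. A minimal sketch reproducing the exception:

```python
import socket
import ssl

# create_default_context() enables hostname checking by default.
ctx = ssl.create_default_context()
assert ctx.check_hostname

try:
    # Wrapping without server_hostname fails immediately, before any I/O.
    ctx.wrap_socket(socket.socket())
except ValueError as e:
    print(e)  # check_hostname requires server_hostname
```

This is why the same error reappears regardless of the Python minor version: the trigger is the proxied connection, not the interpreter.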
Yue-plus commented 3 years ago

system environments: 系统环境:

Yue_plus@DESKTOP-IAMSSP8 MINGW64 ~/code/docsearch-scraper (master)
$ pip --version
pip 21.0.1 from C:\Users\Yue_plus\.virtualenvs\docsearch-scraper-Vgo08AAo\lib\site-packages\pip (python 3.9)

Yue_plus@DESKTOP-IAMSSP8 MINGW64 ~/code/docsearch-scraper (master)
$ pipenv --version
pipenv, version 2020.11.15

Yue_plus@DESKTOP-IAMSSP8 MINGW64 ~/code/docsearch-scraper (master)
$ python --version
Python 3.9.2
shortcuts commented 3 years ago

Hi @Yue-plus,

Our scraper requires Python 3.6.2; could you please try again with that version and let me know if the error still occurs?

Also, when you start your pipenv shell, it should warn you if you have the wrong version.
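That warning comes from the `[requires]` section of the project's Pipfile; a sketch of such a pin (assuming the repo pins the 3.6 series, as the comment above suggests):

```toml
[requires]
python_version = "3.6"
```

When the interpreter that `pipenv shell` activates does not match this pin, pipenv prints a version-mismatch warning on startup.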

Yue-plus commented 3 years ago

Tried again with Python 3.6.2, but this time the error is slightly different:


Yue_plus@DESKTOP-IAMSSP8 MINGW64 ~/code/docsearch-scraper (master)
$ pipenv shell
Launching subshell in virtual environment...

Yue_plus@DESKTOP-IAMSSP8 MINGW64 ~/code/docsearch-scraper (master)
$ ./docsearch bootstrap

No .env found. Let's create one.
What is your Algolia APPLICATION_ID: 0YWM7BGDQI
What is your Algolia API_KEY: **************************

start url: https://note.yueplus.ink/
Traceback (most recent call last):
  File "./docsearch", line 5, in <module>
    run()
  File "C:\Users\Yue_plus\code\docsearch-scraper\cli\src\index.py", line 161, in run
    exit(command.run(sys.argv[2:]))
  File "C:\Users\Yue_plus\code\docsearch-scraper\cli\src\commands\bootstrap_config.py", line 19, in run
    config = create_config()
  File "C:\Users\Yue_plus\code\docsearch-scraper\cli\..\deployer\src\config_creator.py", line 413, in create_config
    u).domain == 'github' else tldextract.extract(u).domain
  File "C:\Users\Yue_plus\.virtualenvs\docsearch-scraper-Vgo08AAo\lib\site-packages\tldextract\tldextract.py", line 358, in extract
    return TLD_EXTRACTOR(url)
  File "C:\Users\Yue_plus\.virtualenvs\docsearch-scraper-Vgo08AAo\lib\site-packages\tldextract\tldextract.py", line 238, in __call__
    suffix_index = self._get_tld_extractor().suffix_index(translations)
  File "C:\Users\Yue_plus\.virtualenvs\docsearch-scraper-Vgo08AAo\lib\site-packages\tldextract\tldextract.py", line 278, in _get_tld_extractor
    raw_suffix_list_data = find_first_response(self.suffix_list_urls)
  File "C:\Users\Yue_plus\.virtualenvs\docsearch-scraper-Vgo08AAo\lib\site-packages\tldextract\remote.py", line 37, in find_first_response
    text = session.get(url).text
  File "C:\Users\Yue_plus\.virtualenvs\docsearch-scraper-Vgo08AAo\lib\site-packages\requests\sessions.py", line 555, in get
    return self.request('GET', url, **kwargs)
  File "C:\Users\Yue_plus\.virtualenvs\docsearch-scraper-Vgo08AAo\lib\site-packages\requests\sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\Yue_plus\.virtualenvs\docsearch-scraper-Vgo08AAo\lib\site-packages\requests\sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\Yue_plus\.virtualenvs\docsearch-scraper-Vgo08AAo\lib\site-packages\requests\adapters.py", line 449, in send
    timeout=timeout
  File "C:\Users\Yue_plus\.virtualenvs\docsearch-scraper-Vgo08AAo\lib\site-packages\urllib3\connectionpool.py", line 696, in urlopen
    self._prepare_proxy(conn)
  File "C:\Users\Yue_plus\.virtualenvs\docsearch-scraper-Vgo08AAo\lib\site-packages\urllib3\connectionpool.py", line 964, in _prepare_proxy
    conn.connect()
  File "C:\Users\Yue_plus\.virtualenvs\docsearch-scraper-Vgo08AAo\lib\site-packages\urllib3\connection.py", line 359, in connect
    conn = self._connect_tls_proxy(hostname, conn)
  File "C:\Users\Yue_plus\.virtualenvs\docsearch-scraper-Vgo08AAo\lib\site-packages\urllib3\connection.py", line 506, in _connect_tls_proxy
    ssl_context=ssl_context,
  File "C:\Users\Yue_plus\.virtualenvs\docsearch-scraper-Vgo08AAo\lib\site-packages\urllib3\util\ssl_.py", line 432, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(sock, context, tls_in_tls)
  File "C:\Users\Yue_plus\.virtualenvs\docsearch-scraper-Vgo08AAo\lib\site-packages\urllib3\util\ssl_.py", line 474, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock)
  File "C:\Users\Yue_plus\AppData\Local\Programs\Python\Python36\lib\ssl.py", line 401, in wrap_socket
    _context=self, _session=session)
  File "C:\Users\Yue_plus\AppData\Local\Programs\Python\Python36\lib\ssl.py", line 764, in __init__
    raise ValueError("check_hostname requires server_hostname")
ValueError: check_hostname requires server_hostname

Yue_plus@DESKTOP-IAMSSP8 MINGW64 ~/code/docsearch-scraper (master)
$ python --version
Python 3.6.2
shortcuts commented 3 years ago

I'm not able to reproduce it; it could be related to your OS.

Here's a recent similar issue from a Windows user; could you please try this solution and let me know if the issue is still there?

https://github.com/algolia/docsearch-scraper/blob/3b17693ef6d2aa2e6dfffbf094af69530139e109/Pipfile.lock#L448-L455

Thanks

Yue-plus commented 3 years ago


After removing the Git Bash proxy settings, the problem was solved:

git config --global --unset http.proxy
git config --global --unset https.proxy
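For anyone hitting this later: the scraper itself reads no proxy settings; requests/urllib3 pick them up from the `http_proxy`/`https_proxy` environment variables, which often get set alongside the Git proxy config. An `https://` proxy URL is what sends urllib3 down the TLS-in-TLS path that raised the error above. The active proxy can be checked with the stdlib; a minimal sketch (the proxy address is hypothetical):

```python
import os
import urllib.request

# Simulate a proxy configured via the environment (hypothetical address).
os.environ["https_proxy"] = "https://127.0.0.1:3128"

# getproxies() reports what requests/urllib3 will actually use.
print(urllib.request.getproxies().get("https"))  # https://127.0.0.1:3128
```

Unsetting these variables (or switching the proxy URL to `http://`) avoids the TLS-in-TLS wrap that older urllib3 versions could not complete.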

Thank you very much!

shortcuts commented 3 years ago

Glad to hear that @Yue-plus, thanks for the info!