Integration-Automation / ReEdgeGPT

Microsoft's Bing Chat AI

Support crawlbase #129

Closed: FuseFairy closed this issue 7 months ago

FuseFairy commented 7 months ago

If rotating proxies were supported, requests could avoid being blocked.

Link: https://crawlbase.com/docs/smart-proxy/

JE-Chen commented 7 months ago

Can’t even run their example.

import requests

proxy_url = "http://token:@smartproxy.crawlbase.com:8012"
proxies = {"http": proxy_url, "https": proxy_url}

response = requests.get(url="http://httpbin.org/ip", proxies=proxies, verify=False)
Traceback (most recent call last):
  File "C:\Users\JeffreyChen\Desktop\Code_Space\ReEdgeGPT\venv\Lib\site-packages\urllib3\connection.py", line 198, in _new_conn
    sock = connection.create_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\JeffreyChen\Desktop\Code_Space\ReEdgeGPT\venv\Lib\site-packages\urllib3\util\connection.py", line 85, in create_connection
    raise err
  File "C:\Users\JeffreyChen\Desktop\Code_Space\ReEdgeGPT\venv\Lib\site-packages\urllib3\util\connection.py", line 73, in create_connection
    sock.connect(sa)
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\JeffreyChen\Desktop\Code_Space\ReEdgeGPT\venv\Lib\site-packages\urllib3\connectionpool.py", line 793, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\JeffreyChen\Desktop\Code_Space\ReEdgeGPT\venv\Lib\site-packages\urllib3\connectionpool.py", line 496, in _make_request
    conn.request(
  File "C:\Users\JeffreyChen\Desktop\Code_Space\ReEdgeGPT\venv\Lib\site-packages\urllib3\connection.py", line 400, in request
    self.endheaders()
  File "C:\Users\JeffreyChen\AppData\Local\Programs\Python\Python311\Lib\http\client.py", line 1278, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "C:\Users\JeffreyChen\AppData\Local\Programs\Python\Python311\Lib\http\client.py", line 1038, in _send_output
    self.send(msg)
  File "C:\Users\JeffreyChen\AppData\Local\Programs\Python\Python311\Lib\http\client.py", line 976, in send
    self.connect()
  File "C:\Users\JeffreyChen\Desktop\Code_Space\ReEdgeGPT\venv\Lib\site-packages\urllib3\connection.py", line 238, in connect
    self.sock = self._new_conn()
                ^^^^^^^^^^^^^^^^
  File "C:\Users\JeffreyChen\Desktop\Code_Space\ReEdgeGPT\venv\Lib\site-packages\urllib3\connection.py", line 207, in _new_conn
    raise ConnectTimeoutError(
urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPConnection object at 0x0000019A7265CF10>, 'Connection to smartproxy.crawlbase.com timed out. (connect timeout=None)')

The above exception was the direct cause of the following exception:

urllib3.exceptions.ProxyError: ('Unable to connect to proxy', ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x0000019A7265CF10>, 'Connection to smartproxy.crawlbase.com timed out. (connect timeout=None)'))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\JeffreyChen\Desktop\Code_Space\ReEdgeGPT\venv\Lib\site-packages\requests\adapters.py", line 486, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "C:\Users\JeffreyChen\Desktop\Code_Space\ReEdgeGPT\venv\Lib\site-packages\urllib3\connectionpool.py", line 847, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "C:\Users\JeffreyChen\Desktop\Code_Space\ReEdgeGPT\venv\Lib\site-packages\urllib3\util\retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='smartproxy.crawlbase.com', port=8012): Max retries exceeded with url: http://httpbin.org/ip (Caused by ProxyError('Unable to connect to proxy', ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x0000019A7265CF10>, 'Connection to smartproxy.crawlbase.com timed out. (connect timeout=None)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\JeffreyChen\Desktop\Code_Space\ReEdgeGPT\test\unit_test\back-end\manual_test\test_bot_manual_proxy.py", line 6, in <module>
    response = requests.get(url="http://httpbin.org/ip", proxies=proxies, verify=False)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\JeffreyChen\Desktop\Code_Space\ReEdgeGPT\venv\Lib\site-packages\requests\api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\JeffreyChen\Desktop\Code_Space\ReEdgeGPT\venv\Lib\site-packages\requests\api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\JeffreyChen\Desktop\Code_Space\ReEdgeGPT\venv\Lib\site-packages\requests\sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\JeffreyChen\Desktop\Code_Space\ReEdgeGPT\venv\Lib\site-packages\requests\sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\JeffreyChen\Desktop\Code_Space\ReEdgeGPT\venv\Lib\site-packages\requests\adapters.py", line 513, in send
    raise ProxyError(e, request=request)
requests.exceptions.ProxyError: HTTPConnectionPool(host='smartproxy.crawlbase.com', port=8012): Max retries exceeded with url: http://httpbin.org/ip (Caused by ProxyError('Unable to connect to proxy', ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x0000019A7265CF10>, 'Connection to smartproxy.crawlbase.com timed out. (connect timeout=None)')))
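
Side note: the "connect timeout=None" in the error means no timeout was set, so an unreachable proxy hangs until the OS gives up. A minimal variant of the same example with an explicit timeout, so it fails fast (the token is still the docs placeholder):

import requests

proxy_url = "http://token:@smartproxy.crawlbase.com:8012"  # placeholder token
proxies = {"http": proxy_url, "https": proxy_url}

# (connect, read) timeouts in seconds: fail fast if the proxy is unreachable
response = requests.get(
    "http://httpbin.org/ip",
    proxies=proxies,
    timeout=(10, 30),
    verify=False,
)
print(response.json())
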
FuseFairy commented 7 months ago

Strange, because I can use it. [screenshot]

JE-Chen commented 7 months ago

Well, maybe my company’s firewall is blocking the connection.

JE-Chen commented 7 months ago

I can confirm that my company’s firewall is blocking the connection. When I enable the VPN, I can access the service.

JE-Chen commented 7 months ago

I think they don't provide a WSS proxy?

Cannot connect to host sydney.bing.com:443 ssl:<ssl.SSLContext object at 0x000001ABD55E56D0> [None]
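
A quick way to check is to try opening the websocket through the proxy directly. A minimal sketch, assuming aiohttp (which produced the error above); the token is a placeholder and the Sydney endpoint path is an assumption:

import asyncio

import aiohttp

PROXY_URL = "http://token:@smartproxy.crawlbase.com:8012"  # placeholder token

async def check_ws_proxy() -> None:
    async with aiohttp.ClientSession() as session:
        # aiohttp tunnels the WSS upgrade through the HTTP proxy via CONNECT
        async with session.ws_connect(
            "wss://sydney.bing.com/sydney/ChatHub",  # endpoint path assumed
            proxy=PROXY_URL,
        ) as ws:
            print("websocket open:", not ws.closed)

asyncio.run(check_ws_proxy())
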
FuseFairy commented 7 months ago

Well, it looks like there's really no WSS support.

FuseFairy commented 7 months ago

Or use it for generating images? Image generation seems more likely to fail, but I'm not sure.

JE-Chen commented 7 months ago

Can't the proxy parameter do that?

FuseFairy commented 7 months ago

It would work fine; it's just a matter of whether you want to add another option or not.

JE-Chen commented 7 months ago

What kind of option? What should that option do?

FuseFairy commented 7 months ago

I mean, image generation could optionally use rotating proxies.

JE-Chen commented 7 months ago

It already does. Or do you want to set up a proxy for every image-creation call?

class ImageGenAsync:
    """
    Image generation by Microsoft Bing
    Parameters:
        auth_cookie: str
    Optional Parameters:
        debug_file: str
        quiet: bool
        all_cookies: list[dict]
        proxy: str
    """

    def __init__(
            self,
            auth_cookie: str = None,
            debug_file: Union[str, None] = None,
            quiet: bool = False,
            all_cookies: List[Dict] = None,
            proxy: str = None
    ) -> None:
        if auth_cookie is None and not all_cookies:
            raise AuthCookieError("No auth cookie provided")
        self.proxy: str = get_proxy(proxy)
        self.session = httpx.AsyncClient(
            proxies=self.proxy,
            headers=HEADERS,
            trust_env=True,
        )
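
So a rotating-proxy endpoint can just be passed at construction time. A usage sketch, assuming the EdgeGPT-style async-context API and get_images helper (the import path, cookie value, and token below are all placeholders or assumptions):

import asyncio

from re_edge_gpt import ImageGenAsync  # import path is an assumption

async def main() -> None:
    async with ImageGenAsync(
        auth_cookie="<_U cookie value>",  # placeholder Bing auth cookie
        proxy="http://token:@smartproxy.crawlbase.com:8012",  # rotating proxy
    ) as image_generator:
        # get_images is assumed from the EdgeGPT-style API
        links = await image_generator.get_images("a lighthouse at dawn")
        print(links)

asyncio.run(main())
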
JE-Chen commented 7 months ago

Will check this tonight.

FuseFairy commented 7 months ago

Maybe you don't need to try it. I set verify to False and it runs, but I still get an error, so I think there's something wrong with crawlbase. haha

  def __init__(
          self,
          auth_cookie: str = None,
          debug_file: Union[str, None] = None,
          quiet: bool = False,
          all_cookies: List[Dict] = None,
          proxy: str = None
  ) -> None:
      if auth_cookie is None and not all_cookies:
          raise AuthCookieError("No auth cookie provided")
      self.proxy: str = get_proxy(proxy)
      self.session = httpx.AsyncClient(
          verify=False,
          proxies=self.proxy,
          headers=HEADERS,
          trust_env=True,
      )

[screenshot]

JE-Chen commented 7 months ago

> Maybe you don't need to try it. I set verify to False and it runs, but I still get an error, so I think there's something wrong with crawlbase. haha

I can confirm this proxy raises an SSL verification error, and it still fails when verify=False is used.

SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1002)')
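
If Crawlbase publishes a CA certificate for its intercepting proxy (many MITM-style proxies do; whether Crawlbase does is an assumption here), pointing httpx at that bundle would be a cleaner fix than verify=False. A sketch with a placeholder certificate path:

import httpx

client = httpx.AsyncClient(
    proxies="http://token:@smartproxy.crawlbase.com:8012",  # placeholder token
    verify="/path/to/crawlbase-ca.pem",  # CA bundle from the proxy vendor
)
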
FuseFairy commented 7 months ago

Then there's nothing we can do about it. I'm closing this issue.