MaartenGr / BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.
https://maartengr.github.io/BERTopic/
MIT License
6.08k stars, 757 forks

ConnectionError: HTTPSConnectionPool #906

Closed pioneer-summit closed 1 year ago

pioneer-summit commented 1 year ago

When I run this code:

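from bertopic import BERTopic
# docs is a list of strings (the documents to model), defined elsewhere
model = BERTopic(language="chinese (simplified)")
topics, probs = model.fit_transform(docs)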

I get this error:

---------------------------------------------------------------------------
TimeoutError                              Traceback (most recent call last)
G:\Anaconda3\lib\site-packages\urllib3\connection.py in _new_conn(self)
    168         try:
--> 169             conn = connection.create_connection(
    170                 (self._dns_host, self.port), self.timeout, **extra_kw

G:\Anaconda3\lib\site-packages\urllib3\util\connection.py in create_connection(address, timeout, source_address, socket_options)
     95     if err is not None:
---> 96         raise err
     97 

G:\Anaconda3\lib\site-packages\urllib3\util\connection.py in create_connection(address, timeout, source_address, socket_options)
     85                 sock.bind(source_address)
---> 86             sock.connect(sa)
     87             return sock

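TimeoutError: [WinError 10060] The connection attempt failed because the connected party did not respond properly after a period of time, or the connected host did not respond.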

During handling of the above exception, another exception occurred:

NewConnectionError                        Traceback (most recent call last)
G:\Anaconda3\lib\site-packages\urllib3\connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    698             # Make the request on the httplib connection object.
--> 699             httplib_response = self._make_request(
    700                 conn,

G:\Anaconda3\lib\site-packages\urllib3\connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    381         try:
--> 382             self._validate_conn(conn)
    383         except (SocketTimeout, BaseSSLError) as e:

G:\Anaconda3\lib\site-packages\urllib3\connectionpool.py in _validate_conn(self, conn)
   1009         if not getattr(conn, "sock", None):  # AppEngine might not have  `.sock`
-> 1010             conn.connect()
   1011 

G:\Anaconda3\lib\site-packages\urllib3\connection.py in connect(self)
    352         # Add certificate verification
--> 353         conn = self._new_conn()
    354         hostname = self.host

G:\Anaconda3\lib\site-packages\urllib3\connection.py in _new_conn(self)
    180         except SocketError as e:
--> 181             raise NewConnectionError(
    182                 self, "Failed to establish a new connection: %s" % e

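NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x0000024BF1722CA0>: Failed to establish a new connection: [WinError 10060] The connection attempt failed because the connected party did not respond properly after a period of time, or the connected host did not respond.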

During handling of the above exception, another exception occurred:

MaxRetryError                             Traceback (most recent call last)
G:\Anaconda3\lib\site-packages\requests\adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    438             if not chunked:
--> 439                 resp = conn.urlopen(
    440                     method=request.method,

G:\Anaconda3\lib\site-packages\urllib3\connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    754 
--> 755             retries = retries.increment(
    756                 method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]

G:\Anaconda3\lib\site-packages\urllib3\util\retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
    573         if new_retry.is_exhausted():
--> 574             raise MaxRetryError(_pool, url, error or ResponseError(cause))
    575 

MaxRetryError: HTTPSConnectionPool(host='s3-proxy.huggingface.tech', port=443): Max retries exceeded with url: /lfs.huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2/16cc9e54df6e083272378abec2d75dc34d7a48b5276db3ccc050d18de672ac59?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=AKIA4N7VTDGOZQA2IKWK%2F20230104%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230104T070023Z&X-Amz-Expires=259200&X-Amz-Signature=171d6342714b8469d06bb1d15cd60f8eeed49f8f0adcd9a1467f2a3388ff7283&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3D%22pytorch_model.bin%22&x-id=GetObject (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x0000024BF1722CA0>: Failed to establish a new connection: [WinError 10060] The connection attempt failed because the connected party did not respond properly after a period of time, or the connected host did not respond.'))

During handling of the above exception, another exception occurred:

ConnectionError                           Traceback (most recent call last)
<ipython-input-7-6c851d021e7a> in <module>
----> 1 topics, probs = model.fit_transform(docs)

G:\Anaconda3\lib\site-packages\bertopic\_bertopic.py in fit_transform(self, documents, embeddings, y)
    329         # Extract embeddings
    330         if embeddings is None:
--> 331             self.embedding_model = select_backend(self.embedding_model,
    332                                                   language=self.language)
    333             embeddings = self._extract_embeddings(documents.Document,

G:\Anaconda3\lib\site-packages\bertopic\backend\_utils.py in select_backend(embedding_model, language)
     71             return SentenceTransformerBackend("all-MiniLM-L6-v2")
     72         elif language.lower() in languages or language == "multilingual":
---> 73             return SentenceTransformerBackend("paraphrase-multilingual-MiniLM-L12-v2")
     74         else:
     75             raise ValueError(f"{language} is currently not supported. However, you can "

G:\Anaconda3\lib\site-packages\bertopic\backend\_sentencetransformers.py in __init__(self, embedding_model)
     41             self.embedding_model = embedding_model
     42         elif isinstance(embedding_model, str):
---> 43             self.embedding_model = SentenceTransformer(embedding_model)
     44         else:
     45             raise ValueError("Please select a correct SentenceTransformers model: \n"

G:\Anaconda3\lib\site-packages\sentence_transformers\SentenceTransformer.py in __init__(self, model_name_or_path, modules, device, cache_folder, use_auth_token)
     85                 if not os.path.exists(os.path.join(model_path, 'modules.json')):
     86                     # Download from hub with caching
---> 87                     snapshot_download(model_name_or_path,
     88                                         cache_dir=cache_folder,
     89                                         library_name='sentence-transformers',

G:\Anaconda3\lib\site-packages\sentence_transformers\util.py in snapshot_download(repo_id, revision, cache_dir, library_name, library_version, user_agent, ignore_files, use_auth_token)
    489             cached_download_args['legacy_cache_layout'] = True
    490 
--> 491         path = cached_download(**cached_download_args)
    492 
    493         if os.path.exists(path + ".lock"):

G:\Anaconda3\lib\site-packages\huggingface_hub\utils\_validators.py in _inner_fn(*args, **kwargs)
    122             )
    123 
--> 124         return fn(*args, **kwargs)
    125 
    126     return _inner_fn  # type: ignore

G:\Anaconda3\lib\site-packages\huggingface_hub\file_download.py in cached_download(url, library_name, library_version, cache_dir, user_agent, force_download, force_filename, proxies, etag_timeout, resume_download, token, local_files_only, legacy_cache_layout)
    741             logger.info("downloading %s to %s", url, temp_file.name)
    742 
--> 743             http_get(
    744                 url_to_download,
    745                 temp_file,

G:\Anaconda3\lib\site-packages\huggingface_hub\file_download.py in http_get(url, temp_file, proxies, resume_size, headers, timeout, max_retries)
    473     if resume_size > 0:
    474         headers["Range"] = "bytes=%d-" % (resume_size,)
--> 475     r = _request_wrapper(
    476         method="GET",
    477         url=url,

G:\Anaconda3\lib\site-packages\huggingface_hub\file_download.py in _request_wrapper(method, url, max_retries, base_wait_time, max_wait_time, timeout, follow_relative_redirects, **params)
    436 
    437     # 3. Exponential backoff
--> 438     return http_backoff(
    439         method=method,
    440         url=url,

G:\Anaconda3\lib\site-packages\huggingface_hub\utils\_http.py in http_backoff(method, url, max_retries, base_wait_time, max_wait_time, retry_on_exceptions, retry_on_status_codes, **kwargs)
    127 
    128             # Perform request and return if status_code is not in the retry list.
--> 129             response = requests.request(method=method, url=url, **kwargs)
    130             if response.status_code not in retry_on_status_codes:
    131                 return response

G:\Anaconda3\lib\site-packages\requests\api.py in request(method, url, **kwargs)
     59     # cases, and look like a memory leak in others.
     60     with sessions.Session() as session:
---> 61         return session.request(method=method, url=url, **kwargs)
     62 
     63 

G:\Anaconda3\lib\site-packages\requests\sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    540         }
    541         send_kwargs.update(settings)
--> 542         resp = self.send(prep, **send_kwargs)
    543 
    544         return resp

G:\Anaconda3\lib\site-packages\requests\sessions.py in send(self, request, **kwargs)
    653 
    654         # Send the request
--> 655         r = adapter.send(request, **kwargs)
    656 
    657         # Total elapsed time of the request (approximately)

G:\Anaconda3\lib\site-packages\requests\adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    514                 raise SSLError(e, request=request)
    515 
--> 516             raise ConnectionError(e, request=request)
    517 
    518         except ClosedPoolError as e:

ConnectionError: HTTPSConnectionPool(host='s3-proxy.huggingface.tech', port=443): Max retries exceeded with url: /lfs.huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2/16cc9e54df6e083272378abec2d75dc34d7a48b5276db3ccc050d18de672ac59?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=AKIA4N7VTDGOZQA2IKWK%2F20230104%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230104T070023Z&X-Amz-Expires=259200&X-Amz-Signature=171d6342714b8469d06bb1d15cd60f8eeed49f8f0adcd9a1467f2a3388ff7283&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3D%22pytorch_model.bin%22&x-id=GetObject (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x0000024BF1722CA0>: Failed to establish a new connection: [WinError 10060] The connection attempt failed because the connected party did not respond properly after a period of time, or the connected host did not respond.'))

What can I do about this?

MaartenGr commented 1 year ago

Based on your error, it seems that the environment you are running in cannot connect to the internet: the download of the embedding model from the Hugging Face servers timed out. There can be many reasons for this, including ports that are blocked in that environment. Allowing outbound connections from it should fix this.
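As a rough way to confirm the diagnosis above, the sketch below checks whether a TCP connection to a host can be opened at all, and shows one possible workaround when it cannot. The helper names `can_reach` and `build_topic_model` and the `local_model_dir` path are hypothetical, introduced only for illustration. The workaround assumes the embedding model has already been copied to a local folder from a machine that does have access. It relies on `BERTopic` accepting a `SentenceTransformer` instance through its `embedding_model` argument.

```python
import socket


def can_reach(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened within timeout seconds."""
    try:
        # create_connection raises OSError (including timeouts and DNS
        # failures) when the connection cannot be established.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def build_topic_model(local_model_dir):
    """Build a BERTopic model from an embedding model stored on disk.

    local_model_dir is a hypothetical path to a folder that already contains
    the downloaded sentence-transformers model, so no download is attempted.
    """
    # Imported lazily so the connectivity check above works without these packages.
    from sentence_transformers import SentenceTransformer
    from bertopic import BERTopic

    embedding_model = SentenceTransformer(local_model_dir)
    return BERTopic(embedding_model=embedding_model)
```

For example, `can_reach("s3-proxy.huggingface.tech")` uses the host name from the traceback. If it returns `False` while other sites are reachable, the block is probably specific to that host or port, and loading the model from disk with `build_topic_model` may avoid the download altogether.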

MaartenGr commented 1 year ago

Due to inactivity, I'll be closing this issue. Let me know if you want me to re-open the issue!