cbaziotis / ekphrasis

Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, 330 million English tweets).
MIT License

Getting URLError: <urlopen error [Errno 60] Operation timed out> #13

Open BlaBlaPer opened 5 years ago

BlaBlaPer commented 5 years ago

Hi Christos, I got this error when I ran the TextPreProcessor code. Do you know how to fix it?

`Word statistics files not found! Downloading...

TimeoutError Traceback (most recent call last) ~/anaconda3/lib/python3.6/urllib/request.py in do_open(self, http_class, req, **http_conn_args) 1317 h.request(req.get_method(), req.selector, req.data, headers, -> 1318 encode_chunked=req.has_header('Transfer-encoding')) 1319 except OSError as err: # timeout error

~/anaconda3/lib/python3.6/http/client.py in request(self, method, url, body, headers, encode_chunked) 1238 """Send a complete request to the server.""" -> 1239 self._send_request(method, url, body, headers, encode_chunked) 1240

~/anaconda3/lib/python3.6/http/client.py in _send_request(self, method, url, body, headers, encode_chunked) 1284 body = _encode(body, 'body') -> 1285 self.endheaders(body, encode_chunked=encode_chunked) 1286

~/anaconda3/lib/python3.6/http/client.py in endheaders(self, message_body, encode_chunked) 1233 raise CannotSendHeader() -> 1234 self._send_output(message_body, encode_chunked=encode_chunked) 1235

~/anaconda3/lib/python3.6/http/client.py in _send_output(self, message_body, encode_chunked) 1025 del self._buffer[:] -> 1026 self.send(msg) 1027

~/anaconda3/lib/python3.6/http/client.py in send(self, data) 963 if self.auto_open: --> 964 self.connect() 965 else:

~/anaconda3/lib/python3.6/http/client.py in connect(self) 1391 -> 1392 super().connect() 1393

~/anaconda3/lib/python3.6/http/client.py in connect(self) 935 self.sock = self._create_connection( --> 936 (self.host,self.port), self.timeout, self.source_address) 937 self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

~/anaconda3/lib/python3.6/socket.py in create_connection(address, timeout, source_address) 721 if err is not None: --> 722 raise err 723 else:

~/anaconda3/lib/python3.6/socket.py in create_connection(address, timeout, source_address) 712 sock.bind(source_address) --> 713 sock.connect(sa) 714 return sock

TimeoutError: [Errno 60] Operation timed out

During handling of the above exception, another exception occurred:

URLError Traceback (most recent call last)

in 30 # list of dictionaries, for replacing tokens extracted from the text, 31 # with other expressions. You can pass more than one dictionaries. ---> 32 dicts=[emoticons] 33 ) 34

~/anaconda3/lib/python3.6/site-packages/ekphrasis/classes/preprocessor.py in __init__(self, **kwargs) 90 91 if self.unpack_hashtags: ---> 92 self.segmenter = Segmenter(corpus=self.segmenter_corpus) 93 if self.mode != "fast": 94 self.spell_corrector = SpellCorrector(corpus=self.corrector_corpus)

~/anaconda3/lib/python3.6/site-packages/ekphrasis/classes/segmenter.py in __init__(self, corpus, max_split_length) 57 # self.unigrams = Counter(read_stats(corpus, 1)) 58 # self.bigrams = Counter(read_stats(corpus, 2)) ---> 59 self.unigrams = read_stats(corpus, 1) 60 self.bigrams = read_stats(corpus, 2) 61 self.N = sum(self.unigrams.values())

~/anaconda3/lib/python3.6/site-packages/ekphrasis/utils/helpers.py in read_stats(corpus, ngram) 45 def read_stats(corpus, ngram): 46 stats_dir = get_stats_dir() ---> 47 check_stats_files() 48 print("Reading " + "{} - {}grams ...".format(corpus, ngram)) 49 text = path.join(*[stats_dir, corpus, "counts_{}grams.txt".format(ngram)])

~/anaconda3/lib/python3.6/site-packages/ekphrasis/utils/helpers.py in check_stats_files() 88 stats_dir = get_stats_dir() 89 if not os.path.exists(stats_dir) or len(listdir_nohidden(stats_dir)) == 0: ---> 90 download_statistics() 91 92

~/anaconda3/lib/python3.6/site-packages/ekphrasis/utils/helpers.py in download_statistics() 74 print("Word statistics files not found!\nDownloading...", end=" ") 75 url = "https://www.dropbox.com/s/a84otqrg6u1c5je/stats.zip?dl=1" ---> 76 urlretrieve(url, "stats.zip") 77 print("done!") 78

~/anaconda3/lib/python3.6/urllib/request.py in urlretrieve(url, filename, reporthook, data) 246 url_type, path = splittype(url) 247 --> 248 with contextlib.closing(urlopen(url, data)) as fp: 249 headers = fp.info() 250

~/anaconda3/lib/python3.6/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context) 221 else: 222 opener = _opener --> 223 return opener.open(url, data, timeout) 224 225 def install_opener(opener):

~/anaconda3/lib/python3.6/urllib/request.py in open(self, fullurl, data, timeout) 524 req = meth(req) 525 --> 526 response = self._open(req, data) 527 528 # post-process response

~/anaconda3/lib/python3.6/urllib/request.py in _open(self, req, data) 542 protocol = req.type 543 result = self._call_chain(self.handle_open, protocol, protocol + --> 544 '_open', req) 545 if result: 546 return result

~/anaconda3/lib/python3.6/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args) 502 for handler in handlers: 503 func = getattr(handler, meth_name) --> 504 result = func(*args) 505 if result is not None: 506 return result

~/anaconda3/lib/python3.6/urllib/request.py in https_open(self, req) 1359 def https_open(self, req): 1360 return self.do_open(http.client.HTTPSConnection, req, -> 1361 context=self._context, check_hostname=self._check_hostname) 1362 1363 https_request = AbstractHTTPHandler.do_request_

~/anaconda3/lib/python3.6/urllib/request.py in do_open(self, http_class, req, **http_conn_args) 1318 encode_chunked=req.has_header('Transfer-encoding')) 1319 except OSError as err: # timeout error -> 1320 raise URLError(err) 1321 r = h.getresponse() 1322 except:

URLError: <urlopen error [Errno 60] Operation timed out>`

Vonisoa commented 5 years ago

I got the same error. Can anyone help?

cbaziotis commented 5 years ago

This is related to https://github.com/cbaziotis/ekphrasis/issues/11#issuecomment-506710607. Use the proposed workaround until I resolve the issue.
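
The linked comment is not reproduced in this thread, so the following is only a sketch of one possible way around the timeout, not necessarily the workaround proposed there. It fetches the archive from the URL shown in the traceback with an explicit timeout and a few retries, then unpacks it. The helper names `fetch_with_retries` and `extract_stats`, the retry parameters, and the assumption that the archive can simply be extracted into the statistics directory are all hypothetical and may need adjusting.

```python
import shutil
import time
import zipfile
from pathlib import Path
from urllib.error import URLError
from urllib.request import urlopen

# URL copied from the traceback (download_statistics in ekphrasis/utils/helpers.py).
STATS_URL = "https://www.dropbox.com/s/a84otqrg6u1c5je/stats.zip?dl=1"


def fetch_with_retries(url, dest, attempts=3, timeout=30.0, backoff=2.0, opener=urlopen):
    """Download `url` to `dest`, retrying on network errors (hypothetical helper).

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    function can be exercised without network access.
    """
    dest = Path(dest)
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            with opener(url, timeout=timeout) as response, open(dest, "wb") as out:
                shutil.copyfileobj(response, out)
            return dest
        except OSError as err:  # URLError and socket timeouts are OSError subclasses
            last_error = err
            if attempt < attempts:
                time.sleep(backoff * attempt)  # wait a little longer each time
    raise URLError(f"giving up after {attempts} attempts: {last_error}")


def extract_stats(zip_path, target_dir):
    """Unpack the archive into `target_dir` and return the top-level entry names."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    return sorted(p.name for p in target_dir.iterdir())
```

To use it, one could call `fetch_with_retries(STATS_URL, "stats.zip")` and then `extract_stats("stats.zip", target_dir)`, where `target_dir` is the directory that `get_stats_dir()` in `ekphrasis.utils.helpers` points to (that function appears in the traceback). According to the traceback, `read_stats` looks for files named `counts_{n}grams.txt` inside a per-corpus folder under that directory, so check that the extracted layout matches before rerunning `TextPreProcessor`. If the connection to Dropbox is blocked entirely, retries will not help, and the file would have to be downloaded by other means and unpacked the same way.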