Open fanjie17 opened 2 years ago
My guess here is that the downloaded model is somehow corrupt. Could you try finding the directory where your data is located using the following:
from chemdataextractor.data import get_data_dir
print(get_data_dir())
And delete the models folder. On the next run, ChemDataExtractor should download all required model files from scratch, which should hopefully alleviate this issue. If this doesn't work, could you let me know what OS you're on and what Python version you're running?
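The steps above can be sketched as a small helper. This is a minimal sketch and not part of ChemDataExtractor: `remove_models_dir` is a hypothetical name, and it assumes the model files live in a `models` subfolder of the directory returned by `get_data_dir()`.

```python
import shutil
from pathlib import Path


def remove_models_dir(data_dir):
    """Delete the 'models' subfolder of data_dir, if it exists.

    Hypothetical helper; returns True when a folder was removed
    and False when there was nothing to delete.
    """
    models_dir = Path(data_dir) / "models"  # assumed folder name
    if models_dir.is_dir():
        shutil.rmtree(models_dir)
        return True
    return False
```

With ChemDataExtractor installed, this would be called as `remove_models_dir(get_data_dir())`, after which the models should be downloaded again on the next run.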
It worked now. Thank you!
Hi,
I deleted the models folder, but the models are not downloaded again from scratch. How can I find this folder? The error is as follows: ValueError: unable to parse C:\Users\99239\AppData\Local\ChemDataExtractor\ChemDataExtractor\models/bert_finetuned_crf_model-1.0a as a URL or as a local path
@christina0106
If you are working in a Jupyter Notebook, have you restarted your Python kernel and imported ChemDataExtractor again after deleting the models directory? If you're working in a Python shell, maybe restart that and try again.
Hi,
Thanks for the answers. I deleted the models folder and the models were reloaded automatically, but the AllenNLP model still failed to download. I am using Python 3.7 on Windows (x86) with Jupyter in an Anaconda3 environment.
Should we download an offline package? If so, where can it be downloaded?
The input and error are as follows:
Input:
from chemdataextractor import Document
doc = Document('UV-vis spectrum of 5,10,15,20-Tetra(4-carboxyphenyl)porphyrin in Tetrahydrofuran (THF).')
doc.cems
Error:
OSError Traceback (most recent call last) ~.conda\envs\python37\lib\site-packages\allennlp\common\util.py in get_spacy_model(spacy_model_name, pos_tags, parse, ner) 288 try: --> 289 spacy_model = spacy.load(spacy_model_name, disable=disable) 290 except OSError:
~.conda\envs\python37\lib\site-packages\spacy__init__.py in load(name, overrides) 26 deprecation_warning(Warnings.W001.format(path=depr_path)) ---> 27 return util.load_model(name, overrides) 28
~.conda\envs\python37\lib\site-packages\spacy\util.py in load_model(name, overrides) 138 return load_model_from_path(name, overrides) --> 139 raise IOError(Errors.E050.format(name=name)) 140
OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
During handling of the above exception, another exception occurred:
gaierror Traceback (most recent call last) ~.conda\envs\python37\lib\site-packages\urllib3\connection.py in _new_conn(self) 158 conn = connection.create_connection( --> 159 (self._dns_host, self.port), self.timeout, **extra_kw) 160
~.conda\envs\python37\lib\site-packages\urllib3\util\connection.py in create_connection(address, timeout, source_address, socket_options) 56 ---> 57 for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM): 58 af, socktype, proto, canonname, sa = res
~.conda\envs\python37\lib\socket.py in getaddrinfo(host, port, family, type, proto, flags) 747 addrlist = [] --> 748 for res in _socket.getaddrinfo(host, port, family, type, proto, flags): 749 af, socktype, proto, canonname, sa = res
gaierror: [Errno 11004] getaddrinfo failed
During handling of the above exception, another exception occurred:
NewConnectionError Traceback (most recent call last) ~.conda\envs\python37\lib\site-packages\urllib3\connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw) 599 body=body, headers=headers, --> 600 chunked=chunked) 601
~.conda\envs\python37\lib\site-packages\urllib3\connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw) 342 try: --> 343 self._validate_conn(conn) 344 except (SocketTimeout, BaseSSLError) as e:
~.conda\envs\python37\lib\site-packages\urllib3\connectionpool.py in _validate_conn(self, conn)
838 if not getattr(conn, 'sock', None): # AppEngine might not have .sock
--> 839 conn.connect()
840
~.conda\envs\python37\lib\site-packages\urllib3\connection.py in connect(self) 300 # Add certificate verification --> 301 conn = self._new_conn() 302 hostname = self.host
~.conda\envs\python37\lib\site-packages\urllib3\connection.py in _new_conn(self) 167 raise NewConnectionError( --> 168 self, "Failed to establish a new connection: %s" % e) 169
NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x0000024AB84445C0>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed
During handling of the above exception, another exception occurred:
MaxRetryError Traceback (most recent call last) ~.conda\envs\python37\lib\site-packages\requests\adapters.py in send(self, request, stream, timeout, verify, cert, proxies) 496 timeout=timeout, --> 497 chunked=chunked, 498 )
~.conda\envs\python37\lib\site-packages\urllib3\connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw) 637 retries = retries.increment(method, url, error=e, _pool=self, --> 638 _stacktrace=sys.exc_info()[2]) 639 retries.sleep()
~.conda\envs\python37\lib\site-packages\urllib3\util\retry.py in increment(self, method, url, response, error, _pool, _stacktrace) 398 if new_retry.is_exhausted(): --> 399 raise MaxRetryError(_pool, url, error or ResponseError(cause)) 400
MaxRetryError: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /explosion/spacy-models/master/shortcuts-v2.json (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000024AB84445C0>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed'))
During handling of the above exception, another exception occurred:
ConnectionError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_20180\270517424.py in <module>
----> 1 doc.cems
~.conda\envs\python37\lib\site-packages\chemdataextractor\doc\document.py in cems(self)
564 A list of all Chemical Entity Mentions in this document as :class:~chemdataextractor.doc.text.Span
565 """
--> 566 return list(set([n for el in self.elements for n in el.cems]))
567
568 @property
~.conda\envs\python37\lib\site-packages\chemdataextractor\doc\document.py in <listcomp>(.0)
564 A list of all Chemical Entity Mentions in this document as :class:~chemdataextractor.doc.text.Span
565 """
--> 566 return list(set([n for el in self.elements for n in el.cems]))
567
568 @property
~.conda\envs\python37\lib\site-packages\chemdataextractor\doc\text.py in cems(self)
344 A list of all Chemical Entity Mentions in this text as :class:chemdataextractor.doc.text.span
345 """
--> 346 return [cem for sent in self.sentences for cem in sent.cems]
347
348 @property
~.conda\envs\python37\lib\site-packages\chemdataextractor\doc\text.py in <listcomp>(.0)
344 A list of all Chemical Entity Mentions in this text as :class:chemdataextractor.doc.text.span
345 """
--> 346 return [cem for sent in self.sentences for cem in sent.cems]
347
348 @property
~.conda\envs\python37\lib\site-packages\chemdataextractor\utils.py in fget_memoized(self) 27 def fget_memoized(self): 28 if not hasattr(self, attr_name): ---> 29 setattr(self, attr_name, fget(self)) 30 return getattr(self, attr_name) 31 return property(fget_memoized)
~.conda\envs\python37\lib\site-packages\chemdataextractor\doc\text.py in cems(self) 642 spans = [] 643 # print(self.text.encode('utf8')) --> 644 for result in chemical_name.scan(self.tokens): 645 # parser scan yields (result, startindex, endindex) - we just use the indexes here 646 tokens = self.tokens[result[1]:result[2]]
~.conda\envs\python37\lib\site-packages\chemdataextractor\parse\elements.py in scan(self, tokens, max_matches, overlap) 115 while i < length and matches < max_matches: 116 try: --> 117 results, next_i = self.parse(tokens, i) 118 except ParseException as err: 119 # print(err.msg)
~.conda\envs\python37\lib\site-packages\chemdataextractor\parse\elements.py in parse(self, tokens, i, actions) 144 """ 145 try: --> 146 result, found_index = self._parse_tokens(tokens, i, actions) 147 except IndexError: 148 raise ParseException(tokens, i, 'IndexError', self)
~.conda\envs\python37\lib\site-packages\chemdataextractor\parse\elements.py in _parse_tokens(self, tokens, i, actions) 425 results = [] 426 for e in self.exprs: --> 427 exprresults, i = e.parse(tokens, i) 428 if exprresults is not None: 429 results.extend(exprresults)
~.conda\envs\python37\lib\site-packages\chemdataextractor\parse\elements.py in parse(self, tokens, i, actions) 144 """ 145 try: --> 146 result, found_index = self._parse_tokens(tokens, i, actions) 147 except IndexError: 148 raise ParseException(tokens, i, 'IndexError', self)
~.conda\envs\python37\lib\site-packages\chemdataextractor\parse\elements.py in _parse_tokens(self, tokens, i, actions) 682 results = [] 683 try: --> 684 results, i = self.expr.parse(tokens, i, actions) 685 except (ParseException, IndexError): 686 pass
~.conda\envs\python37\lib\site-packages\chemdataextractor\parse\elements.py in parse(self, tokens, i, actions) 144 """ 145 try: --> 146 result, found_index = self._parse_tokens(tokens, i, actions) 147 except IndexError: 148 raise ParseException(tokens, i, 'IndexError', self)
~.conda\envs\python37\lib\site-packages\chemdataextractor\parse\elements.py in _parse_tokens(self, tokens, i, actions) 425 results = [] 426 for e in self.exprs: --> 427 exprresults, i = e.parse(tokens, i) 428 if exprresults is not None: 429 results.extend(exprresults)
~.conda\envs\python37\lib\site-packages\chemdataextractor\parse\elements.py in parse(self, tokens, i, actions) 144 """ 145 try: --> 146 result, found_index = self._parse_tokens(tokens, i, actions) 147 except IndexError: 148 raise ParseException(tokens, i, 'IndexError', self)
~.conda\envs\python37\lib\site-packages\chemdataextractor\parse\elements.py in _parse_tokens(self, tokens, i, actions) 628 def _parse_tokens(self, tokens, i, actions=True): 629 try: --> 630 self.expr.try_parse(tokens, i) 631 except (ParseException, IndexError): 632 pass
~.conda\envs\python37\lib\site-packages\chemdataextractor\parse\elements.py in try_parse(self, tokens, i) 158 159 def try_parse(self, tokens, i): --> 160 return self.parse(tokens, i, actions=False)[1] 161 162 def _parse_tokens(self, tokens, i, actions=True):
~.conda\envs\python37\lib\site-packages\chemdataextractor\parse\elements.py in parse(self, tokens, i, actions) 144 """ 145 try: --> 146 result, found_index = self._parse_tokens(tokens, i, actions) 147 except IndexError: 148 raise ParseException(tokens, i, 'IndexError', self)
~.conda\envs\python37\lib\site-packages\chemdataextractor\parse\elements.py in _parse_tokens(self, tokens, i, actions) 295 def _parse_tokens(self, tokens, i, actions=True): 296 token = tokens[i] --> 297 tag = token[self.tag_type] 298 if tag == self.match: 299 return [E(self.name or safe_name(tag), token[0])], i + 1
~.conda\envs\python37\lib\site-packages\chemdataextractor\doc\text.py in getitem(self, key) 1073 return self.text 1074 elif key == 1: -> 1075 return self.legacy_pos_tag 1076 elif isinstance(key, str): 1077 return self.getattr(key)
~.conda\envs\python37\lib\site-packages\chemdataextractor\doc\text.py in legacy_pos_tag(self) 1063 def legacy_pos_tag(self): 1064 pos_tag = self[POS_TAG_TYPE] -> 1065 ner_tag = self[NER_TAG_TYPE] 1066 if ner_tag is not None and ner_tag != "O": 1067 return ner_tag
~.conda\envs\python37\lib\site-packages\chemdataextractor\doc\text.py in getitem(self, key) 1075 return self.legacy_pos_tag 1076 elif isinstance(key, str): -> 1077 return self.getattr(key) 1078 else: 1079 raise IndexError("Key" + str(key) + " is out of bounds for this token.")
~.conda\envs\python37\lib\site-packages\chemdataextractor\doc\text.py in getattr(self, name) 1083 return self._tags[name] 1084 else: -> 1085 self.sentence._assign_tags(name) 1086 if name not in self._tags.keys(): 1087 raise AttributeError(name + " is not a supported tag type for the sentence: " + str(self.sentence) + str(self.sentence.taggers) + str(type(self.sentence))
~.conda\envs\python37\lib\site-packages\chemdataextractor\doc\text.py in _assign_tags(self, tag_type) 788 tags = None 789 if hasattr(tagger, "batch_tag_for_type") and tagger.can_batch_tag(tag_type) and self.document is not None: --> 790 self.document._batch_assign_tags(tagger, tag_type) 791 elif hasattr(tagger, "tag_for_type"): 792 tags = tagger.tag_for_type(self.tokens, tag_type)
~.conda\envs\python37\lib\site-packages\chemdataextractor\doc\document.py in _batch_assign_tags(self, tagger, tag_type) 621 622 if hasattr(tagger, "batch_tag_for_type"): --> 623 tag_results = tagger.batch_tag_for_type(all_tokens, tag_type) 624 else: 625 tag_results = tagger.batch_tag(all_tokens)
~.conda\envs\python37\lib\site-packages\chemdataextractor\nlp\tag.py in batch_tag_for_type(self, sents, tag_type) 204 """ 205 tagger = self.taggers_dict[tag_type] --> 206 return tagger.batch_tag(sents) 207 208 def can_batch_tag(self, tag_type):
~.conda\envs\python37\lib\site-packages\chemdataextractor\nlp\allennlpwrapper.py in batch_tag(self, sents) 193 log.debug("".join(["Batch size:", str(len(instance))])) 194 with torch.no_grad(): --> 195 batch_predictions = self.predictor.predict_batch_instance(instance) 196 predictions.extend(batch_predictions) 197 prediction_end_time = datetime.datetime.now()
~.conda\envs\python37\lib\site-packages\chemdataextractor\nlp\allennlpwrapper.py in predictor(self) 157 model = model.cuda(gpu_id) 158 model = model.eval() --> 159 self._predictor = copy.deepcopy(SentenceTaggerPredictor(model=model, dataset_reader=None)) 160 sp.ok("✔") 161 return self._predictor
~.conda\envs\python37\lib\site-packages\allennlp\predictors\sentence_tagger.py in init(self, model, dataset_reader, language) 24 def init(self, model: Model, dataset_reader: DatasetReader, language: str = 'en_core_web_sm') -> None: 25 super().init(model, dataset_reader) ---> 26 self._tokenizer = SpacyWordSplitter(language=language, pos_tags=True) 27 28 def predict(self, sentence: str) -> JsonDict:
~.conda\envs\python37\lib\site-packages\allennlp\data\tokenizers\word_splitter.py in init(self, language, pos_tags, parse, ner, keep_spacy_tokens, split_on_spaces) 171 keep_spacy_tokens: bool = False, 172 split_on_spaces: bool = False) -> None: --> 173 self.spacy = get_spacy_model(language, pos_tags, parse, ner) 174 if split_on_spaces: 175 self.spacy.tokenizer = WhitespaceTokenizer(self.spacy.vocab)
~.conda\envs\python37\lib\site-packages\allennlp\common\util.py in get_spacy_model(spacy_model_name, pos_tags, parse, ner) 290 except OSError: 291 logger.warning(f"Spacy models '{spacy_model_name}' not found. Downloading and installing.") --> 292 spacy_download(spacy_model_name) 293 # NOTE(mattg): The following four lines are a workaround suggested by Ines for spacy 294 # 2.1.0, which removed the linking that was done in spacy 2.0. importlib doesn't find
~.conda\envs\python37\lib\site-packages\spacy\cli\download.py in download(model, direct, *pip_args) 36 dl = download_model(dl_tpl.format(m=model_name, v=version), pip_args) 37 else: ---> 38 shortcuts = get_json(about.shortcuts, "available shortcuts") 39 model_name = shortcuts.get(model, model) 40 compatibility = get_compatibility()
~.conda\envs\python37\lib\site-packages\spacy\cli\download.py in get_json(url, desc) 82 83 def get_json(url, desc): ---> 84 r = requests.get(url) 85 if r.status_code != 200: 86 msg.fail(
~.conda\envs\python37\lib\site-packages\requests\api.py in get(url, params, kwargs) 71 """ 72 ---> 73 return request("get", url, params=params, kwargs) 74 75
~.conda\envs\python37\lib\site-packages\requests\api.py in request(method, url, kwargs) 57 # cases, and look like a memory leak in others. 58 with sessions.Session() as session: ---> 59 return session.request(method=method, url=url, kwargs) 60 61
~.conda\envs\python37\lib\site-packages\requests\sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json) 587 } 588 send_kwargs.update(settings) --> 589 resp = self.send(prep, **send_kwargs) 590 591 return resp
~.conda\envs\python37\lib\site-packages\requests\sessions.py in send(self, request, kwargs) 701 702 # Send the request --> 703 r = adapter.send(request, kwargs) 704 705 # Total elapsed time of the request (approximately)
~.conda\envs\python37\lib\site-packages\requests\adapters.py in send(self, request, stream, timeout, verify, cert, proxies) 517 raise SSLError(e, request=request) 518 --> 519 raise ConnectionError(e, request=request) 520 521 except ClosedPoolError as e:
ConnectionError: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /explosion/spacy-models/master/shortcuts-v2.json (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000024AB84445C0>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed'))
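The traceback above points at two separate conditions: spaCy cannot find the `en_core_web_sm` package locally, and the automatic download then fails because the host name `raw.githubusercontent.com` cannot be resolved (`getaddrinfo failed`). A minimal sketch for checking both from the same environment follows. The function names are hypothetical, and treating "the package is importable" as "spaCy can load the model" is an assumption.

```python
import importlib.util
import socket


def model_package_installed(name="en_core_web_sm"):
    """Return True if a top-level package called `name` can be found.

    Assumption: the spaCy model is installed as a regular Python package.
    """
    return importlib.util.find_spec(name) is not None


def can_resolve(host="raw.githubusercontent.com", port=443):
    """Return True if the host name resolves, False on a lookup failure."""
    try:
        socket.getaddrinfo(host, port)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


if __name__ == "__main__":
    print("en_core_web_sm installed:", model_package_installed())
    print("raw.githubusercontent.com resolves:", can_resolve())
```

If the first check is False and the second is also False, the automatic download cannot succeed from that network, which matches the error shown.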
I solved this problem by switching to a different Wi-Fi network. Initializing AllenNLP failed because en_core_web_sm==2.1.0 could not be downloaded automatically. The package can be downloaded from https://github.com/explosion/spacy-models/releases?q=en_core_web_sm-2.1.0&expanded=true and installed manually.
doc.cems runs now.
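For the manual installation mentioned above, one possible approach is to hand the downloaded archive to pip from the same Python environment that runs ChemDataExtractor. This is a sketch under assumptions: the helper names are hypothetical, and the archive file name used in the usage note is assumed rather than taken from the thread.

```python
import subprocess
import sys


def pip_install_command(archive_path):
    """Build the pip command that installs a local package archive.

    Using sys.executable keeps the install in the current environment.
    """
    return [sys.executable, "-m", "pip", "install", str(archive_path)]


def install_local_model(archive_path):
    """Run pip against a model archive that was downloaded by hand."""
    subprocess.run(pip_install_command(archive_path), check=True)
```

For example, `install_local_model("en_core_web_sm-2.1.0.tar.gz")` would install an archive saved under that (assumed) name in the current working directory. Restarting the kernel afterwards is likely needed before the model is picked up.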
Hi, I ran the following code:
from chemdataextractor import Document
from chemdataextractor.model import Compound
from chemdataextractor.doc import Paragraph, Heading

d = Document(
    Heading(u'Synthesis of 2,4,6-trinitrotoluene (3a)'),
    Paragraph(u'The procedure was followed to yield a pale yellow solid (boiling point 240 °C)')
)
d.records.serialize()
However, I get the following error:
EOFError Traceback (most recent call last)