CambridgeMolecularEngineering / chemdataextractor2

ChemDataExtractor Version 2.0

Failed to run "pip install -r requirements.txt" #42

Closed: sukiluvcode closed this issue 10 months ago

sukiluvcode commented 1 year ago

Hi,

I ran into a problem while installing allennlp==0.9.0. More specifically, spacy, a dependency of allennlp, failed to build. My command was pip install -r requirements.txt, on macOS Ventura with python==3.8 and pip==23.1.2. The error is below:

  Using cached spacy-2.1.9.tar.gz (30.7 MB)
  Installing build dependencies ... error
  error: subprocess-exited-with-error

  × pip subprocess to install build dependencies did not run successfully.
  │ exit code: 1
  ╰─> [138 lines of output]
      Collecting setuptools
        Using cached setuptools-68.0.0-py3-none-any.whl (804 kB)
      Collecting wheel<0.33.0,>0.32.0
        Using cached wheel-0.32.3-py2.py3-none-any.whl (21 kB)
      Collecting Cython
        Using cached Cython-0.29.36-py2.py3-none-any.whl (988 kB)
      Collecting cymem<2.1.0,>=2.0.2
        Using cached cymem-2.0.7-cp38-cp38-macosx_11_0_arm64.whl (31 kB)
      Collecting preshed<2.1.0,>=2.0.1
        Using cached preshed-2.0.1-cp38-cp38-macosx_11_0_arm64.whl
      Collecting murmurhash<1.1.0,>=0.28.0
        Using cached murmurhash-1.0.9-cp38-cp38-macosx_11_0_arm64.whl (19 kB)
      Collecting thinc<7.1.0,>=7.0.8
        Using cached thinc-7.0.8-cp38-cp38-macosx_11_0_arm64.whl
      Collecting blis<0.3.0,>=0.2.1 (from thinc<7.1.0,>=7.0.8)
        Using cached blis-0.2.4.tar.gz (1.5 MB)
        Preparing metadata (setup.py): started
        Preparing metadata (setup.py): finished with status 'done'
      Collecting wasabi<1.1.0,>=0.0.9 (from thinc<7.1.0,>=7.0.8)
        Using cached wasabi-0.10.1-py3-none-any.whl (26 kB)
      Collecting srsly<1.1.0,>=0.0.6 (from thinc<7.1.0,>=7.0.8)
        Using cached srsly-1.0.6-cp38-cp38-macosx_11_0_arm64.whl (206 kB)
      Collecting numpy>=1.7.0 (from thinc<7.1.0,>=7.0.8)
        Using cached numpy-1.24.4-cp38-cp38-macosx_11_0_arm64.whl (13.8 MB)
      Collecting plac<1.0.0,>=0.9.6 (from thinc<7.1.0,>=7.0.8)
        Using cached plac-0.9.6-py2.py3-none-any.whl (20 kB)
      Collecting tqdm<5.0.0,>=4.10.0 (from thinc<7.1.0,>=7.0.8)
        Using cached tqdm-4.65.0-py3-none-any.whl (77 kB)
      Building wheels for collected packages: blis
        Building wheel for blis (setup.py): started
        Building wheel for blis (setup.py): finished with status 'error'
        error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> [138 lines of output]

I tried to install spacy separately, but it still failed. Maybe the pinned version is just too old?
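For what it's worth, this is the kind of workaround I was considering (untested; the version pins are only taken from the log above, and --no-build-isolation only helps if the build tools are already installed):

# Untested sketch: pre-install the build tools that the old spacy/blis source
# distributions expect, then retry without build isolation so pip reuses them.
# The version pins are taken from the log above and are only a guess.
pip install setuptools wheel "cython<3" numpy
pip install --no-build-isolation "blis==0.2.4" "thinc==7.0.8" "spacy==2.1.9"
pip install -r requirements.txt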

Thanks, Soike

OBrink commented 1 year ago

Hello Soike,

The installation of ChemDataExtractor is a complete nightmare since the introduction of the new CEM recognition system in version 2.1, and it appears that this repository has been abandoned by the developers. It is probably the curse of academic software projects where doctoral students are replaced every couple of years.

I strongly recommend using this docker image. After installing Docker, you can simply follow the documentation that I have written in the DockerHub repository. Let me know if you run into problems with it! :)
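Roughly, the workflow looks like this (a sketch only; the image name, port, and exact run command here are placeholders, so please take the real ones from the DockerHub documentation):

# Placeholder image name -- use the actual one from the DockerHub page.
docker pull <dockerhub-user>/chemdataextractor
# Expose Jupyter's default port so the notebooks can be opened in a browser;
# the port mapping is an assumption and may differ in the actual documentation.
docker run -it -p 8888:8888 <dockerhub-user>/chemdataextractor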

Have a nice day! Otto

sukiluvcode commented 1 year ago

Hello Otto,

I followed your Docker instructions, deployed the image successfully, and opened the automatic_parsers.ipynb file. However, I run into this error:

In [3]: doc.records

HTTPError                                 Traceback (most recent call last)
Cell In[3], line 1
----> 1 doc.records

File /usr/local/lib/python3.8/site-packages/chemdataextractor/doc/document.py:237, in Document.records(self)
    232 last_id_record = None
    234 # FORWARD INTERDEPENDENCY RESOLUTION -- Updated model parsers to reflect defined entities
    235 # 1. Find any defined entities in the element e.g. "Curie Temperature, Tc"
    236 # 2. Update the relevant models
--> 237 element_definitions = el.definitions
    238 chemical_defs = el.chemical_definitions
    240 for model in el._streamlined_models:

File /usr/local/lib/python3.8/site-packages/chemdataextractor/doc/text.py:353, in Text.definitions(self)
    348 @property
    349 def definitions(self):
    350     """
    351     Return a list of tagged definitions for each sentence in this text passage
    352     """
--> 353     return [definition for sent in self.sentences for definition in sent.definitions]

File /usr/local/lib/python3.8/site-packages/chemdataextractor/utils.py:29, in memoized_property.<locals>.fget_memoized(self)
    26 @functools.wraps(fget)
    27 def fget_memoized(self):
    28     if not hasattr(self, attr_name):
---> 29         setattr(self, attr_name, fget(self))
    30     return getattr(self, attr_name)

File /usr/local/lib/python3.8/site-packages/chemdataextractor/doc/text.py:253, in Text.sentences(self)
    250 @memoized_property
    251 def sentences(self):
    252     """A list of :class:`Sentence` s that make up this text passage."""
--> 253     sents = self.sentence_tokenizer.get_sentences(self)
    254     for sent in sents:
    255         sent.document = self.document

File /usr/local/lib/python3.8/site-packages/chemdataextractor/nlp/tokenize.py:77, in SentenceTokenizer.get_sentences(self, text)
    76 def get_sentences(self, text):
---> 77     spans = self.span_tokenize(text.text)
    78     return text._sentences_from_spans(spans)

File /usr/local/lib/python3.8/site-packages/chemdataextractor/nlp/tokenize.py:87, in SentenceTokenizer.span_tokenize(self, s)
    81 """Return a list of integer offsets that identify sentences in the given text.
    82
    83 :param string s: The text to tokenize into sentences.
    84 :rtype: iter(tuple(int, int))
    85 """
    86 if self._tokenizer is None:
---> 87     self._tokenizer = load_model(self.model)
    88 # for debug in tokenizer.debug_decisions(s):
    89 #     log.debug(format_debug_decision(debug))
    90 return self._tokenizer.span_tokenize(s)

File /usr/local/lib/python3.8/site-packages/chemdataextractor/data.py:154, in load_model(path)
    152 def load_model(path):
    153     """Load a model from a pickle file in the data directory. Cached so model is only loaded once."""
--> 154     abspath = find_data(path)
    155     cached = _model_cache.get(abspath)
    156     if cached is not None:

File /usr/local/lib/python3.8/site-packages/chemdataextractor/data.py:138, in find_data(path, warn, get_data)
    136 for package in PACKAGES:
    137     if package.path == path:
--> 138         package.download()
    139         break
    140 elif warn and not os.path.exists(full_path):

File /usr/local/lib/python3.8/site-packages/chemdataextractor/data.py:89, in Package.download(self, force)
    87 ensure_dir(os.path.dirname(self.local_path))
    88 r = requests.get(self.remote_path, stream=True)
---> 89 r.raise_for_status()
    90 # Check if already downloaded
    91 if self.local_exists():
    92     # Skip if existing, unless the file has changed

File /usr/local/lib/python3.8/site-packages/requests/models.py:940, in Response.raise_for_status(self)
    937 http_error_msg = u'%s Server Error: %s for url: %s' % (self.status_code, reason, self.url)
    939 if http_error_msg:
--> 940     raise HTTPError(http_error_msg, response=self)

HTTPError: 500 Server Error: reading HTTP response body: unexpected EOF for url: http://data.chemdataextractor.org/models/punkt_chem-1.0.pickle

It seems that the error is caused by a failure to connect to the web server while downloading the pickle file. Is there an alternative way to download this file?
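In case it is useful, this is the kind of manual workaround I was thinking about (a sketch based only on the attribute names that appear in the traceback above; I have not tried it):

# Print the remote URL and expected local path of every data package
# (PACKAGES, remote_path and local_path all appear in data.py in the traceback).
python -c "from chemdataextractor.data import PACKAGES; [print(p.remote_path, '->', p.local_path) for p in PACKAGES]"

# Fetch the failing model on a connection that can actually reach the server ...
curl -O http://data.chemdataextractor.org/models/punkt_chem-1.0.pickle
# ... then copy the file to the local path printed above for that model.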

Thanks! Soike

OBrink commented 1 year ago

I just checked it again, and I cannot reproduce your problem.

Before starting the Jupyter notebook in the Docker container, can you check whether the internet connection works? What happens if you run ping google.com inside the Docker container? Are you using a VPN, or is there potentially a firewall that blocks internet access?

This could be related: https://stackoverflow.com/questions/20430371/my-docker-container-has-no-internet
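For example, something like this against the running container should show whether it can reach the outside world (the container name is a placeholder, and it assumes ping is installed in the image):

# Placeholder container name -- `docker ps` shows the actual one.
docker exec -it <container-name> ping -c 3 google.com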

sukiluvcode commented 10 months ago

Sorry about the delay in replying. I checked the internet connection and found that the problem is my VPN, which cannot proxy the requests made when downloading the online resources. Sadly, this can't be solved because of the GFW.

Thanks for your help. I guess I will now look for another way to do my research.