CambridgeMolecularEngineering / chemdataextractor2

ChemDataExtractor Version 2.0

Installation Issue #60

Closed OPPOSITEFOOLS closed 3 days ago

OPPOSITEFOOLS commented 1 week ago

Hello, I am using Windows and Python 3.8. The C++ dev tools are already installed, although I didn't install them specifically for this, so I think they should be working.

When I run pip install chemdataextractor2, pip starts collecting spaCy and shows an error while installing build dependencies. Lots of red and yellow lines then appear; I think it is trying to use another version of spaCy? (screenshot attached)

In the end it just gets stuck and I have to cancel it manually. The output is attached: output.txt

It would be great if someone could help me out with this. Thank you!

Dingyun-Huang commented 1 week ago

Have you tried creating a fresh virtual environment (e.g. with conda) and installing chemdataextractor2 there?

OPPOSITEFOOLS commented 1 week ago

> Have you tried creating a fresh virtual environment (e.g. with conda) and installing chemdataextractor2 there?

I have tried creating a new, clean Python environment, and it ends up with the same problem. I am not used to conda; will that make a big difference?

OPPOSITEFOOLS commented 1 week ago

ERROR: Could not build wheels for blis, which is required to install pyproject.toml-based projects

I also tried conda just now, and yes, it is still failing to build the wheels. :(

Dingyun-Huang commented 1 week ago

Hi, could you post the output from when you tried installing in a conda environment?

OPPOSITEFOOLS commented 1 week ago

These are the outputs with conda: output.txt, output2.txt

Dingyun-Huang commented 1 week ago

The link below might be relevant to you, as suggested by lines 89 and 90 of your output2.txt file:

clang -c C:\Users\gaoti\AppData\Local\Temp\pip-install-c3prijv3\blis_ddabaedcf446421ea869e7d90de8c585\blis_src\config\bulldozer\bli_cntx_init_bulldozer.c -o C:\Users\gaoti\AppData\Local\Temp\tmpgf353u9g\bli_cntx_init_bulldozer.o -O2 -funroll-all-loops -std=c99 -D_POSIX_C_SOURCE=200112L -DBLIS_VERSION_STRING="0.5.0-6" -DBLIS_IS_BUILDING_LIBRARY -Iinclude\windows-x86_64 -I.\frame\3\ -I.\frame\ind\ukernels\ -I.\frame\1m\ -I.\frame\1f\ -I.\frame\1\ -I.\frame\include -IC:\Users\gaoti\AppData\Local\Temp\pip-install-c3prijv3\blis_ddabaedcf446421ea869e7d90de8c585\blis_src\include\windows-x86_64
error: [WinError 2] The system cannot find the file specified

https://stackoverflow.com/questions/61669873/python-venv-env-fails-winerror-2-the-system-cannot-find-the-file-specified
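Windows raises [WinError 2] when it cannot find a file it was asked to open or a program it was asked to start. In the command above, one plausible candidate is the `clang` executable itself, although the log alone does not confirm that. A minimal sketch for checking whether `clang` is visible on PATH, using only the standard library; `find_compiler` is a hypothetical helper name, not part of blis or ChemDataExtractor:

```python
import shutil


def find_compiler(name="clang"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_compiler()
    if path is None:
        # Assumption: a missing clang is one possible cause of the WinError 2 above.
        print("clang was not found on PATH")
    else:
        print("clang found at", path)
```

If this reports that clang is missing, that would be consistent with the error above, but it does not rule out other causes.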

OPPOSITEFOOLS commented 1 week ago

I don't think that works for me. I tried running it as admin and it still returns the same thing. The attached output.txt shows the entire process I followed. Does it usually work for others if they do the same? I am thinking about reinstalling.

OPPOSITEFOOLS commented 1 week ago

I also tried it in a new Google Colab notebook, and it returns the same error.

Dingyun-Huang commented 1 week ago

Hi there, I just managed to reproduce the error and will look into it. Meanwhile, you can try the Windows Subsystem for Linux (WSL), which gives you a Linux environment on Windows. Installation is much easier on Linux (and better tested, via GitHub Actions).

Dingyun-Huang commented 3 days ago

Hi, I think the short answer is to use Python 3.7. Although spaCy 2.1.9 will work on Python 3.8, it would require manual compilation from source on Windows.
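A quick way to confirm which interpreter an environment is actually using before running pip, as a minimal sketch. The 3.7 target comes from the suggestion above; `is_recommended_python` is a hypothetical helper name, not part of ChemDataExtractor:

```python
import sys


def is_recommended_python(version_info=sys.version_info):
    """True when the interpreter is Python 3.7, the version suggested in this thread."""
    return tuple(version_info[:2]) == (3, 7)


if not is_recommended_python():
    # Only a warning: other versions may work but can need a source build on Windows.
    print("Running Python %d.%d; this thread suggests 3.7 on Windows" % sys.version_info[:2])
```

Running this inside the new environment shows whether it picked up Python 3.7 or fell back to a system-wide 3.8.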

OPPOSITEFOOLS commented 2 days ago

Yes, Python 3.7 works for me. By the way, I'm using ChemDataExtractor to parse documents for specific information, such as catalyst mass. I've implemented a parser based on the provided examples, and it works well for extracting entries that match my regular expressions. However, I have a question about documents that mention catalyst mass more than once. In some cases only one of those mentions is relevant to my extraction needs.

How can I improve my parsing to ensure that only the relevant entry is extracted? Is it primarily about refining the regular expressions further, or are there additional strategies or tools within ChemDataExtractor that can help distinguish between relevant and non-relevant entries? Any guidance or best practices for this kind of selective extraction would be greatly appreciated.

Dingyun-Huang commented 2 days ago

Hi there, there is no direct way to determine which property/catalyst entries are "relevant". Could you open a new issue and post a detailed example there so that we can figure out what you need?
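As a generic illustration of filtering entries after extraction (this is not a ChemDataExtractor feature), here is a minimal sketch. It assumes each extracted entry has already been converted to a plain dict that keeps the sentence it came from; the record layout, the sample data, the keywords and `pick_relevant` are all hypothetical:

```python
def pick_relevant(records, keywords):
    """Return the record whose source sentence mentions the most keywords.

    Returns None when no sentence mentions any keyword.
    """
    best, best_score = None, 0
    for record in records:
        sentence = record["sentence"].lower()
        score = sum(1 for kw in keywords if kw.lower() in sentence)
        if score > best_score:
            best, best_score = record, score
    return best


# Made-up records for illustration only.
records = [
    {"value": "50", "units": "mg",
     "sentence": "The reference run used 50 mg of catalyst."},
    {"value": "20", "units": "mg",
     "sentence": "In this work, 20 mg of catalyst was loaded into the reactor."},
]

chosen = pick_relevant(records, ["in this work", "loaded"])
print(chosen["value"], chosen["units"])
```

Keyword scoring like this is crude; which cues actually mark the relevant mention depends on the documents, which is why a concrete example is needed.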