Open magnuspalmblad opened 8 months ago
It seems it fails when it encounters the second ID, e.g. in a results file
18059 38245985
18059 38245985
18059 38245985
18059 38245985
18059 38245985
18059 38245985
17002 38245985
it fails on the last line. However, if I provide a file with just 17002, it works fine, so it is not this ID per se. Most curious!
I could get around the problem with the column list not having scalar elements by casting them to strings. This works, as we only print the table to a text file afterwards, but it is not entirely kosher:
column_list = [str(elem) for elem in column_list]
table.loc[:, key] = column_list # failed here on key Class, hence the cast to string above
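For anyone reading along, here is a minimal sketch of what that cast does, assuming (as the "inhomogeneous shape" wording in the error suggests, but which is not confirmed in this thread) that some elements of column_list are lists rather than strings. The sample values are hypothetical and are not taken from ChEBI2Class. It also shows a possible alternative that joins list elements instead of printing their Python repr.

```python
# Hypothetical data: the second element is a list, e.g. an ID mapped to two classes.
column_list = ["lipid", ["amino acid", "peptide"], "alkaloid"]

# The workaround from this thread: every element becomes a scalar string,
# but a list element keeps its brackets and quotes.
as_str = [str(elem) for elem in column_list]

# A possible alternative (an assumption, not what SCOPE does): join list
# elements with a separator so the text output stays readable.
joined = [
    "; ".join(map(str, elem)) if isinstance(elem, (list, tuple)) else str(elem)
    for elem in column_list
]

print(as_str)
print(joined)
```

Either way each element ends up as a single string, which is why the assignment no longer trips over sequence-valued elements.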
Hi Magnus, I see you resolved the issue but I'm still curious how it happened. I cannot recreate it with the example you shared, could you maybe drop me the complete file so I can investigate? Thanks.
I think it depends on the version of SCOPE - the version I installed on September 9, 2021, works fine (except the other issue with the new ChEBIs that do not have IDF numbers). The current version does not work. I am testing with Python 3.11.5 and Pandas 2.0.3, but I have also tried later versions of both with the same error.
There are quite a few differences between these, many related to the TFIDF normalization. But the new (non-working) version also specifies the data types when reading in the dataframe, which the old (working) version did not. Perhaps this has something to do with it?
I tested make_table with the HILIC and APCI results with python=3.11.5 and pandas >2 and it worked. Do those results also give an error for you?
Yes, with the current SCOPE, Python 3.11.5 and Pandas 2.0.3, I get the following error on the included APCI results file:
(base) C:\Users\Magnus Palmblad\Downloads\SCOPE-master_new\SCOPE-master>python make_table.py -i results -t folder
making table for APCI
Traceback (most recent call last):
File "C:\Users\Magnus Palmblad\Downloads\SCOPE-master_new\SCOPE-master\make_table.py", line 115, in <module>
main()
File "C:\Users\Magnus Palmblad\Downloads\SCOPE-master_new\SCOPE-master\make_table.py", line 104, in main
table = make_table(data, df_results)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Magnus Palmblad\Downloads\SCOPE-master_new\SCOPE-master\make_table.py", line 57, in make_table
table.loc[:,key] = column_list
~~~~~~~~~^^^^^^^
File "C:\Users\Magnus Palmblad\anaconda3\Lib\site-packages\pandas\core\indexing.py", line 849, in __setitem__
iloc._setitem_with_indexer(indexer, value, self.name)
File "C:\Users\Magnus Palmblad\anaconda3\Lib\site-packages\pandas\core\indexing.py", line 1716, in _setitem_with_indexer
take_split_path = not can_hold_element(
^^^^^^^^^^^^^^^^^
File "C:\Users\Magnus Palmblad\anaconda3\Lib\site-packages\pandas\core\dtypes\cast.py", line 1744, in can_hold_element
np_can_hold_element(dtype, element)
File "C:\Users\Magnus Palmblad\anaconda3\Lib\site-packages\pandas\core\dtypes\cast.py", line 1770, in np_can_hold_element
tipo = _maybe_infer_dtype_type(element)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Magnus Palmblad\anaconda3\Lib\site-packages\pandas\core\dtypes\cast.py", line 976, in _maybe_infer_dtype_type
element = np.asarray(element)
^^^^^^^^^^^^^^^^^^^
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (3492,) + inhomogeneous part.
I'm so confused why I can't reproduce this error (python 3.11.5 and pandas 2.0.3). Could you try creating a new conda environment with python=3.11.5 and pip install pandas==2.0.3 and see if it fails the same way?
I created new conda environments with Python 3.8, 3.11 and 3.12, activating each environment and running pip install pandas inside it. They all generate the same error as before (with the current version of SCOPE, fetched from GitHub today). So it does not seem to be the Python version...
I think we can definitely rule out that it is software related. Reading back your comments, you said that it failed on key "Class". Which ChEBI2Class file do you have? For me it's "ChEBI2Class_rel212.pkl".
Yes, same version, downloaded today with download_files.py. For a while, I thought it was the occurrence of new ChEBI identifiers, but these are not present in the APCI and HILIC example results. I agree it must be something in the execution environment.
This is what I did:
(base) C:\Users\Magnus Palmblad\Downloads\SCOPE-master_new\SCOPE-master>conda create --name test python=3.11.5
WARNING: A conda environment already exists at 'C:\Users\Magnus Palmblad\anaconda3\envs\test'
Remove existing environment (y/[n])? y
WARNING: A space was detected in your requested environment path:
'C:\Users\Magnus Palmblad\anaconda3\envs\test'
Spaces in paths can sometimes be problematic. To minimize issues,
make sure you activate your environment before running any executables!
Collecting package metadata (current_repodata.json): done
Solving environment: unsuccessful attempt using repodata from current_repodata.json, retrying with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: done
==> WARNING: A newer version of conda exists. <==
current version: 23.7.4
latest version: 24.1.2
Please update conda by running
$ conda update -n base -c defaults conda
Or to minimize the number of packages updated during conda update use
conda install conda=24.1.2
## Package Plan ##
environment location: C:\Users\Magnus Palmblad\anaconda3\envs\test
added / updated specs:
- python=3.11.5
The following packages will be downloaded:
package | build
---------------------------|-----------------
python-3.11.5 |h2628c8c_0_cpython 17.3 MB conda-forge
------------------------------------------------------------
Total: 17.3 MB
The following NEW packages will be INSTALLED:
bzip2 conda-forge/win-64::bzip2-1.0.8-hcfcfb64_5
ca-certificates conda-forge/win-64::ca-certificates-2024.2.2-h56e8100_0
libexpat conda-forge/win-64::libexpat-2.5.0-h63175ca_1
libffi conda-forge/win-64::libffi-3.4.2-h8ffe710_5
libsqlite conda-forge/win-64::libsqlite-3.45.1-hcfcfb64_0
libzlib conda-forge/win-64::libzlib-1.2.13-hcfcfb64_5
openssl conda-forge/win-64::openssl-3.2.1-hcfcfb64_0
pip conda-forge/noarch::pip-24.0-pyhd8ed1ab_0
python conda-forge/win-64::python-3.11.5-h2628c8c_0_cpython
setuptools conda-forge/noarch::setuptools-69.1.1-pyhd8ed1ab_0
tk conda-forge/win-64::tk-8.6.13-h5226925_1
tzdata conda-forge/noarch::tzdata-2024a-h0c530f3_0
ucrt conda-forge/win-64::ucrt-10.0.22621.0-h57928b3_0
vc conda-forge/win-64::vc-14.3-hcf57466_18
vc14_runtime conda-forge/win-64::vc14_runtime-14.38.33130-h82b7239_18
vs2015_runtime conda-forge/win-64::vs2015_runtime-14.38.33130-hcb4865c_18
wheel conda-forge/noarch::wheel-0.42.0-pyhd8ed1ab_0
xz conda-forge/win-64::xz-5.2.6-h8d14728_0
Proceed ([y]/n)? y
Downloading and Extracting Packages
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
# $ conda activate test
#
# To deactivate an active environment, use
#
# $ conda deactivate
(base) C:\Users\Magnus Palmblad\Downloads\SCOPE-master_new\SCOPE-master>conda activate test
(test) C:\Users\Magnus Palmblad\Downloads\SCOPE-master_new\SCOPE-master>pip install pandas==2.0.3
Collecting pandas==2.0.3
Downloading pandas-2.0.3-cp311-cp311-win_amd64.whl.metadata (18 kB)
Collecting python-dateutil>=2.8.2 (from pandas==2.0.3)
Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting pytz>=2020.1 (from pandas==2.0.3)
Using cached pytz-2024.1-py2.py3-none-any.whl.metadata (22 kB)
Collecting tzdata>=2022.1 (from pandas==2.0.3)
Using cached tzdata-2024.1-py2.py3-none-any.whl.metadata (1.4 kB)
Collecting numpy>=1.21.0 (from pandas==2.0.3)
Downloading numpy-1.26.4-cp311-cp311-win_amd64.whl.metadata (61 kB)
---------------------------------------- 61.0/61.0 kB 819.4 kB/s eta 0:00:00
Collecting six>=1.5 (from python-dateutil>=2.8.2->pandas==2.0.3)
Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Downloading pandas-2.0.3-cp311-cp311-win_amd64.whl (10.6 MB)
---------------------------------------- 10.6/10.6 MB 10.7 MB/s eta 0:00:00
Downloading numpy-1.26.4-cp311-cp311-win_amd64.whl (15.8 MB)
---------------------------------------- 15.8/15.8 MB 9.6 MB/s eta 0:00:00
Using cached pytz-2024.1-py2.py3-none-any.whl (505 kB)
Using cached tzdata-2024.1-py2.py3-none-any.whl (345 kB)
Installing collected packages: pytz, tzdata, six, numpy, python-dateutil, pandas
Successfully installed numpy-1.26.4 pandas-2.0.3 python-dateutil-2.8.2 pytz-2024.1 six-1.16.0 tzdata-2024.1
(test) C:\Users\Magnus Palmblad\Downloads\SCOPE-master_new\SCOPE-master>python make_table.py -i results -t folder
making table for APCI
Traceback (most recent call last):
File "C:\Users\Magnus Palmblad\Downloads\SCOPE-master_new\SCOPE-master\make_table.py", line 115, in <module>
main()
File "C:\Users\Magnus Palmblad\Downloads\SCOPE-master_new\SCOPE-master\make_table.py", line 104, in main
table = make_table(data, df_results)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Magnus Palmblad\Downloads\SCOPE-master_new\SCOPE-master\make_table.py", line 57, in make_table
table.loc[:,key] = column_list
~~~~~~~~~^^^^^^^
File "C:\Users\Magnus Palmblad\anaconda3\envs\test\Lib\site-packages\pandas\core\indexing.py", line 849, in __setitem__
iloc._setitem_with_indexer(indexer, value, self.name)
File "C:\Users\Magnus Palmblad\anaconda3\envs\test\Lib\site-packages\pandas\core\indexing.py", line 1716, in _setitem_with_indexer
take_split_path = not can_hold_element(
^^^^^^^^^^^^^^^^^
File "C:\Users\Magnus Palmblad\anaconda3\envs\test\Lib\site-packages\pandas\core\dtypes\cast.py", line 1744, in can_hold_element
np_can_hold_element(dtype, element)
File "C:\Users\Magnus Palmblad\anaconda3\envs\test\Lib\site-packages\pandas\core\dtypes\cast.py", line 1770, in np_can_hold_element
tipo = _maybe_infer_dtype_type(element)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Magnus Palmblad\anaconda3\envs\test\Lib\site-packages\pandas\core\dtypes\cast.py", line 976, in _maybe_infer_dtype_type
element = np.asarray(element)
^^^^^^^^^^^^^^^^^^^
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (3492,) + inhomogeneous part.
(test) C:\Users\Magnus Palmblad\Downloads\SCOPE-master_new\SCOPE-master>
Can you add a line to the make_table script where you print the key and the column list, and report the ones that make it fail?
I can read and print the entire column_list, and it looks right. It is the assignment table.loc[:,key] = column_list that fails. It also fails on very short column_lists, with only two ChEBIs. It does not seem to matter which ones.
I can add that I also tried three different computers (my personal laptop, LUMC laptop and LUMC workstation), with different Python environments. As soon as I update these (or download Python/Pandas again), I get the error above. Python complains about column_list not containing scalar values (which it should, as far as I understand how it is constructed). Converting all the elements to strings did solve the problem, but they should have been strings all along, right? Anyway, many thanks for looking into this!
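One way to check whether column_list really contains only scalars is to report the type of every element that is not a plain string or number. The sketch below is a hypothetical helper (find_non_scalars is not part of SCOPE) run on made-up sample data. As background: NumPy 1.24 and later raise this ValueError for ragged nested sequences where older releases only issued a deprecation warning, and the pip log above shows numpy 1.26.4 being installed, so comparing NumPy versions between the working and failing setups may be worthwhile.

```python
from numbers import Number


def find_non_scalars(column_list):
    """Return (position, type name, value) for each element that is not a plain scalar."""
    scalar_types = (str, bytes, Number, type(None))
    return [
        (i, type(elem).__name__, elem)
        for i, elem in enumerate(column_list)
        if not isinstance(elem, scalar_types)
    ]


# Hypothetical column: the second element is a list rather than a string.
column_list = ["lipid", ["amino acid", "peptide"], "alkaloid"]

for position, type_name, value in find_non_scalars(column_list):
    print(f"element {position} is a {type_name}: {value!r}")
```

Calling this just before the table.loc[:, key] = column_list line, together with printing key, would show whether any element is a list, tuple or array even when the printed column looks right at a glance.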
I am getting the following error in make_table.py, which I think is new. I am now running Python 3.11.5 packaged by Anaconda. I have used this environment in the past, and it has worked.