ReinV / SCOPE

Search and Chemical Ontology Plotting Environment

New error from make_table.py #64

Open magnuspalmblad opened 8 months ago

magnuspalmblad commented 8 months ago

I am getting the following error in make_table.py, which I think is new. I am now running Python 3.11.5 packaged by Anaconda. I have used this environment in the past, and it worked.

(base) G:\Projects\Nina\SCOPE-master>python make_table.py -i results_lipids -t folder
making table for lipidomics

Traceback (most recent call last):
  File "G:\Projects\Nina\SCOPE-master\make_table.py", line 115, in <module>
    main()
  File "G:\Projects\Nina\SCOPE-master\make_table.py", line 104, in main
    table = make_table(data, df_results)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "G:\Projects\Nina\SCOPE-master\make_table.py", line 57, in make_table
    table.loc[:,key] = column_list
    ~~~~~~~~~^^^^^^^
  File "C:\Users\nmpalmblad\AppData\Local\anaconda3\Lib\site-packages\pandas\core\indexing.py", line 849, in __setitem__
    iloc._setitem_with_indexer(indexer, value, self.name)
  File "C:\Users\nmpalmblad\AppData\Local\anaconda3\Lib\site-packages\pandas\core\indexing.py", line 1716, in _setitem_with_indexer
    take_split_path = not can_hold_element(
                          ^^^^^^^^^^^^^^^^^
  File "C:\Users\nmpalmblad\AppData\Local\anaconda3\Lib\site-packages\pandas\core\dtypes\cast.py", line 1744, in can_hold_element
    np_can_hold_element(dtype, element)
  File "C:\Users\nmpalmblad\AppData\Local\anaconda3\Lib\site-packages\pandas\core\dtypes\cast.py", line 1770, in np_can_hold_element
    tipo = _maybe_infer_dtype_type(element)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\nmpalmblad\AppData\Local\anaconda3\Lib\site-packages\pandas\core\dtypes\cast.py", line 976, in _maybe_infer_dtype_type
    element = np.asarray(element)
              ^^^^^^^^^^^^^^^^^^^
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (10870,) + inhomogeneous part.

(base) G:\Projects\Nina\SCOPE-master>
magnuspalmblad commented 8 months ago

It seems it fails when it encounters the second ID, e.g. in a results file

18059   38245985
18059   38245985
18059   38245985
18059   38245985
18059   38245985
18059   38245985
17002   38245985

it fails on the last line. However, if I provide a file with just 17002, it works fine, so it is not this ID per se. Most curious!

magnuspalmblad commented 8 months ago

I could get around the problem of the column list not having scalar elements by casting them to strings. This works, as we only print the table to a text file afterwards, but it is not entirely kosher:

column_list = [str(elem) for elem in column_list]
table.loc[:, key] = column_list  # failed here on key Class, hence the cast to string above
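
A slightly less lossy variant of this workaround is sketched below. It assumes that the offending elements of column_list are lists or tuples (for example several class names for one ChEBI ID), which the thread does not confirm. Instead of calling str() on such an element, which stores the Python repr, it joins the items with a separator. The helper name to_cell, the separator and the sample data are hypothetical and not part of SCOPE.

```python
def to_cell(elem, sep="; "):
    """Return a scalar string for one table cell."""
    # Lists and tuples are joined, so the cell reads "a; b"
    # instead of the repr "['a', 'b']" that str() would produce.
    if isinstance(elem, (list, tuple)):
        return sep.join(str(item) for item in elem)
    return str(elem)


# Hypothetical column mixing scalars and a sequence.
column_list = ["lipid", ["fatty acid", "carboxylic acid"], 17002]
column_list = [to_cell(elem) for elem in column_list]
print(column_list)
```

Every element is then a plain string, so the assignment to the table column no longer has to build an array from sequences of different lengths.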
ReinV commented 8 months ago

Hi Magnus, I see you resolved the issue but I'm still curious how it happened. I cannot recreate it with the example you shared, could you maybe drop me the complete file so I can investigate? Thanks.

magnuspalmblad commented 8 months ago

I think it depends on the version of SCOPE - the version I installed on September 9, 2021, works fine (except the other issue with the new ChEBIs that do not have IDF numbers). The current version does not work. I am testing with Python 3.11.5 and Pandas 2.0.3, but I have also tried later versions of both with the same error.

There are quite a few differences between these, many related to the TFIDF normalization. But the new (non-working) version also specifies the data types when reading in the dataframe, which the old (working) version did not. Perhaps this has something to do with it?

make_table_working.zip make_table_not_working.zip

ReinV commented 8 months ago

I tested make_table with the HILIC and APCI results with python=3.11.5 and pandas >2 and it worked. Do those results also give an error for you?

magnuspalmblad commented 8 months ago

Yes, with the current SCOPE, Python 3.11.5 and Pandas 2.0.3, I get the following error on the included APCI results file:

(base) C:\Users\Magnus Palmblad\Downloads\SCOPE-master_new\SCOPE-master>python make_table.py -i results -t folder
making table for APCI
Traceback (most recent call last):
  File "C:\Users\Magnus Palmblad\Downloads\SCOPE-master_new\SCOPE-master\make_table.py", line 115, in <module>
    main()
  File "C:\Users\Magnus Palmblad\Downloads\SCOPE-master_new\SCOPE-master\make_table.py", line 104, in main
    table = make_table(data, df_results)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Magnus Palmblad\Downloads\SCOPE-master_new\SCOPE-master\make_table.py", line 57, in make_table
    table.loc[:,key] = column_list
    ~~~~~~~~~^^^^^^^
  File "C:\Users\Magnus Palmblad\anaconda3\Lib\site-packages\pandas\core\indexing.py", line 849, in __setitem__
    iloc._setitem_with_indexer(indexer, value, self.name)
  File "C:\Users\Magnus Palmblad\anaconda3\Lib\site-packages\pandas\core\indexing.py", line 1716, in _setitem_with_indexer
    take_split_path = not can_hold_element(
                          ^^^^^^^^^^^^^^^^^
  File "C:\Users\Magnus Palmblad\anaconda3\Lib\site-packages\pandas\core\dtypes\cast.py", line 1744, in can_hold_element
    np_can_hold_element(dtype, element)
  File "C:\Users\Magnus Palmblad\anaconda3\Lib\site-packages\pandas\core\dtypes\cast.py", line 1770, in np_can_hold_element
    tipo = _maybe_infer_dtype_type(element)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Magnus Palmblad\anaconda3\Lib\site-packages\pandas\core\dtypes\cast.py", line 976, in _maybe_infer_dtype_type
    element = np.asarray(element)
              ^^^^^^^^^^^^^^^^^^^
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (3492,) + inhomogeneous part.
ReinV commented 8 months ago

I'm so confused why I can't reproduce this error (python 3.11.5 and pandas 2.0.3). Could you try creating a new conda environment with python=3.11.5 and pip install pandas==2.0.3 and see if it fails the same way?

magnuspalmblad commented 8 months ago

I created new conda environments with Python 3.8, 3.11 and 3.12, activating the environment and running pip install pandas inside the environment. They all generate the same error as before (with the current version of SCOPE, fetched from GitHub today). So it does not seem to be the Python version...

ReinV commented 8 months ago

I think we can definitely exclude that it is software related. Reading back your comments, you said that it failed on key "Class". Which ChEBI2Class file do you have? For me it's "ChEBI2Class_rel212.pkl".

magnuspalmblad commented 8 months ago

Yes, same version, downloaded today with download_files.py. For a while, I thought it was the occurrence of new ChEBI identifiers, but these are not present in the APCI and HILIC example results. I agree it must be something in the execution environment.

This is what I did:

(base) C:\Users\Magnus Palmblad\Downloads\SCOPE-master_new\SCOPE-master>conda create --name test python=3.11.5
WARNING: A conda environment already exists at 'C:\Users\Magnus Palmblad\anaconda3\envs\test'
Remove existing environment (y/[n])? y

WARNING: A space was detected in your requested environment path:
'C:\Users\Magnus Palmblad\anaconda3\envs\test'
Spaces in paths can sometimes be problematic. To minimize issues,
make sure you activate your environment before running any executables!

Collecting package metadata (current_repodata.json): done
Solving environment: unsuccessful attempt using repodata from current_repodata.json, retrying with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: done

==> WARNING: A newer version of conda exists. <==
  current version: 23.7.4
  latest version: 24.1.2

Please update conda by running

    $ conda update -n base -c defaults conda

Or to minimize the number of packages updated during conda update use

     conda install conda=24.1.2

## Package Plan ##

  environment location: C:\Users\Magnus Palmblad\anaconda3\envs\test

  added / updated specs:
    - python=3.11.5

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    python-3.11.5              |h2628c8c_0_cpython        17.3 MB  conda-forge
    ------------------------------------------------------------
                                           Total:        17.3 MB

The following NEW packages will be INSTALLED:

  bzip2              conda-forge/win-64::bzip2-1.0.8-hcfcfb64_5
  ca-certificates    conda-forge/win-64::ca-certificates-2024.2.2-h56e8100_0
  libexpat           conda-forge/win-64::libexpat-2.5.0-h63175ca_1
  libffi             conda-forge/win-64::libffi-3.4.2-h8ffe710_5
  libsqlite          conda-forge/win-64::libsqlite-3.45.1-hcfcfb64_0
  libzlib            conda-forge/win-64::libzlib-1.2.13-hcfcfb64_5
  openssl            conda-forge/win-64::openssl-3.2.1-hcfcfb64_0
  pip                conda-forge/noarch::pip-24.0-pyhd8ed1ab_0
  python             conda-forge/win-64::python-3.11.5-h2628c8c_0_cpython
  setuptools         conda-forge/noarch::setuptools-69.1.1-pyhd8ed1ab_0
  tk                 conda-forge/win-64::tk-8.6.13-h5226925_1
  tzdata             conda-forge/noarch::tzdata-2024a-h0c530f3_0
  ucrt               conda-forge/win-64::ucrt-10.0.22621.0-h57928b3_0
  vc                 conda-forge/win-64::vc-14.3-hcf57466_18
  vc14_runtime       conda-forge/win-64::vc14_runtime-14.38.33130-h82b7239_18
  vs2015_runtime     conda-forge/win-64::vs2015_runtime-14.38.33130-hcb4865c_18
  wheel              conda-forge/noarch::wheel-0.42.0-pyhd8ed1ab_0
  xz                 conda-forge/win-64::xz-5.2.6-h8d14728_0

Proceed ([y]/n)? y

Downloading and Extracting Packages

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate test
#
# To deactivate an active environment, use
#
#     $ conda deactivate

(base) C:\Users\Magnus Palmblad\Downloads\SCOPE-master_new\SCOPE-master>conda activate test

(test) C:\Users\Magnus Palmblad\Downloads\SCOPE-master_new\SCOPE-master>pip install pandas==2.0.3
Collecting pandas==2.0.3
  Downloading pandas-2.0.3-cp311-cp311-win_amd64.whl.metadata (18 kB)
Collecting python-dateutil>=2.8.2 (from pandas==2.0.3)
  Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting pytz>=2020.1 (from pandas==2.0.3)
  Using cached pytz-2024.1-py2.py3-none-any.whl.metadata (22 kB)
Collecting tzdata>=2022.1 (from pandas==2.0.3)
  Using cached tzdata-2024.1-py2.py3-none-any.whl.metadata (1.4 kB)
Collecting numpy>=1.21.0 (from pandas==2.0.3)
  Downloading numpy-1.26.4-cp311-cp311-win_amd64.whl.metadata (61 kB)
     ---------------------------------------- 61.0/61.0 kB 819.4 kB/s eta 0:00:00
Collecting six>=1.5 (from python-dateutil>=2.8.2->pandas==2.0.3)
  Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Downloading pandas-2.0.3-cp311-cp311-win_amd64.whl (10.6 MB)
   ---------------------------------------- 10.6/10.6 MB 10.7 MB/s eta 0:00:00
Downloading numpy-1.26.4-cp311-cp311-win_amd64.whl (15.8 MB)
   ---------------------------------------- 15.8/15.8 MB 9.6 MB/s eta 0:00:00
Using cached pytz-2024.1-py2.py3-none-any.whl (505 kB)
Using cached tzdata-2024.1-py2.py3-none-any.whl (345 kB)
Installing collected packages: pytz, tzdata, six, numpy, python-dateutil, pandas
Successfully installed numpy-1.26.4 pandas-2.0.3 python-dateutil-2.8.2 pytz-2024.1 six-1.16.0 tzdata-2024.1

(test) C:\Users\Magnus Palmblad\Downloads\SCOPE-master_new\SCOPE-master>python make_table.py -i results -t folder
making table for APCI
Traceback (most recent call last):
  File "C:\Users\Magnus Palmblad\Downloads\SCOPE-master_new\SCOPE-master\make_table.py", line 115, in <module>
    main()
  File "C:\Users\Magnus Palmblad\Downloads\SCOPE-master_new\SCOPE-master\make_table.py", line 104, in main
    table = make_table(data, df_results)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Magnus Palmblad\Downloads\SCOPE-master_new\SCOPE-master\make_table.py", line 57, in make_table
    table.loc[:,key] = column_list
    ~~~~~~~~~^^^^^^^
  File "C:\Users\Magnus Palmblad\anaconda3\envs\test\Lib\site-packages\pandas\core\indexing.py", line 849, in __setitem__
    iloc._setitem_with_indexer(indexer, value, self.name)
  File "C:\Users\Magnus Palmblad\anaconda3\envs\test\Lib\site-packages\pandas\core\indexing.py", line 1716, in _setitem_with_indexer
    take_split_path = not can_hold_element(
                          ^^^^^^^^^^^^^^^^^
  File "C:\Users\Magnus Palmblad\anaconda3\envs\test\Lib\site-packages\pandas\core\dtypes\cast.py", line 1744, in can_hold_element
    np_can_hold_element(dtype, element)
  File "C:\Users\Magnus Palmblad\anaconda3\envs\test\Lib\site-packages\pandas\core\dtypes\cast.py", line 1770, in np_can_hold_element
    tipo = _maybe_infer_dtype_type(element)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Magnus Palmblad\anaconda3\envs\test\Lib\site-packages\pandas\core\dtypes\cast.py", line 976, in _maybe_infer_dtype_type
    element = np.asarray(element)
              ^^^^^^^^^^^^^^^^^^^
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (3492,) + inhomogeneous part.

(test) C:\Users\Magnus Palmblad\Downloads\SCOPE-master_new\SCOPE-master>
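
The comparisons so far pin Python and pandas, but not NumPy, which pip pulled in as a dependency (numpy-1.26.4 in the log above). The traceback ends in np.asarray, and NumPy 1.24 and later raise a ValueError for ragged sequences where earlier releases only issued a deprecation warning, so the NumPy version may be worth comparing between the working and failing machines. The following is a minimal sketch for reporting all three versions using only the standard library. The function name report_versions is hypothetical.

```python
import platform
from importlib import metadata


def report_versions(packages=("pandas", "numpy")):
    """Collect the Python version and the installed package versions."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is absent from the active environment.
            versions[name] = "not installed"
    return versions


for name, version in report_versions().items():
    print(f"{name}: {version}")
```

Running this inside each activated environment gives a like-for-like list that can be pasted into the issue.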
ReinV commented 8 months ago

Can you add a line to the make_table script where you print the key and the column list, and report the ones that make it fail?
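
One way to do this is sketched below, assuming the check is placed just before the failing assignment in make_table. The helper name describe_column and the sample data are hypothetical. It counts the element types in the column and returns the positions of any elements that have a length but are not strings, since those are the ones that cannot be stored as scalars.

```python
from collections import Counter


def describe_column(key, column_list):
    """Summarise element types and locate non-scalar elements."""
    type_counts = Counter(type(elem).__name__ for elem in column_list)
    # Strings have a length too, but they are stored as scalars,
    # so only other sized objects (lists, tuples, arrays) are flagged.
    non_scalar = [
        (i, elem)
        for i, elem in enumerate(column_list)
        if hasattr(elem, "__len__") and not isinstance(elem, (str, bytes))
    ]
    print(f"{key}: types={dict(type_counts)}, non-scalar={len(non_scalar)}")
    return non_scalar


# Hypothetical column in which one element is a list.
offenders = describe_column("Class", ["lipid", ["a", "b"], "steroid"])
print(offenders[:5])
```

An empty result would mean every element is already a scalar, which would point away from the column contents as the cause.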

magnuspalmblad commented 8 months ago

I can read and print the entire column_list, and it looks right. It is the assignment table.loc[:,key] = column_list that fails. It also fails on very short column_lists, with only two ChEBIs. It does not seem to matter which ones.

magnuspalmblad commented 8 months ago

I can add that I also tried three different computers (my personal laptop, LUMC laptop and LUMC workstation), with different Python environments. As soon as I update these (or download Python/Pandas again), I get the error above. Python complains about column_list not containing scalar values (which it should, as far as I understand how it is constructed). Converting all the elements to strings did solve the problem, but they should have been strings all along, right? Anyway, many thanks for looking into this!