freejstone / CONGA

Implementation of the CONGA algorithm for a combined open and narrow search between experimental spectra and a peptide database.
MIT License

Error processing comet data: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() #10

Closed mriffle closed 1 year ago

mriffle commented 1 year ago

When running on comet results I generated I get the following error:

CPU: Linux-5.4.0-126-generic-x86_64-with-glibc2.31
2023-01-12 23:05:00.225097
Command used: /usr/local/bin/CONGA.py --score e-value --dcy_prefix DECOY_ narrow.comet.results.txt wide.comet.results.txt
Successfully read in arguments.
Reading in search files.
Successfully read in search files.
e-value successfully found in search files.
Comet search files detected.
Concatenated search files detected.
Creating original_target_sequence column as in tide-search.
Rewriting peptide sequence in tide-search format.
Filtering for neighbours.
100%|███████████████████████████████████████████████████████████████████████████| 62905/62905 [00:45<00:00, 1384.42it/s]
Doing head to head competition.
Constructing groups adaptively.
                                     decoys  targets     ratio
group names:
narrow                                 5525    27253  0.202730
top 1 PSMs & top (14, 16] mass bins     116      258  0.449612
top 1 PSMs & top (16, 18] mass bins      33      162  0.203704
top 1 PSMs & top 1 mass bin              66     1088  0.060662
top 1 PSMs & top 2 mass bin              88      669  0.131540
top 1 PSMs & top 7 mass bin               9      193  0.046632
top 2 or more PSMs                      213      908  0.234581
left over group                        6278     8251  0.760877
Applying group walk.
Traceback (most recent call last):
  File "/usr/local/bin/CONGA.py", line 2249, in <module>
    main()
    ^^^^^^
  File "/usr/local/bin/CONGA.py", line 1971, in main
    results = group_walk(list(df['winning_scores']), list(df['labels']), list(df['all_group_ids']), K, return_frontier, correction)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/bin/CONGA.py", line 487, in group_walk
    randind = random.choice(inds)
              ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/random.py", line 369, in choice
    if not seq:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I am running comet 2021.02rev0. For narrow search I used a 10ppm peptide_mass_tolerance. For wide I did a 500 amu peptide_mass_tolerance. See attached files for the data I'm processing.

wide.comet.results.txt.gz narrow.comet.results.txt.gz conga.log.txt.gz

freejstone commented 1 year ago

Thanks Mike. That is strange, I don't get the same error. Could you possibly try setting a seed value in the options, say --seed 1? Cheers!

INFO: CPU: macOS-10.16-x86_64-i386-64bit
INFO: 2023-01-13 12:52:34.171161
INFO: Command used: CONGA.py --dcy_prefix DECOY_ --score e-value docs/pages/files/comet/narrow.comet.results.txt docs/pages/files/comet/wide.comet.results.txt
INFO: Successfully read in arguments
INFO: Reading in search files.
INFO: Successfully read in search files.
INFO: e-value successfully found in search files. 

INFO: Comet search files detected.
INFO: Concatenated search files detected.
INFO: Creating original_target_sequence column as in tide-search.
INFO: Rewriting peptide sequence in tide-search format.
INFO: Filtering for neighbours.
INFO: Doing head to head competition.
INFO: Constructing groups adaptively.
INFO:                                      decoys  targets     ratio
group names:                                                  
narrow                                 5525    27253  0.202730
top 1 PSMs & top (21, 24] mass bins     160      301  0.531561
top 1 PSMs & top 1 mass bin              66     1088  0.060662
top 1 PSMs & top 2 mass bin             137      898  0.152561
top 1 PSMs & top 7 mass bin               9      193  0.046632
top 2 or more PSMs                      254      945  0.268783
left over group                        6218     8141  0.763788
INFO: Applying group walk.
INFO: Group walk complete.
INFO: 24566 peptides discovered at the 1% FDR level.
INFO: 26580 peptides discovered at the 5% FDR level.
INFO: Scan multiplicities among the discovered peptides at 1% FDR level:
INFO:                     Count
Scan multiplicity:       
1                   23610
2                     475
3                       2
INFO: Scan multiplicities among the discovered peptides at 5% FDR level:
INFO:                     Count
Scan multiplicity:       
1                   25263
2                     654
3                       3
INFO: Writing peptides at user-specified FDR level to directory.
INFO: Elapsed time: 74.58 s

mriffle commented 1 year ago

Same error with --seed 1. What version of Python are you using? I can try that.

freejstone commented 1 year ago

Thanks, I am currently using Python 3.9.12.

mriffle commented 1 year ago

OK, Python 3.9 worked!

3.11 is the latest version, I think. Might be worth investigating why it doesn't work there.

The Dockerfile let me quickly switch to 3.9, so that's good. I'll update the Dockerfile to specify 3.9.

freejstone commented 1 year ago

Great, thanks Mike! I think it's best that I update the code for compatibility with 3.11. Will let you know when this is done.

freejstone commented 1 year ago

I think I have found the reason for the failure in 3.11. It seems that random.choice() no longer supports NumPy arrays.

https://github.com/python/cpython/issues/100805
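For anyone hitting the same traceback, here is a minimal sketch of one possible workaround. It assumes that `inds` in `group_walk` is a 1-D NumPy array, which is what the error message suggests. The helper name `pick_index` is hypothetical and is not part of CONGA. Converting the array to a plain list before sampling avoids the `if not seq` truthiness check shown in the traceback.

```python
import random

import numpy as np


def pick_index(inds):
    """Pick one element at random from a list or 1-D NumPy array.

    Hypothetical helper for illustration; not taken from CONGA.py.
    """
    # tolist() yields a plain Python list, so random.choice never has to
    # evaluate the truth value of a multi-element array.
    candidates = np.asarray(inds).tolist()
    return random.choice(candidates)


if __name__ == "__main__":
    random.seed(1)  # analogous to passing --seed 1 on the command line
    inds = np.array([3, 7, 11])
    print(pick_index(inds))
```

An empty input still raises `IndexError`, as `random.choice` does for any empty sequence. NumPy's own `numpy.random.Generator.choice` would be another option, but it uses a different random stream, so results would not match runs seeded through the standard `random` module.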

Following my recent meeting with Bill/Uri, I have updated the code and documentation so that we can now use conda to set up the correct virtual environment (consistent with Bill's lab), and I have specified the correct Python version in my setup.py. The updated code and documentation are on the develop-package branch, which I will merge into the main branch soon.