bittremieux / falcon

Large-scale tandem mass spectrum clustering using fast nearest neighbor searching.
BSD 3-Clause "New" or "Revised" License

Error thrown when no cluster was found for at least one charge #5

Closed: KilianMaes closed this issue 3 years ago

KilianMaes commented 3 years ago

Issue Description

When clustering a small dataset, if no cluster is found for at least one charge (e.g. charge +2), the following error is thrown:

Traceback (most recent call last):
  File "falcon.py", line 255, in <module>
    main()
  File "falcon.py", line 113, in main
    current_label = np.amax(clusters[mask_no_noise]) + 1
  File "<__array_function__ internals>", line 5, in amax
  File "/home/maesk/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 2733, in amax
    return _wrapreduction(a, np.maximum, 'max', axis, None, out,
  File "/home/maesk/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 87, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation maximum which has no identity

This is because clusters[mask_no_noise] is an empty array, and np.amax() cannot reduce a zero-size array unless an initial value is provided.
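The failure can be reproduced outside of falcon with a few lines of NumPy. This is only a minimal sketch: the array contents and the use of -1 as the noise label are assumptions made for illustration, not values taken from falcon.py.

```python
import numpy as np

# Hypothetical labels for one charge: every spectrum is marked as noise.
# Using -1 as the noise label is an assumption for this sketch.
clusters = np.array([-1, -1, -1])
mask_no_noise = clusters != -1

try:
    # The selection is empty, so the reduction has nothing to work on.
    np.amax(clusters[mask_no_noise])
    error_message = None
except ValueError as e:
    error_message = str(e)

print(error_message)
```

The message printed here should match the last line of the traceback above.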

Solution

When no cluster is found for a specific charge, the current label should stay the same (see the PR), so there is no need to call np.amax() in that case.
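A rough sketch of that idea follows. It is not the code from the PR: the helper name next_label, its parameters, and the -1 noise label are hypothetical and only illustrate skipping np.amax() when nothing was clustered.

```python
import numpy as np


def next_label(clusters, current_label, noise_label=-1):
    """Return the label to start from for the next charge.

    Hypothetical helper for illustration; noise_label=-1 is an assumption.
    """
    mask_no_noise = clusters != noise_label
    if not mask_no_noise.any():
        # No cluster for this charge: keep the current label unchanged.
        return current_label
    # At least one clustered spectrum: continue after the largest label.
    return int(np.amax(clusters[mask_no_noise])) + 1


# Only noise for this charge, so the label does not advance.
unchanged = next_label(np.array([-1, -1]), current_label=7)
# Labels 7 and 8 were assigned, so the next charge starts at 9.
advanced = next_label(np.array([-1, 7, 8]), current_label=7)
```

An alternative that avoids the branch would be passing the initial argument to np.amax(), but the explicit check keeps the intent easier to read.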

@bittremieux could you check if that seems correct? Thank you!

EDIT: I used an environment built from your environment.yml. If you need an .mgf file to reproduce the error, I can upload one.

bittremieux commented 3 years ago

Thank you for the detailed bug report and the proposed fix. Merging it now. 🙂