effigies / looseversion

A backwards/forwards-compatible fork of distutils.version.LooseVersion

Error while using in a pandas context #10

Closed maaaaz closed 1 year ago

maaaaz commented 1 year ago

Hello,

First, let me thank you for providing this package after all the deprecation and breakage in distutils and packaging.version. I am currently facing an issue using looseversion in a pandas context:

>>> import pandas as pd
>>> from looseversion import LooseVersion
>>> df = pd.DataFrame({"version": ['3.00.12-beta', '02.0.1', '1.0b'] })

>>> df
        version
0  3.00.12-beta
1        02.0.1
2          1.0b

>>> df.sort_values(by='version', key=LooseVersion)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.11/dist-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pandas/core/frame.py", line 6923, in sort_values
    indexer = nargsort(
              ^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pandas/core/sorting.py", line 408, in nargsort
    items = ensure_key_mapped(items, key)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pandas/core/sorting.py", line 566, in ensure_key_mapped
    result = key(values.copy())
             ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/looseversion.py", line 125, in __init__
    if vstring:
  File "/usr/local/lib/python3.11/dist-packages/pandas/core/generic.py", line 1527, in __nonzero__
    raise ValueError(
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

While it works perfectly in a non-pandas context:

>>> version = ['3.00.12-beta', '02.0.1', '1.0b']
>>> sorted(version, key=LooseVersion)
['1.0b', '02.0.1', '3.00.12-beta']

Cheers!

maaaaz commented 1 year ago

cf. https://stackoverflow.com/a/64113422

effigies commented 1 year ago

Is this something you were able to do before with LooseVersion or LegacyVersion?

effigies commented 1 year ago

Ah, can you just try key=np.vectorize(LooseVersion)?

effigies commented 1 year ago

Confirmed:

In [1]: import looseversion as lv

In [2]: import pandas as pd

In [3]: import numpy as np

In [4]: df = pd.DataFrame({"version": ['3.00.12-beta', '02.0.1', '1.0b'] })

In [5]: df.sort_values(by='version', key=np.vectorize(lv.LooseVersion))
Out[5]: 
        version
2          1.0b
1        02.0.1
0  3.00.12-beta

maaaaz commented 1 year ago

Thanks, it seems to work

>>> import numpy as np
>>> df = pd.DataFrame({"version": ['3.00.12-beta', '02.0.1', '1.0b'] })
>>> df.sort_values(by='version', key=np.vectorize(LooseVersion))
        version
2          1.0b
1        02.0.1
0  3.00.12-beta

But I have to admit that needing the not-small "numpy" package is a major drawback. Is there any plan to support this without numpy?

effigies commented 1 year ago

Numpy is a pandas dependency. If you're using pandas, numpy is already imported.

maaaaz commented 1 year ago

Ok!
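
For readers landing here later, a minimal standard-library sketch of why the original call failed and why the wrapped one works. `sorted` calls `key` once per element, while the traceback above shows pandas calling `key` once with the whole column (`result = key(values.copy())`), so the key has to be applied element-wise. Everything named below (`loose_key`, `elementwise`, `sort_by_column_key`) is hypothetical and written only for illustration; `loose_key` is a simplified stand-in, not the actual looseversion parser, and plain lists stand in for a pandas Series.

```python
import re

# Simplified stand-in for LooseVersion-style parsing (an assumption for this
# sketch, not the library's code): split into digit runs and letter runs,
# drop the dots, and convert digit runs to ints.
_COMPONENT = re.compile(r"(\d+|[a-z]+|\.)")


def loose_key(vstring):
    """Per-item key: expects ONE version string."""
    # A truth test like this is where the traceback above fails when the
    # argument is a whole pandas Series instead of a single string.
    if not vstring:
        return []
    parts = [p for p in _COMPONENT.split(vstring) if p and p != "."]
    return [int(p) if p.isdigit() else p for p in parts]


def elementwise(func):
    """Wrap a per-item function so it accepts a whole column at once."""
    def apply_to_column(column):
        return [func(item) for item in column]
    return apply_to_column


def sort_by_column_key(values, key):
    """Hypothetical helper mimicking a column-wise key:
    key is called once with all the values, not once per value."""
    mapped = key(list(values))
    order = sorted(range(len(values)), key=mapped.__getitem__)
    return [values[i] for i in order]


versions = ["3.00.12-beta", "02.0.1", "1.0b"]

# Per-item key, as with the built-in sorted().
per_item = sorted(versions, key=loose_key)

# Column-wise key: the per-item function must be wrapped first.
column_wise = sort_by_column_key(versions, elementwise(loose_key))

print(per_item)
print(column_wise)
```

In this sketch `elementwise` plays the role that `np.vectorize` plays in the accepted answer. In pandas itself, the same idea can probably be written without an explicit numpy import as `key=lambda col: col.map(LooseVersion)`, though that variant was not tried in this thread.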