ekzhu / datasketch

MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
https://ekzhu.github.io/datasketch
MIT License

Cassandra storage not compatible with Python 3.12 #236

Open ekzhu opened 6 months ago

ekzhu commented 6 months ago

When running on Python 3.12, importing datasketch fails with the following error:

datasketch/__init__.py:4: in <module>
    from datasketch.lsh import MinHashLSH
datasketch/lsh.py:7: in <module>
    from datasketch.storage import ordered_storage, unordered_storage, _random_name
datasketch/storage.py:17: in <module>
    from cassandra import cluster as c_cluster
cassandra/cluster.py:173: in init cassandra.cluster
    ???
E   cassandra.DependencyException: Unable to load a default connection class
E   The following exceptions were observed: 
E    - The C extension needed to use libev was not found.  This probably means that you didn't have the required build dependencies when installing the driver.  See http://datastax.github.io/python-driver/installation.html#c-extensions for instructions on installing build dependencies and building the C extension.
E    - Unable to import asyncore module.  Note that this module has been removed in Python 3.12 so when using the driver with this version (or anything newer) you will need to use one of the other event loop implementations.
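
The two bullets in the error correspond to two event loop backends the driver tried and could not load. As a rough way to check which of them an environment has, here is a minimal sketch. The helper names are hypothetical and not part of datasketch or the driver; it assumes the driver's libev C extension is the module `cassandra.io.libevwrapper`.

```python
import importlib.util


def asyncore_available() -> bool:
    """Return True if an importable asyncore module can be found."""
    return importlib.util.find_spec("asyncore") is not None


def libev_extension_available() -> bool:
    """Return True if the driver's libev C extension can be found.

    Assumes the extension is importable as cassandra.io.libevwrapper.
    """
    try:
        return importlib.util.find_spec("cassandra.io.libevwrapper") is not None
    except ImportError:
        # The cassandra package itself is not installed.
        return False


if __name__ == "__main__":
    print("asyncore available:", asyncore_available())
    print("libev extension available:", libev_extension_available())
```

If both checks report False, the driver has neither of the two backends named in the traceback.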
ekzhu commented 6 months ago

@ostefano Could you help with this issue?

rupeshkumaar commented 6 months ago

@ekzhu I took a look at the issue, and it seems this has already been raised on the DataStax side. Sharing the ticket: "cassandra-driver for Python 3.12 Linux is compiled without libev support"

ekzhu commented 6 months ago

Thanks for looking into this. Regarding the second point in the error message:

E - Unable to import asyncore module. Note that this module has been removed in Python 3.12 so when using the driver with this version (or anything newer) you will need to use one of the other event loop implementations.

Is there anything we have to do on our end?

rupeshkumaar commented 6 months ago

We might have to use asyncio, since asyncore has been removed in Python 3.12. But even if we implement that, I think it would be a short-lived workaround: once the upstream issue is resolved and new drivers are released, we would have to reinstall the driver, and the change might cause problems again. We can still think about it. @ekzhu
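
For reference, one interim option on the datasketch side would be to make the optional import tolerant of any failure, so that `import datasketch` still works when the Cassandra driver cannot load. This is only a sketch under assumptions: `optional_import` is a hypothetical helper, not existing datasketch code, and the broad `except Exception` is deliberate because the traceback shows the driver raising `cassandra.DependencyException` rather than a plain import error.

```python
import importlib


def optional_import(module_name):
    """Import module_name and return it, or return None if the import fails.

    Hypothetical helper: catches Exception broadly because a backend may
    fail at import time with an error other than ImportError.
    """
    try:
        return importlib.import_module(module_name)
    except Exception:
        return None


# None when the driver is missing or cannot load an event loop backend.
c_cluster = optional_import("cassandra.cluster")
```

Code that needs Cassandra storage would then check `c_cluster is None` and raise a clear error only when that backend is actually requested.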

ekzhu commented 6 months ago

Thanks. Let's wait for their issue to resolve.