etetoolkit / ete

Python package for building, comparing, annotating, manipulating and visualising trees. It provides a comprehensive API and a collection of command line tools, including utilities to work with the NCBI taxonomy tree.
http://etetoolkit.org
GNU General Public License v3.0

Lift `sqlite3`'s `check_same_thread` flag to `NCBITaxa.__init__()` #749

Closed by rtviii 3 months ago

rtviii commented 4 months ago

Using this object in a multithreaded environment (e.g. `concurrent.futures`) results in the following error: `SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 131662279252864 and this is thread id 131661501028032.`

Perhaps this is to be expected of SQLite, but given that the taxonomy db itself is read-only after creation (I assume?), I think this flag should be exposed.

rtviii commented 3 months ago

I'm guessing I can use a process pool instead for my limited purposes. Closing.

I have a bunch of tasks, each of which makes heavy use of `NCBITaxa(dbfile=NCBI_TAXA_SQLITE); ncbi.get_topology(...)`, so it would be nice to share the contents of the SQLite file after all. The file is currently around 600 MB, so I'm wondering whether anyone has tried to mmap it, or put it in `/dev/shm/`, or whatever one does with SQLite, so that for the duration of the taxonomy-related computation a bioinformatician can just have the whole thing in memory. Happy to be corrected or to hear thoughts. Thanks for the package.