DanielStutzbach / blist

A list-like type with better asymptotic performance and similar performance on small lists
Other
310 stars 36 forks source link

Segmentation Fault #33

Open abhik opened 13 years ago

abhik commented 13 years ago

Trying to use a blist of size 4**20 and I get segmentation fault upon first read/write to blist. Using latest version from git. Running on Mac OSX on x86_64 with latest Xcode/gcc install. Commands to replicate error:

[abhik:~/Downloads]$ git clone https://github.com/DanielStutzbach/blist.git Cloning into blist... remote: Counting objects: 1169, done. remote: Compressing objects: 100% (523/523), done. remote: Total 1169 (delta 673), reused 1111 (delta 639) Receiving objects: 100% (1169/1169), 400.80 KiB | 502 KiB/s, done. Resolving deltas: 100% (673/673), done. [abhik:~/Downloads/blist]$ python setup.py build_ext --inplace Downloading http://pypi.python.org/packages/source/d/distribute/distribute-0.6.12.tar.gz Extracting in /var/folders/Tv/Tv70U9x4EJ8RVIsTgph+fE+++TI/-Tmp-/tmprwLwFu Now working in /var/folders/Tv/Tv70U9x4EJ8RVIsTgph+fE+++TI/-Tmp-/tmprwLwFu/distribute-0.6.12 Building a Distribute egg in /Users/abhik/Downloads/blist /Users/abhik/Downloads/blist/distribute-0.6.12-py2.7.egg running build_ext building '_blist' extension creating build creating build/temp.macosx-10.6-x86_64-2.7 /usr/bin/gcc-4.2 -fno-strict-aliasing -fno-common -dynamic -pipe -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -DBLIST_FLOAT_RADIX_SORT=1 -I/opt/local/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c _blist.c -o build/temp.macosx-10.6-x86_64-2.7/_blist.o creating build/lib.macosx-10.6-x86_64-2.7 /usr/bin/gcc-4.2 -L/opt/local/lib -bundle -undefined dynamic_lookup -L/opt/local/lib build/temp.macosx-10.6-x86_64-2.7/_blist.o -o build/lib.macosx-10.6-x86_64-2.7/_blist.so copying build/lib.macosx-10.6-x86_64-2.7/_blist.so -> [abhik:~/Downloads/blist]$ python
Python 2.7.1 (r271:86832, Mar 8 2011, 16:24:29) [GCC 4.2.1 (Apple Inc. build 5664)] on darwin Type "help", "copyright", "credits" or "license" for more information.

import blist x = blist.blist([0]) * 4**20 x[10] Segmentation fault

Thanks!

DanielStutzbach commented 13 years ago

Ouch, that's pretty nasty. I'm taking a look now.

DanielStutzbach commented 13 years ago

Okay, I see the problem. Although blist is very memory efficient for sparse lists, it still uses O(n) memory. Some of the code assumes the list won't be larger than approximately 2**32, leading to a crash.

At minimum, it should raise a MemoryError or something like that rather than causing a segmentation fault.

In theory, blist could be altered to give up O(1) lookups in exchange for using only O(log n) memory for sparse lists. However, the implementation is non-trivial (the hard part is tracking when a list is "sparse").

Do you have a use case for making a list with 4**20 elements or were you (understandably) just playing around? :-)

abhik commented 13 years ago

Ah, I didn't realize the O(n) memory requirement. I think I misunderstood the 'sparse' in blist's description. I thought it worked somewhat like a hash and that only indexes where I set a value would be actually stored.

I do actually have a use case. I need to store (and lookup) integer values corresponding to 3 billion 20-character long DNA sequences. The python dict was too slow and memory intensive so I tried blist. I wanted to convert each DNA sequence to an integer between 0 and 4**20 (4 because alphabet is [A,C,G,T]) and then set its index in the blist to the corresponding integer value. Looks like time to pull out the rusty gcc.. :)

iandennismiller commented 8 years ago

I have a similar use case, actually. The example provided by abhik still causes a segfault. By any chance, has the situation changed in the last several years to make this problem simpler to solve?