bbayles / python-pure-cdb

Pure Python CDB reader/writer
https://python-pure-cdb.readthedocs.io/
MIT License

Great Package! Can it cross the 4GB limitation? #4

Closed: ozomer closed this issue 8 years ago

ozomer commented 8 years ago

We are now using python-pure-cdb (with minor modifications) on our Google App Engine server, with a ~3 GB file on Google Storage. We have implemented a "FileString" class, a lazy "mmap" that reads only the required parts of the Google Storage file when __getitem__ is called (rounding to 8 KB blocks).
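For readers who want to picture the lazy "mmap" idea described above, here is a minimal sketch. It is not the actual "FileString" class; the name LazyBlockString and the fetch_range callable (which stands in for a ranged read against remote storage) are hypothetical, and the cache is an unbounded dict purely for brevity.

```python
BLOCK_SIZE = 8 * 1024  # 8 KB blocks, as described in the comment above


class LazyBlockString:
    """Read-only, bytes-like view that fetches whole blocks on demand.

    fetch_range(start, stop) is a hypothetical callable returning the bytes
    in [start, stop) from the remote file; it is an assumption of this sketch.
    """

    def __init__(self, fetch_range, length, block_size=BLOCK_SIZE):
        self._fetch_range = fetch_range
        self._length = length
        self._block_size = block_size
        self._blocks = {}  # block index -> bytes (unbounded cache)

    def __len__(self):
        return self._length

    def _block(self, index):
        # Fetch a block only the first time it is needed.
        if index not in self._blocks:
            start = index * self._block_size
            stop = min(start + self._block_size, self._length)
            self._blocks[index] = self._fetch_range(start, stop)
        return self._blocks[index]

    def __getitem__(self, key):
        if isinstance(key, slice):
            start, stop, step = key.indices(self._length)
            if step != 1:
                raise ValueError("only contiguous slices are supported")
            single = False
        else:
            if key < 0:
                key += self._length
            if not 0 <= key < self._length:
                raise IndexError("index out of range")
            start, stop, single = key, key + 1, True
        if start >= stop:
            return b""
        # Round the requested range outwards to block boundaries.
        first = start // self._block_size
        last = (stop - 1) // self._block_size
        data = b"".join(self._block(i) for i in range(first, last + 1))
        offset = start - first * self._block_size
        chunk = data[offset:offset + (stop - start)]
        # Like bytes and mmap on Python 3, an integer index yields an int.
        return chunk[0] if single else chunk
```

A read that straddles a block boundary fetches both neighbouring blocks once; later reads inside those blocks are served from the cache.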

We want to move to larger files, even if that breaks compatibility with the original cdb implementation. We want to use 64-bit pointers instead of 32-bit ones. I know we should change the read_2_le4/write_2_le4 Struct from '<LL' to '<QQ', use a hash function that gives 64-bit values, and multiply all the hard-coded offsets (8, 2048, others?) by two. There are also the bit shifts, which confuse me...
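As a rough illustration of the changes listed above, the sketch below compares the two struct layouts and shows where the constants 8 and 2048 come from. The names le32_pair, le64_pair, header_size, fits and the bits parameter of djb_hash are made up for this example and are not the library's API. Widening the mask is only one possible way to get a 64-bit hash, and the resulting file is not readable by standard cdb tools.

```python
from struct import Struct, error as StructError

# Standard cdb stores pairs of 32-bit little-endian integers.
le32_pair = Struct("<LL")
# Assumed 64-bit variant: pairs of 64-bit little-endian integers.
le64_pair = Struct("<QQ")

HEADER_SLOTS = 256  # the file starts with 256 (position, length) pairs


def header_size(pair):
    """Size in bytes of the table of 256 pairs at the start of the file."""
    return HEADER_SLOTS * pair.size


def fits(pair, offset):
    """Return True if offset can be stored in the given pair layout."""
    try:
        pair.pack(offset, 0)
    except StructError:
        return False
    return True


def djb_hash(data, bits=32):
    """DJB-style hash over bytes, truncated to the given width.

    bits=32 matches the usual cdb hash; bits=64 is an assumption made here
    for illustration and only widens the mask.
    """
    mask = (1 << bits) - 1
    h = 5381
    for byte in data:
        h = (((h << 5) + h) ^ byte) & mask
    return h
```

With these definitions, le32_pair.size is 8 and header_size(le32_pair) is 2048, while the 64-bit layout doubles both to 16 and 4096. An offset of 2**32 fails fits(le32_pair, ...) but passes fits(le64_pair, ...), which is the 4 GB limit in question.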

dw commented 8 years ago

Hi Oren,

Apologies for the severe delay in replying. The bit shifts are simply multiplication and division by powers of 2. They do not really need to be written that way; it is a holdover from the original CDB code. For example, "1 << 3" multiplies 1 by 2 to the power of 3 (that is, 8), so the result is 8.
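The equivalence described above can be checked directly. The short sketch below writes each shift or mask next to its arithmetic form; the helper names are hypothetical, and the use of the low 8 bits to pick one of 256 tables follows the cdb format description rather than any particular line of this library.

```python
def times_8_shift(n):
    return n << 3  # shift left by 3 bits


def times_8_plain(n):
    return n * 8  # same result, written as multiplication


def table_number_mask(h):
    return h & 0xFF  # keep the low 8 bits


def table_number_plain(h):
    return h % 256  # same result for non-negative h


def slot_number_shift(h, nslots):
    return (h >> 8) % nslots  # drop the low 8 bits, then wrap


def slot_number_plain(h, nslots):
    return (h // 256) % nslots  # same result for non-negative h
```

For non-negative integers each pair of functions returns identical values, so the shifts can be rewritten as ordinary multiplication, division and modulo when porting the code.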

dw commented 8 years ago

It's a little late to be replying to your bug, but just in case, here is a complete implementation. All the best