PengNi / basemods_spark


zlib error when spark dump files larger than 2GB in script "cmph5_opreations" #5

Closed: PengNi closed this issue 6 years ago

PengNi commented 6 years ago

Tested Python version: 2.7.5

Spark/Python raises an error when dumping files larger than 2GB:

OverflowError: size does not fit in an int

The error is raised inside zlib.
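A minimal reproduction sketch, assuming a machine with enough free RAM to hold a buffer just over 2GB; the payload below is synthetic and not part of the pipeline:

```python
import zlib

# On Python < 2.7.13, zlib.compress() rejects buffers whose length does not
# fit in a C int, so a payload just over 2GB triggers the OverflowError above.
payload = b"\x00" * (2 * 1024 ** 3 + 1)  # slightly more than 2GB of zero bytes

try:
    compressed = zlib.compress(payload, 1)  # compression level 1 for speed
    print("compressed %d bytes down to %d bytes" % (len(payload), len(compressed)))
except OverflowError as err:
    print("hit the 2GB limit: %s" % err)
```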

Related issues: joblib #122, joblib #300, Python #23306, Python #27130

PengNi commented 6 years ago

For Python 2.x, this bug was fixed in Python 2.7.13 and later (see the release notes).

PengNi commented 6 years ago

The OverflowError was resolved after upgrading Python to 2.7.13.

Closing the issue.

PengNi commented 6 years ago

Also, Spark has another limitation: an object to dump cannot be larger than 2GB when struct.pack is used (see issue #6); a quick illustration of that limit follows.
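The snippet below assumes the length field is packed as a signed 4-byte integer (the "!i" format); that format string is an assumption here, not something taken from the Spark source:

```python
import struct

# A signed 4-byte length field cannot represent sizes of 2GB or more,
# so packing such a length fails outright.
try:
    struct.pack("!i", 2 * 1024 ** 3)  # 2147483648 exceeds 2**31 - 1
except struct.error as err:
    print("cannot frame a 2GB object: %s" % err)
```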

So the right approach is to keep every single file under 2GB (a sketch of that workaround follows below). That constraint has nothing to do with zlib's bug.
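A minimal sketch of that workaround, assuming the data can be split at arbitrary byte boundaries; the function name and part-file naming scheme are hypothetical and not part of this project's code:

```python
import zlib

CHUNK_SIZE = 512 * 1024 ** 2  # 512 MB, comfortably below the 2GB limit

def dump_compressed_parts(data, path_prefix, chunk_size=CHUNK_SIZE):
    """Write `data` as several compressed part-files so that no single
    buffer handed to zlib (or framed with struct.pack) exceeds 2GB."""
    for part, start in enumerate(range(0, len(data), chunk_size)):
        chunk = zlib.compress(data[start:start + chunk_size], 1)
        with open("%s.part%04d.zlib" % (path_prefix, part), "wb") as fh:
            fh.write(chunk)
```

Each part can later be decompressed and concatenated in order to recover the original bytes.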