Closed by pgkirsch 4 years ago.
Without even looking into changing what is pickled, using cPickle and bz2 takes it from 12 MB to 3.6 MB:
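```python
import bz2
try:
    import cPickle  # Python 2: C implementation of pickle
except ImportError:
    import pickle as cPickle  # Python 3: pickle uses its C accelerator automatically

class SolutionArray(dict):  # stand-in base class; only the two new methods are shown
    def save_compressed(self, title="solution", **cpickleargs):
        "Pickle the solution and bz2-compress it into <title>.pbz2."
        with bz2.BZ2File(title + ".pbz2", "w") as f:
            cPickle.dump(self, f, **cpickleargs)

    @staticmethod
    def decompress_file(file):
        "Load any bz2-compressed pickle file (the file handle is closed afterwards)."
        with bz2.BZ2File(file, "rb") as f:
            return cPickle.load(f)
```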
@pgkirsch check out https://github.com/convexengineering/gpkit/pull/1498; what does that bring the 110 MB file down to?
There's obviously a lot more that can be done by changing what is pickled as well...
27 MB!
Definitely a performance trade-off, as I'm sure you're aware. In case you're curious: compressing takes ~20 seconds and decompressing takes ~32 seconds (coincidentally, the original solve took 52 seconds!). With regular pickle it takes 7 seconds to save and 12 seconds to load.
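To reproduce this kind of save/load comparison on other data, here is a minimal sketch that times a plain pickle round trip against a bz2-compressed one and reports the resulting file sizes. It uses only the standard library; `time_roundtrip` is a hypothetical helper (not part of gpkit), and the data in the demo is a synthetic stand-in rather than a real solution.

```python
import bz2
import os
import pickle
import tempfile
import time


def time_roundtrip(obj, path, compress):
    """Pickle obj to path (bz2-compressed if compress is True), load it back,
    and return (save_seconds, load_seconds, file_size_bytes, loaded_obj)."""
    opener = bz2.BZ2File if compress else open
    t0 = time.perf_counter()
    with opener(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
    t1 = time.perf_counter()
    with opener(path, "rb") as f:
        loaded = pickle.load(f)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, os.path.getsize(path), loaded


if __name__ == "__main__":
    # Synthetic stand-in for a solution: many keys mapping to small float lists
    data = {"var%d" % i: [0.001 * i] * 10 for i in range(5000)}
    with tempfile.TemporaryDirectory() as tmp:
        for compress in (False, True):
            path = os.path.join(tmp, "sol.pbz2" if compress else "sol.p")
            save_s, load_s, size, _ = time_roundtrip(data, path, compress)
            print("bz2" if compress else "plain", "save %.2fs" % save_s,
                  "load %.2fs" % load_s, "size %d bytes" % size)
```

Timings on synthetic data will not match a real model's, but the relative cost of compression versus file size shows up the same way.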
What kinds of things could be selectively left out of the pickle?
I think there's something about the way it's including all the constraints that's taking more space than it should. I haven't found any debugging tools which show what's taking up space in a pickle file, so it'll be trial and error figuring out what that might be...
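Lacking a dedicated tool, one rough way to see what takes up space is to pickle each nested dict entry on its own and compare its size with the whole. The sketch below assumes the solution behaves like a nested dict (as `sol["sensitivities"]["constraints"]` suggests); `pickled_size` and `size_breakdown` are hypothetical helpers. Because each entry is re-pickled separately, objects shared between entries are counted more than once, so the shares are estimates.

```python
import pickle


def pickled_size(obj):
    """Bytes that obj occupies when pickled on its own."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def size_breakdown(obj, prefix="", min_share=0.01, total=None):
    """Walk nested dicts and return (path, bytes, share_of_total) for every
    entry whose standalone pickle is at least min_share of the whole.

    Each entry is pickled separately, so objects shared between entries are
    counted more than once: treat the shares as estimates.
    """
    if total is None:
        total = pickled_size(obj)
    rows = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = "%s[%r]" % (prefix, key)
            size = pickled_size(value)
            if size >= min_share * total:
                rows.append((path, size, size / total))
                # Recurse only into entries big enough to matter
                rows.extend(size_breakdown(value, path, min_share, total))
    return rows
```

For example, `for path, size, share in size_breakdown(sol): print(path, size, "%.0f%%" % (100 * share))` would list every entry that accounts for at least 1% of the pickle.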
(Note: down to 1.9 MB, with much faster loads, in the latest commit on https://github.com/convexengineering/gpkit/pull/1498.)
Looking at where the bulk is coming from, 92% is in sol["sensitivities"]["constraints"] and another 2% in sol["sensitivities"]["variables"], both of which are mostly near-zeroes. Storing only constraints with |senss| >= 0.01 results in a file one-quarter the original size.
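Keeping only the sensitive constraints can be done as a filtering step before pickling. A minimal sketch, assuming `sol` is a plain nested dict with scalar sensitivities (gpkit may store them differently, e.g. as arrays for vector constraints); `prune_sensitivities` and its `kinds` parameter are hypothetical, and the 0.01 default is the threshold quoted above.

```python
def prune_sensitivities(sol, threshold=0.01, kinds=("constraints",)):
    """Return a copy of sol in which sol["sensitivities"][kind] keeps only
    entries with |sensitivity| >= threshold, for each kind in kinds.

    Assumes sol is dict-like and sensitivities are scalars. The input is not
    modified; untouched entries are shared with it rather than copied.
    """
    pruned = dict(sol)                      # shallow copy of the top level
    senss = dict(sol["sensitivities"])      # shallow copy of the sensitivities
    for kind in kinds:
        if kind in senss:
            senss[kind] = {key: s for key, s in senss[kind].items()
                           if abs(s) >= threshold}
    pruned["sensitivities"] = senss
    return pruned
```

Passing `kinds=("constraints", "variables")` would also drop near-zero variable sensitivities; making this opt-in rather than the default keeps the full data available unless the caller asks otherwise.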
Further work should be done to determine just why these constraint objects are so large.
@pgkirsch between the branches merged above, pickle size should be down to about a twelfth of what it was before. It can probably be reduced by another factor of four by cutting insensitive constraints, but I'm a little more hesitant to do that by default.
This is great, thanks so much @bqpd!
@bqpd 107 MB --> 4 MB!
nice!!
SolutionArray.save() is a wonderful feature, but it can produce some seriously big files. I recently solved a model with 42,000 free variables and the pickle file was 110 MB! Even the models I solve on a more frequent basis yield 10-20 MB pickle files. In case anyone is curious, the simple text representation of the solution (the output of SolutionArray.savetxt()) is ~700 kB.

I previously spoke to @bqpd about this and he mentioned that it should be possible to make these files smaller. If it is a trade-off between how much original model data/functionality is preserved and file size, it would be great if there were an argument to specify tiers of data preservation.
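One way such tiers could look, sketched under loud assumptions: the solution is treated as a plain nested dict, the tier names `full`, `sensitive`, and `values` and the functions `save_tiered`/`load_tiered` are hypothetical (gpkit defines no such argument here), and the `sensitive` tier applies the |senss| >= 0.01 cut mentioned in this thread.

```python
import bz2
import pickle

# Hypothetical tier names, from most to least data preserved
TIERS = ("full", "sensitive", "values")


def save_tiered(sol, filename, tier="full", threshold=0.01):
    """Pickle a solution dict into a bz2-compressed file, keeping less data
    at lower tiers:

    full: everything
    sensitive: constraint sensitivities with |s| < threshold are dropped
    values: all sensitivities are dropped
    """
    if tier not in TIERS:
        raise ValueError("tier must be one of %s" % (TIERS,))
    data = dict(sol)  # shallow copy so the caller's solution is untouched
    if tier == "sensitive":
        senss = dict(data["sensitivities"])
        senss["constraints"] = {c: s for c, s in senss["constraints"].items()
                                if abs(s) >= threshold}
        data["sensitivities"] = senss
    elif tier == "values":
        data.pop("sensitivities", None)
    with bz2.BZ2File(filename, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_tiered(filename):
    """Load a file written by save_tiered."""
    with bz2.BZ2File(filename, "rb") as f:
        return pickle.load(f)
```

Defaulting to `full` matches the hesitation about discarding data unless asked; callers who only need values can opt into the smaller tiers.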