convexengineering / gpkit

Geometric programming for engineers
http://gpkit.readthedocs.org
MIT License

Pickled solutions are very large #1497

Closed pgkirsch closed 4 years ago

pgkirsch commented 4 years ago

SolutionArray.save() is a wonderful feature, but it can produce some seriously big files. I recently solved a model with 42,000 free variables and the pickle file was 110 MB! Even the models I solve on a more frequent basis yield 10-20 MB pickle files. In case anyone is curious, the simple text file representation of the solution (the output of SolutionArray.savetxt()) is ~700 kB.

I previously spoke to @bqpd about this and he mentioned that it should be possible to make these files smaller. If it is a trade-off between how much original model data/functionality is preserved and file size, it would be great if there was an argument to specify tiers of data preservation.

bqpd commented 4 years ago

Without looking into changing what is pickled, just using cPickle and bz2 takes it from 12 MB to 3.6 MB:

import bz2
import cPickle  # Python 2 module; on Python 3, use pickle instead

class SolutionArray...

    def save_compressed(self, title="solution", **cpickleargs):
        "Pickle the solution and write it bz2-compressed to <title>.pbz2."
        with bz2.BZ2File(title + ".pbz2", "w") as f:
            cPickle.dump(self, f, **cpickleargs)

    @staticmethod
    def decompress_file(file):
        "Load any bz2-compressed pickle file."
        # The context manager closes the file once loading finishes.
        with bz2.BZ2File(file, "rb") as f:
            return cPickle.load(f)

bqpd commented 4 years ago

@pgkirsch check out https://github.com/convexengineering/gpkit/pull/1498; what does that bring the 110 MB file down to?

There's obviously a lot more that can be done by changing what is pickled as well...

pgkirsch commented 4 years ago

27 MB!

pgkirsch commented 4 years ago

Definitely a performance trade-off, as I'm sure you're aware. In case you're curious: compressing takes ~20 seconds and decompressing takes ~32 seconds (coincidentally, the original solve takes 52 seconds!). With regular pickle it takes 7 seconds to save and 12 seconds to load.
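A trade-off like this can be measured on any picklable object. The following is a minimal sketch, assuming Python 3's `pickle` in place of `cPickle` and a made-up dictionary standing in for a solution; the helper names (`timed`, `compare`, and so on) are hypothetical and not part of gpkit.

```python
import bz2
import os
import pickle
import tempfile
import time


def timed(fn, *args):
    "Run fn(*args) and return (result, elapsed seconds)."
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def save_plain(obj, path):
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_plain(path):
    with open(path, "rb") as f:
        return pickle.load(f)


def save_bz2(obj, path):
    with bz2.BZ2File(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_bz2(path):
    with bz2.BZ2File(path, "rb") as f:
        return pickle.load(f)


def compare(obj):
    "Return {label: (bytes on disk, save seconds, load seconds)}."
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for label, save, load in [("pickle", save_plain, load_plain),
                                  ("pickle+bz2", save_bz2, load_bz2)]:
            path = os.path.join(tmp, label)
            _, t_save = timed(save, obj, path)
            loaded, t_load = timed(load, path)
            assert loaded == obj  # the round trip must preserve the data
            results[label] = (os.path.getsize(path), t_save, t_load)
    return results


# Hypothetical stand-in data, not a real gpkit solution.
fake_solution = {"x%d" % i: [0.0] * 50 for i in range(2000)}
report = compare(fake_solution)
for label, (size, t_save, t_load) in report.items():
    print("%-10s %8d bytes  save %.3fs  load %.3fs"
          % (label, size, t_save, t_load))
```

The absolute numbers depend heavily on the data and the machine, so this only shows how to compare the two approaches, not what a real model would yield.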

What kinds of things could be selectively left out of the pickle?

bqpd commented 4 years ago

I think there's something about the way it's including all the constraints that's taking more space than it should. I haven't found any debugging tools which show what's taking up space in a pickle file, so it'll be trial and error figuring out what that might be...
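One rough way to do that trial and error is to pickle each entry of a nested dictionary on its own and compare the byte counts. This is only a sketch under assumptions: it treats the solution as plain nested dicts, the function names are hypothetical, and the sizes are approximate because pickling pieces separately ignores objects that a single pickle would store once and share.

```python
import pickle


def pickled_size(obj):
    "Number of bytes obj occupies when pickled on its own."
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def size_breakdown(mapping, prefix=""):
    "Yield (key path, pickled bytes) for every entry of a nested dict."
    for key, value in mapping.items():
        path = "%s[%r]" % (prefix, key)
        yield path, pickled_size(value)
        if isinstance(value, dict):
            # Recurse so nested dictionaries are itemized as well.
            yield from size_breakdown(value, path)


# Hypothetical stand-in for a solution, not real gpkit output.
fake = {
    "cost": 1.0,
    "sensitivities": {
        "constraints": {i: 0.0 for i in range(1000)},
        "variables": {"x": 0.5},
    },
}
sizes = dict(size_breakdown(fake, "sol"))
largest = max(sizes, key=sizes.get)
for path, nbytes in sorted(sizes.items(), key=lambda item: -item[1]):
    print("%8d  %s" % (nbytes, path))
```

Sorting by size puts the heaviest branches first, which narrows down where to look next.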

bqpd commented 4 years ago

(note: down to 1.9 MB, with much faster loads, in the latest commit on https://github.com/convexengineering/gpkit/pull/1498)

bqpd commented 4 years ago

Looking at where the bulk is coming from, 92% is in sol["sensitivities"]["constraints"], and another 2% in sol["sensitivities"]["variables"], both of which hold mostly near-zero values. Storing only constraints with |sensitivity| >= 0.01 results in a file one-quarter the original size.
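The thresholding idea can be sketched on a plain dictionary. This assumes each sensitivity is a single scalar keyed by its constraint, which may not match how gpkit stores them; `prune_sensitivities` is a hypothetical helper, not a gpkit function.

```python
def prune_sensitivities(senss, threshold=0.01):
    "Return a copy of senss keeping only entries with |value| >= threshold."
    return {key: value for key, value in senss.items()
            if abs(value) >= threshold}


# Hypothetical sensitivities keyed by constraint name.
constraint_senss = {"c1": 0.5, "c2": 1e-9, "c3": -0.02, "c4": 0.0}
kept = prune_sensitivities(constraint_senss)
print(kept)  # {'c1': 0.5, 'c3': -0.02}
```

Using the absolute value keeps strongly negative sensitivities, and anything dropped can no longer be recovered from the saved file.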

Further work should be done to determine just why these constraint objects are so large.

bqpd commented 4 years ago

@pgkirsch between the branches merged above, pickle size should be down to about a twelfth of what it was before. It can probably be reduced by another factor of four by cutting insensitive constraints, but I'm a little more hesitant to do that by default.

pgkirsch commented 4 years ago

This is great, thanks so much @bqpd!

pgkirsch commented 4 years ago

@bqpd 107 MB --> 4 MB!

bqpd commented 4 years ago

nice!!