UT-CHG / BET

Python package for data-consistent stochastic inverse and forward problems.
http://ut-chg.github.io/BET

why no pickle? #360

Closed mathematicalmichael closed 4 years ago

mathematicalmichael commented 4 years ago

Minimal working example:

import pickle
import bet.sample as sample
import os

# build a 2-D sample set and round-trip it through pickle
s_set = sample.sample_set(2)
base_path = '/home/jovyan/work/'
model_path = os.path.join(base_path, 'BET')

with open(os.path.join(model_path, 'model.pkl'), 'wb') as out:
    pickle.dump(s_set, out)
    print('dumped to %s' % out.name)

with open(os.path.join(model_path, 'model.pkl'), 'rb') as inp:
    s_set_load = pickle.load(inp)
    print('loaded %s' % inp.name)

print(s_set, s_set_load)

Is there any problem with saving/loading like this?

smattis commented 4 years ago

I agree that pickle would be more efficient, and we should move towards that for version 3.

mathematicalmichael commented 4 years ago

@smattis do you think you could handle the move to pickle or should I attempt it?

smattis commented 4 years ago

I probably have more time to commit to it.

mathematicalmichael commented 4 years ago

I might have just done this as part of addressing SciPy upgrades and handling adaptive sampling... stay tuned.

mathematicalmichael commented 4 years ago

Yes, I think I figured it out. sio.savemat changed its behavior to mimic that of pickle (a safer approach that closes out file buffers instead of keeping them open). What breaks in adaptiveSampling after upgrading scipy is exactly the same error that starts getting thrown once the change to pickle is implemented.

Meaning: pickle + old scipy = same problem as no pickle + new scipy.
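For illustration, here is a minimal sketch of the two save styles being compared (the file names and payload are placeholders, not BET code); the point is that in both cases the file buffer is opened and closed within the save, rather than a handle being left open:

import pickle
import numpy as np
import scipy.io as sio

data = {'values': np.arange(10)}  # placeholder payload

# .mat route: pass a path and let savemat open and close the buffer itself
sio.savemat('example_save.mat', data)

# pickle route: manage the buffer explicitly; the with-block closes it on exit
with open('example_save.pkl', 'wb') as out:
    pickle.dump(data, out)

# either way, no file handle is left open once the save completes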

I'll put in thirty minutes trying to resolve it, but if I can't, I'm going to throw in a "skip test in parallel" band-aid for the time being so I can unblock myself from testing SciPy upgrades with the new sampling approach.

mathematicalmichael commented 4 years ago

Confirmed: everything in the library works except for adaptive sampling when I upgrade scipy. Simply skipping it in parallel for now. It's actually just the hotstart=1 (incomplete run) case that fails, so I'm simply removing that one from the test.
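For reference, a minimal sketch of the kind of "skip in parallel" band-aid I mean, assuming a unittest-style test and mpi4py for detecting the communicator size (the class and test names are placeholders, not the actual BET test suite):

import unittest

try:
    from mpi4py import MPI
    comm_size = MPI.COMM_WORLD.Get_size()
except ImportError:
    comm_size = 1  # serial fallback when mpi4py is not available

class TestAdaptiveSamplingHotstart(unittest.TestCase):
    @unittest.skipIf(comm_size > 1,
                     "hotstart from an incomplete run currently fails in parallel")
    def test_hotstart_from_partial_run(self):
        # placeholder for the actual hotstart=1 (incomplete run) check
        self.assertTrue(True)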

mathematicalmichael commented 4 years ago

Well, I got it to work without skipping. I don't have confidence in the hot start from a partial run; the test for it kind of looks incomplete, but... whatever. Better than skipping it altogether.

mathematicalmichael commented 4 years ago

update: I am now successfully running code from 2016-2017 using the new BET (with pickling). Steps:

  1. Run 2to3 -w *.py in the examples directory (I'm on @mpilosov's fork of BET, on some dev branches corresponding to different experiments). Optionally run autopep8 -i *.py for styling.

  2. Install a couple of undocumented dependencies (dolfin, pyprind) needed by scripts that aren't part of BET. conda/pip actually made this seamless.

  3. My scripts had hard-coded .mat extensions in a couple of locations; that was literally the only thing I had to swap out. sed would work, but I only had five of them (found with grep; a Python equivalent is sketched below), so I just changed them manually. I removed the extensions because the dev branch of modern BET that I have installed doesn't add .pkl; it just leaves off extensions to play nicer with backwards compatibility (we supported missing extensions before anyway... they mean nothing in the end, so I support this choice).
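For reference, a small Python equivalent of that grep step, assuming the scripts sit in the current directory (it just flags lines containing a hard-coded .mat extension):

import glob

# report every line in the local scripts that hard-codes a .mat extension
for script in sorted(glob.glob('*.py')):
    with open(script) as f:
        for lineno, line in enumerate(f, start=1):
            if '.mat' in line:
                print('%s:%d: %s' % (script, lineno, line.rstrip()))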

So, that's great! My scripts create a bunch of directories (in each of these are the individual save files for each trial):

.
├── checkskew_and_plot.py
├── estimatedist_build_pointers.py
├── estimatedist_compute_metrics.py
├── estimatedist_funs.py
├── estimatedist_plotdata.py
├── estimatedist.py
├── estimatedist_setup.py
├── file_struct.txt
├── __pycache__
│   ├── estimatedist_funs.cpython-37.pyc
│   └── estimatedist_setup.cpython-37.pyc
└── results_heatrod_3
    ├── est_discretizations
    │   ├── rand_N_1280
    │   ├── rand_N_160
    │   ├── rand_N_20
    │   ├── rand_N_2560
    │   ├── rand_N_320
    │   ├── rand_N_40
    │   ├── rand_N_640
    │   └── rand_N_80
    │       ├── ptr_from_rand_I_100000_to_rand_N_80_trial_0.npy
    │       ├── ptr_from_rand_I_100000_to_rand_N_80_trial_10.npy
    │       ├── ptr_from_rand_I_100000_to_rand_N_80_trial_11.npy
    │       ├── ptr_from_rand_I_100000_to_rand_N_80_trial_12.npy
    │       ├── ptr_from_rand_I_100000_to_rand_N_80_trial_13.npy
    │       ├── ptr_from_rand_I_100000_to_rand_N_80_trial_14.npy
    │       ├── ptr_from_rand_I_100000_to_rand_N_80_trial_15.npy
    │       ├── ptr_from_rand_I_100000_to_rand_N_80_trial_16.npy
    │       ├── ptr_from_rand_I_100000_to_rand_N_80_trial_17.npy
    │       ├── ptr_from_rand_I_100000_to_rand_N_80_trial_18.npy
    │       ├── ptr_from_rand_I_100000_to_rand_N_80_trial_19.npy
    │       ├── ptr_from_rand_I_100000_to_rand_N_80_trial_1.npy
    │       ├── ptr_from_rand_I_100000_to_rand_N_80_trial_20.npy
    │       ├── ptr_from_rand_I_100000_to_rand_N_80_trial_21.npy
    │       ├── ptr_from_rand_I_100000_to_rand_N_80_trial_22.npy
    │       ├── ptr_from_rand_I_100000_to_rand_N_80_trial_23.npy
    │       ├── ptr_from_rand_I_100000_to_rand_N_80_trial_24.npy
    │       ├── ptr_from_rand_I_100000_to_rand_N_80_trial_2.npy
    │       ├── ptr_from_rand_I_100000_to_rand_N_80_trial_3.npy
    │       ├── ptr_from_rand_I_100000_to_rand_N_80_trial_4.npy
    │       ├── ptr_from_rand_I_100000_to_rand_N_80_trial_5.npy
    │       ├── ptr_from_rand_I_100000_to_rand_N_80_trial_6.npy
    │       ├── ptr_from_rand_I_100000_to_rand_N_80_trial_7.npy
    │       ├── ptr_from_rand_I_100000_to_rand_N_80_trial_8.npy
    │       ├── ptr_from_rand_I_100000_to_rand_N_80_trial_9.npy
    │       ├── rand_N_80_trial_0
    │       ├── rand_N_80_trial_1
    │       ├── rand_N_80_trial_10
    │       ├── rand_N_80_trial_11
    │       ├── rand_N_80_trial_12
    │       ├── rand_N_80_trial_13
    │       ├── rand_N_80_trial_14
    │       ├── rand_N_80_trial_15
    │       ├── rand_N_80_trial_16
    │       ├── rand_N_80_trial_17
    │       ├── rand_N_80_trial_18
    │       ├── rand_N_80_trial_19
    │       ├── rand_N_80_trial_2
    │       ├── rand_N_80_trial_20
    │       ├── rand_N_80_trial_21
    │       ├── rand_N_80_trial_22
    │       ├── rand_N_80_trial_23
    │       ├── rand_N_80_trial_24
    │       ├── rand_N_80_trial_3
    │       ├── rand_N_80_trial_4
    │       ├── rand_N_80_trial_5
    │       ├── rand_N_80_trial_6
    │       ├── rand_N_80_trial_7
    │       ├── rand_N_80_trial_8
    │       └── rand_N_80_trial_9
    ├── est_solutions
    │   ├── QoI_choice_1
    │   │   ├── reg_M_1
    │   │   ├── reg_M_4
    │   │   └── reg_M_9
    │   └── QoI_choice_2
    │       ├── reg_M_1
    │       ├── reg_M_4
    │       └── reg_M_9
    ├── integration_sets
    │   └── rand_IntSet_100000
    ├── postprocess_rectscale_6
    │   └── reg_BigN_40000
    │       └── reg_M_1
    └── ref_solutions
        ├── Partition_Disc-reg_M_1
        ├── Partition_Disc-reg_M_4
        ├── Partition_Disc-reg_M_9
        └── reg_BigN_40000
            ├── ptr_from_rand_I_100000_to_reg_BigN_40000.npy
            ├── Reference_Disc-reg_BigN_40000
            ├── reg_M_1
            ├── reg_M_4
            └── reg_M_9

29 directories, 357 files

This allows for swapping out metrics and choices of observed distribution to create convergence plots really quickly.

I'll close out this issue once I get the develop branch fully validated. Having these simulations run with so few changes is a good validation test.

I'll create a separate repo to house the example above.

mathematicalmichael commented 4 years ago

(PS: that example takes up about 1/3 GB.)

mathematicalmichael commented 4 years ago

pickle is the default in v3