Closed: mathematicalmichael closed this issue 5 years ago.
However, while that does trigger the conditional statement, the parallel loading fails because the number of processors no longer matches the number of items found by glob.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-e685ff89c9b6> in <module>
----> 1 s_set = sample.load_sample_set('test_save')
~/work/BET/bet/sample.py in load_sample_set(file_name, sample_set_name, localize)
140 os.path.dirname(file_name), "proc0_{}".format(
141 os.path.basename(file_name)))):
--> 142 return load_sample_set_parallel(file_name, sample_set_name)
143
144 mdat = sio.loadmat(file_name)
~/work/BET/bet/sample.py in load_sample_set_parallel(file_name, sample_set_name)
212 # otherwise gather the data from mdat and then scatter
213 # among the processors and update mdat
--> 214 mdat_files_local = comm.scatter(mdat_files)
215 mdat_local = [sio.loadmat(m) for m in mdat_files_local]
216 mdat_list = comm.allgather(mdat_local)
mpi4py/MPI/Comm.pyx in mpi4py.MPI.Comm.scatter()
mpi4py/MPI/msgpickle.pxi in mpi4py.MPI.PyMPI_scatter()
mpi4py/MPI/msgpickle.pxi in mpi4py.MPI.Pickle.dumpv()
ValueError: expecting 1 items, got 2
I believe the expected behavior (correct me if I'm wrong) is: if I send work off to multiple processors and save at the end, it saves the chunks; then, if I load them back in serial, the conditional statement that is now being triggered should return a fully "merged" sample set assembled from all the processors' files, rather than requiring me to manually stitch the sample sets back together with merge.
Right now it's trying to scatter 2 objects to a single processor and getting a ValueError back.
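For concreteness, here's a minimal sketch of that failure mode (assuming mpi4py; the file names follow the test script further down, not necessarily what BET actually writes):

from mpi4py import MPI
import glob

comm = MPI.COMM_WORLD
# files written by a 2-rank save, e.g. proc0_test_save.mat, proc1_test_save.mat
mdat_files = glob.glob('proc*_test_save.mat')
# comm.scatter requires len(sendobj) == comm.Get_size(); in a serial
# (1-rank) run with 2 files this raises:
#   ValueError: expecting 1 items, got 2
mdat_files_local = comm.scatter(mdat_files)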
I will use a try/except ValueError to bypass it... fingers crossed.
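Continuing the sketch above, the stopgap would look something like this (my workaround, not BET's final fix):

import scipy.io as sio

try:
    mdat_files_local = comm.scatter(mdat_files)
except ValueError:
    # serial run: scatter can't split the list, so take every chunk locally
    mdat_files_local = mdat_files
mdat_local = [sio.loadmat(m) for m in mdat_files_local]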
And now that I've managed to sneak around the errors, I'm realizing that load_sample_set_parallel returns nothing. I think once I fix that we'll be good to go.
Changes made: append to the list instead of extend-ing it, and return loaded_sample_set. Now load_sample_set in serial will grab the appropriate parallel files and merge them.
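A hedged illustration of why append rather than extend matters here (variable and key names are mine, not BET's):

import glob
import numpy as np
import scipy.io as sio

mdat_list = []
for m in sorted(glob.glob('proc*_test_save.mat')):
    # append keeps one dict per processor chunk; extend would iterate
    # over the dict's keys and splice those into the list instead
    mdat_list.append(sio.loadmat(m))
# stack the per-chunk arrays back into one global array
# (the '_values' key suffix is an assumption about the saved format)
values = np.concatenate([md['test_save_values'] for md in mdat_list])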
I have so far only checked for correct merging of _values, and will ensure all other attributes are merged correctly before submitting a PR (frankly, it might just make it in as part of another PR I'm planning).
How can we write tests to ensure this behavior is consistent/correct? For now, the manual check: run the top portion of the script below (through save_sample_set) in parallel, and the bottom portion (the load) in serial.
import bet.sample as sample
import numpy as np

# --- run this part in parallel (e.g. under mpirun) ---
sset = sample.sample_set(2)
sset.set_distribution()
# np.random.seed(0)
sset.generate_samples(12)
sset.get_values()
print(sset.get_values_local())
sample.save_sample_set(sset, 'test_save')

# --- run this part in serial ---
s_set = sample.load_sample_set('test_save')
# s_set.local_to_global()
print(s_set.get_values())
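(To exercise both paths by hand: first run the script under MPI, e.g. mpirun -n 2 python test_script.py, so the proc*_test_save.mat chunks get written, then run plain python test_script.py to hit the serial merge path; the two printed value arrays should agree up to row ordering. test_script.py is just a placeholder name.)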
Vectors seem to load and merge appropriately, but other properties do not. For example, _dim becomes array(2) instead of the original 2... seems like an easy fix.
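The array(2) wrapping is what scipy.io.loadmat does to scalars, so the easy fix presumably amounts to something like the following (the key name is assumed for illustration):

import numpy as np
import scipy.io as sio

mdat = sio.loadmat('test_save')   # savemat/loadmat promote scalars to arrays
dim = mdat['test_save_dim']       # comes back as array([[2]]), not 2
dim = int(np.squeeze(dim))        # recover the original int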
Then I have to deal with the same version of this for discretizations as well. There are a few attributes I introduced that are not vectors and will need to be copied over (likely just saved by, and inherited from, the root worker).
I also have to save a scipy.stats distribution object, and I don't think that will quite work with the dictionary saving, since it will only take strings/arrays.
I see that new_mdat[sample_set_name + '_sample_set_type'] = str(type(save_set)).split("'")[1] kind of handles this by just saving the string for re-use later on when loading. I think I'll have to do the same thing to re-instantiate the distribution object.
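A hedged sketch of what that could look like for a frozen scipy.stats distribution, storing only strings/arrays in the mdat dict (key names are hypothetical):

import scipy.io as sio
import scipy.stats as stats

dist = stats.norm(loc=0.0, scale=2.0)    # frozen distribution to persist
new_mdat = {}
new_mdat['dist_name'] = dist.dist.name   # 'norm' -- a plain string
new_mdat['dist_loc'] = dist.kwds['loc']  # numeric kwds save fine as arrays
new_mdat['dist_scale'] = dist.kwds['scale']
sio.savemat('dist_save', new_mdat)

loaded = sio.loadmat('dist_save')
name = str(loaded['dist_name'][0])       # strings come back array-wrapped
rebuilt = getattr(stats, name)(loc=loaded['dist_loc'].item(),
                               scale=loaded['dist_scale'].item())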
So far it does not seem like the fix to the proc typo addressed #333. Still failing for n > 2.
TODO: create an 'extract information' function that plays nicely with all the various distributions we may have to load/save. I have had to re-use that logic twice already (serial and parallel loading of sample sets) and will need it again for predicted/observed in _setup, so it's worth making a function for it.
Perhaps part of this is also a function that appropriately unpacks dictionary items into new key entries for the mdat file with the format sample_set_name + "_" + key, and, when unpacked, strips off all unnecessary characters. This gets used for the keys in kwds for distributions, and will also get used for setup.
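A sketch of such a pack/unpack pair, with hypothetical names (not BET's actual API):

import numpy as np

def pack_dict(mdat, sample_set_name, kwds):
    # flatten kwds into prefixed top-level entries that savemat accepts
    for key, val in kwds.items():
        mdat[sample_set_name + '_' + key] = val

def unpack_dict(mdat, sample_set_name):
    # invert pack_dict, squeezing off the array wrappers loadmat adds
    plen = len(sample_set_name) + 1
    return {key[plen:]: np.squeeze(mdat[key])
            for key in mdat if key.startswith(sample_set_name + '_')}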
Addressed by https://github.com/UT-CHG/BET/pull/355, closing now.
Related to #333 and perhaps others.
I think I found the source of the bug... line 140 in sample.py: it probably should be proc0_{}. I'm still mid-investigation, but I wanted to put this down here.
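For reference, here is the check around line 140 as it appears in the traceback above (post-fix; the pre-fix prefix string isn't shown anywhere in this thread, so this is a reconstruction, not the buggy original):

import os

def load_sample_set(file_name, sample_set_name=None):
    # reconstructed from the traceback; only the relevant branch shown
    if os.path.exists(os.path.join(
            os.path.dirname(file_name),
            "proc0_{}".format(os.path.basename(file_name)))):
        return load_sample_set_parallel(file_name, sample_set_name)
    # ...serial path continues with sio.loadmat(file_name)...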