Open · martinspetlik opened this issue 4 years ago
Which 'format' do you mean? And which method of the storage API?
There is a method to specify the structure of the data; a single sample is then just a flat vector of values. We can only check that it has the correct length.
Consistent flattening and inflating of the samples has yet to be done: packing has to happen in the simulation, inflating in the estimation.
sample_storage.save_global_data(self, result_format: List[QuantitySpec], step_range=None)
called from sampler.__init__
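A minimal sketch of the flatten / length-check idea, assuming a simplified stand-in for QuantitySpec with only name and shape fields (the real spec carries more metadata such as unit, times and locations); the helper names are illustrative, not existing storage API:

```python
import numpy as np
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class QuantitySpec:
    # simplified stand-in; the real spec has more fields (unit, times, locations, ...)
    name: str
    shape: Tuple[int, ...]


def expected_length(result_format: List[QuantitySpec]) -> int:
    # total number of scalar values a single flattened sample must contain
    return sum(int(np.prod(q.shape)) for q in result_format)


def flatten_sample(values: List[np.ndarray], result_format: List[QuantitySpec]) -> np.ndarray:
    # "packing" on the simulation side: concatenate all quantities into one flat vector
    flat = np.concatenate([np.asarray(v).ravel() for v in values])
    assert flat.size == expected_length(result_format), "sample length does not match result_format"
    return flat


def inflate_sample(flat: np.ndarray, result_format: List[QuantitySpec]) -> dict:
    # "inflating" on the estimation side: split the flat vector back into named arrays
    out, offset = {}, 0
    for q in result_format:
        size = int(np.prod(q.shape))
        out[q.name] = flat[offset:offset + size].reshape(q.shape)
        offset += size
    return out
```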
We can check the length, but we are not able to find out whether the user ran MLMC with the same simulation across all the runs that share a single HDF file.
Test case:
run MLMC with a modified simulation in the meantime -> it might fail (different result length) or the results might be inconsistent.
We currently rely on the user to keep result_format consistent. That might be acceptable, and I think we agreed on that. Nevertheless, there is room for improvement.
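One possible improvement, sketched below under stated assumptions: persist a fingerprint of result_format alongside the samples and refuse to append when a restarted run supplies a different format. The attribute name and helper functions are hypothetical, not part of the current storage API:

```python
import hashlib
import json

import h5py


def format_fingerprint(result_format) -> str:
    # hypothetical helper: hash the serialized format so any change to it is detectable
    serialized = json.dumps([(q.name, list(q.shape)) for q in result_format], sort_keys=True)
    return hashlib.sha256(serialized.encode()).hexdigest()


def check_or_store_format(hdf_path: str, result_format) -> None:
    # store the fingerprint on the first run; on restart, compare against the stored value
    with h5py.File(hdf_path, "a") as f:
        fp = format_fingerprint(result_format)
        stored = f.attrs.get("result_format_fingerprint")
        if stored is None:
            f.attrs["result_format_fingerprint"] = fp
        elif stored != fp:
            raise ValueError("result_format differs from the one used to create this HDF file")
```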
Sure, simulation inconsistency between restarted sample collections is an issue, but we can live with that for now.
A few notes:
Theoretically, we could compute a hash of the simulation's compute method, but the method may depend on other modules, so I see no way to make a check that is robust against all changes in the simulation (a rough sketch follows after these notes).
If we are unable to track changes in the simulation itself, we could possibly detect them from the data by marking samples with a 'run signature', e.g. the start time of the main script. Then it would be possible to apply statistical tests for equal mean values between runs (see the sketch below).
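Both notes could look roughly like the sketch below: hashing the source of the simulation's compute method (fragile, as noted, since it ignores changes in imported modules) and Welch's t-test for equal means between samples tagged with different run signatures. It assumes the compute method is called `calculate`; all names here are illustrative, not existing mlmc API:

```python
import hashlib
import inspect
import time

import numpy as np
from scipy import stats


def simulation_hash(simulation) -> str:
    # hash only the source of the compute method; changes in modules it imports go undetected
    src = inspect.getsource(type(simulation).calculate)
    return hashlib.sha256(src.encode()).hexdigest()


def make_run_signature() -> str:
    # e.g. the start time of the main script, stored with every sample of this run
    return time.strftime("%Y-%m-%d_%H:%M:%S")


def means_consistent(samples_run_a: np.ndarray, samples_run_b: np.ndarray, alpha: float = 0.05) -> bool:
    # Welch's t-test for equal means of the same quantity collected in two different runs;
    # a rejection hints that the simulation changed between the runs
    _, p_value = stats.ttest_ind(samples_run_a, samples_run_b, equal_var=False)
    return p_value > alpha
```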
Am I supposed to check the sample data format while saving to storage?