Open · martinspetlik opened this issue 4 years ago
Which 'format' do you mean? And which method of the storage API?
There is a method to specify the structure of the data; a single sample is then just a flat vector of values. We can only check that it has the correct length.
Consistent flattening and inflating of the samples has yet to be done: packing has to happen in the simulation, inflating in the estimation.
sample_storage.save_global_data(self, result_format: List[QuantitySpec], step_range=None)
called from sampler.__init__
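A minimal sketch of the flatten / length-check idea, assuming a simplified stand-in for QuantitySpec with only name and shape fields (the real spec carries more metadata such as unit, times and locations); the helper names are illustrative, not existing storage API:

```python
import numpy as np
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class QuantitySpec:
    # simplified stand-in; the real spec has more fields (unit, times, locations, ...)
    name: str
    shape: Tuple[int, ...]


def expected_length(result_format: List[QuantitySpec]) -> int:
    # total number of scalar values a single flattened sample must contain
    return sum(int(np.prod(q.shape)) for q in result_format)


def flatten_sample(values: List[np.ndarray], result_format: List[QuantitySpec]) -> np.ndarray:
    # "packing" on the simulation side: concatenate all quantities into one flat vector
    flat = np.concatenate([np.asarray(v).ravel() for v in values])
    assert flat.size == expected_length(result_format), "sample length does not match result_format"
    return flat


def inflate_sample(flat: np.ndarray, result_format: List[QuantitySpec]) -> dict:
    # "inflating" on the estimation side: split the flat vector back into named arrays
    out, offset = {}, 0
    for q in result_format:
        size = int(np.prod(q.shape))
        out[q.name] = flat[offset:offset + size].reshape(q.shape)
        offset += size
    return out
```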
We can check the length, but we are not able to find out whether the user ran MLMC with the same simulation across all the runs that share a single HDF file.
Test case:
run MLMC with a modified simulation in the meantime -> it might fail (different result length) or the results might be inconsistent.
We currently rely on the user to keep result_format consistent. That might be acceptable, and I think we agreed on that. Nevertheless, there is room for improvement.
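One possible improvement, sketched below under stated assumptions: persist a fingerprint of result_format alongside the samples and refuse to append when a restarted run supplies a different format. The attribute name and helper functions are hypothetical, not part of the current storage API:

```python
import hashlib
import json

import h5py


def format_fingerprint(result_format) -> str:
    # hypothetical helper: hash the serialized format so any change to it is detectable
    serialized = json.dumps([(q.name, list(q.shape)) for q in result_format], sort_keys=True)
    return hashlib.sha256(serialized.encode()).hexdigest()


def check_or_store_format(hdf_path: str, result_format) -> None:
    # store the fingerprint on the first run; on restart, compare against the stored value
    with h5py.File(hdf_path, "a") as f:
        fp = format_fingerprint(result_format)
        stored = f.attrs.get("result_format_fingerprint")
        if stored is None:
            f.attrs["result_format_fingerprint"] = fp
        elif stored != fp:
            raise ValueError("result_format differs from the one used to create this HDF file")
```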
Sure, simulation inconsistency between restarted sample collections is an issue, but we can live with that for now.
A few notes:
Theoretically, we could compute a hash of the simulation's compute method, but the method may depend on other modules, so I see no way to make a check that is robust against all changes in the simulation (a rough sketch follows after these notes).
If we are unable to track changes in the simulation itself, we could possibly detect them from the data by marking samples with a 'run signature', e.g. the start time of the main script. Then it would be possible to apply statistical tests for equal mean values between runs (see the sketch below).
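Both notes could look roughly like the sketch below: hashing the source of the simulation's compute method (fragile, as noted, since it ignores changes in imported modules) and Welch's t-test for equal means between samples tagged with different run signatures. It assumes the compute method is called `calculate`; all names here are illustrative, not existing mlmc API:

```python
import hashlib
import inspect
import time

import numpy as np
from scipy import stats


def simulation_hash(simulation) -> str:
    # hash only the source of the compute method; changes in modules it imports go undetected
    src = inspect.getsource(type(simulation).calculate)
    return hashlib.sha256(src.encode()).hexdigest()


def make_run_signature() -> str:
    # e.g. the start time of the main script, stored with every sample of this run
    return time.strftime("%Y-%m-%d_%H:%M:%S")


def means_consistent(samples_run_a: np.ndarray, samples_run_b: np.ndarray, alpha: float = 0.05) -> bool:
    # Welch's t-test for equal means of the same quantity collected in two different runs;
    # a rejection hints that the simulation changed between the runs
    _, p_value = stats.ttest_ind(samples_run_a, samples_run_b, equal_var=False)
    return p_value > alpha
```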
Am I supposed to check the sample data format while saving to storage?