edbennett / glue_analysis

Update consistency checks #24

Closed: chillenzer closed this issue 1 year ago

chillenzer commented 1 year ago

Quick question: what should happen to the consistency checks now that we have basic data validation in place upon freeze()? Some of them are definitely out of sync with what we have defined as our data schema (e.g. they all assume values from range(...)). Others may still add value, particularly those concerning consistency between correlators and vevs. Since I disallow any modification after freezing, it seems reasonable to move an updated set of consistency checks into the freeze() operation (and potentially even into the schema) and drop @only_on_consistent_data in favour of some @only_on_frozen_data.
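
For concreteness, a minimal sketch of what the decorator swap could look like; the _frozen attribute and the exception name are hypothetical, not taken from the codebase:

    import functools

    class NotFrozenError(RuntimeError):
        """Raised when a method requires frozen (validated) data."""

    def only_on_frozen_data(method):
        """Run a method only after freeze() has validated the data."""
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            if not getattr(self, "_frozen", False):
                raise NotFrozenError(
                    f"{method.__name__} requires frozen data; call freeze() first."
                )
            return method(self, *args, **kwargs)
        return wrapper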

edbennett commented 1 year ago

Options as I see them:

  1. Keep things as they are now and throw performance away by checking things repeatedly
  2. Move all the logic into freeze and disallow freezing of inconsistent ensembles
  3. Move all the logic into freeze, but keep a separate consistent property to allow inconsistent frozen ensembles

Option 2 probably makes the most sense.

chillenzer commented 1 year ago

Also a strong proponent of 2. On the other two options:

  1. "Keep things as they are" is not quite what we'd do: as mentioned above, we would still have to amend the checks to work with the new conventions. I don't think performance would be too much of an issue; it is more that a data structure that is not supposed to change anymore, yet has not been checked for consistency upon freezing, is close to gross negligence.
  3. Sounds okay if consistent is not a boolean but an explanation of what is inconsistent, if anything. That way the user could react to freeze failures, interactively or in an automated session, e.g.,
    >>> correlator_ensemble.freeze()
    ...
    DataConsistencyError: The following aspects are inconsistent about your data:
    ...
    Run <instance>.consistent() to get more details in a human- and machine-readable way.
    >>> correlator_ensemble.consistent()
    {<inconsistency>: {'affected': ..., 'more_details': ...}, ...}
    >>> for inconsistency in _:
    ...    correlator_ensemble = cure(inconsistency, correlator_ensemble)
    >>> correlator_ensemble.freeze()
    # works fine
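
A minimal runnable sketch of that pattern, with hypothetical names throughout (the toy CorrelatorEnsemble and its single MC-time cross-check are illustrations, not the project's actual schema):

    class DataConsistencyError(Exception):
        """Raised by freeze() when validation fails; carries the details."""

        def __init__(self, inconsistencies):
            self.inconsistencies = inconsistencies
            super().__init__(
                "The following aspects are inconsistent about your data: "
                + ", ".join(sorted(inconsistencies))
            )

    class CorrelatorEnsemble:
        def __init__(self, correlators, vevs):
            self.correlators = correlators  # e.g. a pandas DataFrame
            self.vevs = vevs
            self._frozen = False

        def consistent(self):
            """Map inconsistency names to machine-readable details (empty if OK)."""
            found = {}
            # Example check: correlators and vevs must cover the same MC times.
            corr_times = set(self.correlators["MC_Time"])
            vev_times = set(self.vevs["MC_Time"])
            if corr_times != vev_times:
                found["mc_time_mismatch"] = {
                    "affected": sorted(corr_times ^ vev_times),
                    "more_details": "correlators and vevs cover different MC times",
                }
            return found

        def freeze(self):
            inconsistencies = self.consistent()
            if inconsistencies:
                raise DataConsistencyError(inconsistencies)
            self._frozen = True
            return self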
chillenzer commented 1 year ago

Do we want to enforce the following? (All up to duplication due to other columns.)

  1. self.correlators["MC_Time"] == range(self.num_samples) -> My opinion: no, because pyerrors is totally fine with incomplete ensembles, and dropping samples deliberately might make sense, e.g. to reduce autocorrelation.
  2. self.correlators["Time"] == range(self.NT) -> My opinion: maybe. Contiguity might be a reasonable assumption, but I don't think the values need to start from 0. If you have, for example, long correlators and already know that they are contaminated by excited states for the first $n$ distances, you might want to skip those entirely in the analysis (see the sketch after this list).
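
Regarding check 2, a minimal sketch of the relaxed version (contiguous, but not necessarily starting at 0), assuming a long-format pandas DataFrame with an integer Time column; the helper name is made up:

    import pandas as pd

    def time_is_contiguous(correlators: pd.DataFrame) -> bool:
        """Check that Time values form an unbroken integer run.

        An offset start (e.g. 3, 4, 5, ...) is deliberately allowed, so that
        known excited-state-contaminated distances can be skipped entirely.
        """
        times = sorted(correlators["Time"].unique())
        return times == list(range(times[0], times[0] + len(times)))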
edbennett commented 1 year ago

  1. Definitely not. However, we should enforce that there are no duplicates, as pyerrors gives an annoyingly cryptic error when it works out that the smallest separation between MC time indices is zero (see the sketch below).
  2. Could be useful, modulo the naming of NT (see #27). But it is possibly not worth spending effort on at this point, particularly if it will need to be removed later.
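
For illustration, a hedged sketch of catching duplicates up front instead of letting them surface as a cryptic error downstream; the column layout (a data column named "value", MC_Time among the index columns) and the helper name are assumptions, not the project's actual API:

    import pandas as pd

    def assert_no_duplicate_mc_time(correlators: pd.DataFrame) -> None:
        """Fail early and clearly on duplicated MC time entries.

        MC_Time must be unique within each combination of the remaining
        index columns (assumed here to be every column except 'value').
        """
        index_cols = [c for c in correlators.columns if c != "value"]
        duplicated = correlators.duplicated(subset=index_cols)
        if duplicated.any():
            dupes = sorted(correlators.loc[duplicated, "MC_Time"].unique())
            raise ValueError(f"Duplicate MC_Time entries: {dupes}")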
chillenzer commented 1 year ago

Concerning duplicate handling in pyerrors: that should be an issue there, shouldn't it?

edbennett commented 1 year ago

Arguably yes.

Edit: I've raised https://github.com/fjosw/pyerrors/issues/214