cctbx / dxtbx

Diffraction Experiment Toolbox
BSD 3-Clause "New" or "Revised" License
2 stars 18 forks source link

Confusing caching behaviour of format class instances #290

Open rjgildea opened 3 years ago

rjgildea commented 3 years ago

Currently we have two way of obtaining a format class instance:

  1. By directly calling the format class constructor, i.e. format_instance = FormatFooBar(filename). The resulting format class instance is not cached at this point.

  2. By calling the format class get_instance() classmethod, i.e. format_instance = FormatFooBar.get_instance(filename). If the format class has seen this filename before (but only via a previous call to get_instance()) then the cached format class instance will be used, otherwise a new format class instance will be constructed, and the resulting instance cached.

I assert this behaviour is confusing and suboptimal, and even potentially causing subtle bugs (see e.g. #289).

Anthchirp commented 3 years ago

Totally agree that the behaviour is not useful.

Looking at your comment from #289:

Explicitly reset the cache before calling cls.get_instance() to force the format class to be re-instantiated if the process has seen the current file previously. This is necessary in order to support live per-image analysis on HDF5/SWMR files.

This sounds to me like get_instance() is misbehaving and shouldn't be used at all - deprecated and removed - and instead we should be looking at instances where we construct format classes needlessly and aim to eliminate those.

I notice that the get_instance() cache logic goes back to 2017 (afe458e88acefd223b07dc93989b70ce159575ba), and I'm wondering if there ever was a good rationale for introducing it, or whether it was used as premature optimisation to gloss over shortcomings elsewhere in the architecture.