Confusing caching behaviour of format class instances

cctbx / dxtbx

Diffraction Experiment Toolbox

BSD 3-Clause "New" or "Revised" License

2 stars 18 forks source link

Currently we have two way of obtaining a format class instance:

By directly calling the format class constructor, i.e. format_instance = FormatFooBar(filename). The resulting format class instance is not cached at this point.
By calling the format class get_instance() classmethod, i.e. format_instance = FormatFooBar.get_instance(filename). If the format class has seen this filename before (but only via a previous call to get_instance()) then the cached format class instance will be used, otherwise a new format class instance will be constructed, and the resulting instance cached.

I assert this behaviour is confusing and suboptimal, and even potentially causing subtle bugs (see e.g. #289).

Totally agree that the behaviour is not useful.

Looking at your comment from #289:

Explicitly reset the cache before calling cls.get_instance() to force the format class to be re-instantiated if the process has seen the current file previously. This is necessary in order to support live per-image analysis on HDF5/SWMR files.

This sounds to me like get_instance() is misbehaving and shouldn't be used at all - deprecated and removed - and instead we should be looking at instances where we construct format classes needlessly and aim to eliminate those.

I notice that the get_instance() cache logic goes back to 2017 (afe458e88acefd223b07dc93989b70ce159575ba), and I'm wondering if there ever was a good rationale for introducing it, or whether it was used as premature optimisation to gloss over shortcomings elsewhere in the architecture.

cctbx / dxtbx

Confusing caching behaviour of format class instances #290