evanberkowitz / two-dimensional-gasses

Let's crush it

Ensemble + Observable interfaces #12

Closed: evanberkowitz closed this issue 1 year ago

evanberkowitz commented 2 years ago

It'd be good to construct a somewhat unified interface for ensembles and observables.

evanberkowitz commented 1 year ago
import numpy as np
import functools
import inspect
import types

import logging
logger = logging.getLogger(__name__)
logging.basicConfig(format='%(asctime)s %(name)s %(levelname)10s %(message)s', level=logging.INFO)

The four main design criteria are:

  1. Observables should be attributes or methods of GrandCanonical (ensemble.O or ensemble.O()), so that forwarding from canonical sectors makes sense. The alternative is to make observables free functions O(ensemble), but then each call has to be routed based on the type of ensemble it receives, which seems like a great deal of overhead.
  2. Therefore, users should be able to add observables to GrandCanonical at run time by monkey patching, potentially through a decorator. It'd be nice to make them lazily-evaluated properties that need no () when called. Then observables could be added from outside the tdg internals.
  3. Observables should be memoized / cached when they are reused.
  4. Garbage collection should work (eventually).
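For comparison, here is a minimal sketch of the alternative rejected in criterion 1. It uses functools.singledispatch to route a free function O(ensemble) by ensemble type. The Canonical class below is a hypothetical stand-in that only forwards to its grand-canonical ensemble. It is not the tdg class, and real projection logic would go in its branch.

```python
import functools


class GrandCanonical:
    def __init__(self, n):
        self.n = n


class Canonical:
    # Hypothetical stand-in: wraps a grand-canonical ensemble and a particle number.
    def __init__(self, grand_canonical, particle_number):
        self.grand_canonical = grand_canonical
        self.particle_number = particle_number


@functools.singledispatch
def sq(ensemble):
    # Fallback when no implementation is registered for type(ensemble).
    raise NotImplementedError(f'sq is not defined for {type(ensemble).__name__}')


@sq.register
def _(ensemble: GrandCanonical):
    return ensemble.n ** 2


@sq.register
def _(ensemble: Canonical):
    # Every observable needs its own routing rule for every ensemble type.
    return sq(ensemble.grand_canonical)


print(sq(GrandCanonical(7)))                # 49
print(sq(Canonical(GrandCanonical(7), 3)))  # 49
```

Each new observable needs one registration per ensemble type. That bookkeeping is one reading of the overhead criterion 1 wants to avoid.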

Criteria 3 and 4 conflict. A class-level cache (for example, functools.lru_cache applied to a method) holds strong references to the instances it has seen, so long-lived objects are never cleared. The suggested fix is an instance-level cache, created inside __init__. The cache's reference to the object then lives inside the object itself. That forms a cycle the garbage collector's cycle detection can find and free.
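A small sketch of both the conflict and the fix, using only the standard library. The class names are illustrative. A weak reference shows whether each object survives gc.collect() once its last ordinary reference is dropped.

```python
import functools
import gc
import weakref


class ClassLevel:
    def __init__(self, n):
        self.n = n

    # The cache lives on the class, and its keys include self.
    @functools.lru_cache(maxsize=None)
    def sq(self):
        return self.n ** 2


class InstanceLevel:
    def __init__(self, n):
        self.n = n
        # Each instance wraps its own bound method, so the cache lives on the instance.
        self.sq = functools.lru_cache(maxsize=None)(self._sq)

    def _sq(self):
        return self.n ** 2


def survives_collection(cls):
    obj = cls(7)
    obj.sq()                 # populate the cache
    ref = weakref.ref(obj)   # does not keep obj alive
    del obj
    gc.collect()
    return ref() is not None


print(survives_collection(ClassLevel))     # True: the class-level cache still holds it
print(survives_collection(InstanceLevel))  # False: the self-referencing cycle was collected
```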

To achieve instance-level caching while maintaining goal 2, we can create a GrandCanonical.observable classmethod that stores decorated functions.

class GrandCanonical:

    observables = dict()
    _cache = set()

    def __init__(self, n):
        self.n = n

        for name, func in GrandCanonical.observables.items():
            f = types.MethodType(func, self)

            if name not in GrandCanonical._cache:
                logger.info(f'{name} is an uncached observable.')
                setattr(self, name, f)
                continue

            logger.info(f'{name} is a cached observable.')
            # Wrap the bound method so each instance gets its own cache.
            setattr(self, name, functools.lru_cache()(f))

    def __del__(self):
        logger.info(f'Bye from {self.n}!')

    @classmethod
    def observable(cls, cache=False):
        r'''
        Register the decorated function as an observable of instances created afterwards.

        Parameters
        ----------
        cache: bool
            Should the decorated observable be cached?
        '''
        def decorator(func):
            cls.observables[func.__name__] = func
            if cache:
                cls._cache.add(func.__name__)
            return func
        return decorator

Then we can start adding observables!

In an ideal world these would not need () to be evaluated, like a @property. The only way I could make that happen was to evaluate them in the observable loop inside __init__. Unfortunately that's not compatible with every scenario. For instance, when the ensemble is generated by HMC or by .from_configurations, the configurations may not exist yet when __init__ runs.

@GrandCanonical.observable(cache=True)
def sq(ensemble):
    logger.info("evaluating sq")
    return ensemble.n**2

@GrandCanonical.observable(cache=False)
def sqrt(ensemble):
    logger.info("evaluating sqrt")
    return np.sqrt(ensemble.n+0.j)

Observables that take an additional argument get added correctly, too!

@GrandCanonical.observable(cache=True)
def mul(ensemble, factor):
    logger.info("evaluating mul")
    return factor * ensemble.n

@GrandCanonical.observable(cache=False)
def div(ensemble, factor):
    logger.info("evaluating dif")
    return ensemble.n / factor

Then

e = GrandCanonical(7)
# 2023-01-25 22:59:25,965 __main__       INFO sq is a cached observable.
# 2023-01-25 22:59:25,966 __main__       INFO sqrt is an uncached observable.
# 2023-01-25 22:59:25,968 __main__       INFO mul is a cached observable.
# 2023-01-25 22:59:25,978 __main__       INFO div is an uncached observable.
print(e.sq())
# 2023-01-25 22:59:26,617 __main__       INFO evaluating sq
# 49
print(e.sqrt())
# 2023-01-25 22:59:26,618 __main__       INFO evaluating sqrt
# (2.6457513110645907+0j)
print(e.mul(17))
# 2023-01-25 22:59:26,621 __main__       INFO evaluating mul
# 119
print(e.div(3.))
# 2023-01-25 22:59:26,622 __main__       INFO evaluating dif
# 2.3333333333333335
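One caveat applies to cached observables that take arguments. functools.lru_cache keys its cache on the arguments, so those arguments must be hashable. Here is a standalone sketch using a plain function that is not attached to GrandCanonical:

```python
import functools

calls = []


@functools.lru_cache(maxsize=None)
def mul(n, factor):
    calls.append(factor)  # record every real evaluation
    return factor * n


print(mul(7, 17))  # 119, evaluated
print(mul(7, 17))  # 119, served from the cache
print(calls)       # [17]: only one real evaluation

try:
    mul(7, [1, 2])  # a list cannot be hashed, so it cannot be a cache key
except TypeError as err:
    print('TypeError:', err)
```

numpy arrays are unhashable as well. An array-valued argument to a cached observable would therefore fail the same way unless it is first converted to something hashable, such as a tuple.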

Then,

e = None
import gc
gc.collect()
# 2023-01-25 22:59:41,810 __main__       INFO Bye from 7!

Aside from the argument-free attributes still requiring () for evaluation, this seems to meet all design criteria!
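For the argument-free observables, functools.cached_property (Python 3.8+) may close the () gap without evaluating anything in __init__. The sketch below assumes a stripped-down GrandCanonical and a hypothetical decorator, also named observable. It swaps in a different technique and is not what the branch does.

```python
import functools


class GrandCanonical:
    def __init__(self, n):
        self.n = n

    @classmethod
    def observable(cls, func):
        # Hypothetical decorator: attach func as a lazily evaluated attribute
        # whose value is cached in each instance's own __dict__.
        prop = functools.cached_property(func)
        # __set_name__ only runs automatically for attributes defined in a
        # class body, so call it by hand when patching the class afterwards.
        prop.__set_name__(cls, func.__name__)
        setattr(cls, func.__name__, prop)
        return func


evaluations = []


@GrandCanonical.observable
def sq(ensemble):
    evaluations.append('sq')
    return ensemble.n ** 2


e = GrandCanonical(7)
print(e.sq)         # 49, computed on first access
print(e.sq)         # 49, read back from e.__dict__
print(evaluations)  # ['sq']
```

The value lands in the instance's own __dict__, so there is no class-level cache to keep objects alive. This only covers observables without arguments; mul and div would still need to be methods.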

HOWEVER.

I began a canonically-projected calculation to check that the observables are still inherited by ensemble.Canonical and .Sector objects, and two bad things happened.

First, constructing all the canonical terms took much MUCH longer than before. In hindsight that makes sense: every object now allocates and assigns its own bound (and possibly cached) method for every observable, rather than relying on the class definition.
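If the slowdown comes from wrapping every registered observable inside every __init__ call, one option is to defer that work until an observable is first requested. The sketch below does this through __getattr__. It is a hypothetical variant, not what the branch does.

```python
import functools
import types


class GrandCanonical:
    observables = {}
    _cached = set()

    def __init__(self, n):
        self.n = n  # no per-observable work happens at construction

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. the first time an
        # observable is requested on this instance.
        try:
            func = type(self).observables[name]
        except KeyError:
            raise AttributeError(name) from None
        method = types.MethodType(func, self)
        if name in type(self)._cached:
            method = functools.lru_cache(maxsize=None)(method)
        # Store on the instance so later lookups bypass __getattr__.
        setattr(self, name, method)
        return method

    @classmethod
    def observable(cls, cache=False):
        def decorator(func):
            cls.observables[func.__name__] = func
            if cache:
                cls._cached.add(func.__name__)
            return func
        return decorator


evaluations = []


@GrandCanonical.observable(cache=True)
def sq(ensemble):
    evaluations.append('sq')
    return ensemble.n ** 2


e = GrandCanonical(7)
print(e.sq(), e.sq())  # 49 49
print(evaluations)     # ['sq']: the second call hit the instance-level cache
```

Any forwarding that goes through getattr would still find these observables. Forwarding that inspects the class or dir() instead would not, because the attribute only exists once it has been requested.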

Second, attribute forwarding from the Sector and Canonical objects to the GrandCanonical stopped working. I got all sorts of weird errors claiming that various observables (which I could evaluate on the grand-canonical ensemble) weren't there.

This makes me think the right strategy, for the time being, is to keep programming observables directly into GrandCanonical itself. That is unsustainable in the long term, but it is better than letting this design issue hold back progress.

evanberkowitz commented 1 year ago

The observable-decorators branch 2ca21dd5087b8f5c26703b66172af4063d075a9f shows how this works.

evanberkowitz commented 1 year ago

This may be achievable with descriptors, which the Python documentation describes.

Can classes inherit from descriptors? Then we could make an Observable base class that knows how to evaluate itself in different circumstances (GrandCanonical, Canonical, Binning, even Bootstrap!), and all observables could inherit from it.
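They can. A class that defines __get__ is a descriptor class, and its subclasses inherit __get__, so their instances are descriptors too. Below is a minimal sketch of such a base class. Observable, Square, register and the Bootstrap stand-in are all hypothetical names, not tdg API. Canonical or Binning would be further branches in __get__.

```python
class GrandCanonical:
    def __init__(self, n):
        self.n = n


class Bootstrap:
    # Hypothetical stand-in: just holds a list of resampled ensembles.
    def __init__(self, samples):
        self.samples = samples


class Observable:
    # Subclasses set `name` and implement `grand_canonical`. They inherit
    # __get__, so every subclass instance is a (non-data) descriptor.
    name = None

    def __get__(self, obj, objtype=None):
        if obj is None:  # looked up on the class rather than an instance
            return self
        if isinstance(obj, GrandCanonical):
            value = self.grand_canonical(obj)
        elif isinstance(obj, Bootstrap):
            value = self.bootstrap(obj)
        else:
            raise TypeError(f'{self.name} is not defined for {type(obj).__name__}')
        # The instance __dict__ takes precedence over a non-data descriptor,
        # so caching the value here means __get__ is not called again.
        obj.__dict__[self.name] = value
        return value

    def grand_canonical(self, ensemble):
        raise NotImplementedError

    def bootstrap(self, bootstrap):
        # Default rule: evaluate the observable on every resample.
        return [getattr(sample, self.name) for sample in bootstrap.samples]


def register(observable_cls):
    # Hypothetical helper: attach one shared instance to every ensemble type.
    instance = observable_cls()
    for owner in (GrandCanonical, Bootstrap):
        setattr(owner, observable_cls.name, instance)
    return observable_cls


@register
class Square(Observable):
    name = 'sq'

    def grand_canonical(self, ensemble):
        return ensemble.n ** 2


print(GrandCanonical(7).sq)                                  # 49
print(Bootstrap([GrandCanonical(2), GrandCanonical(3)]).sq)  # [4, 9]
```

A subclass only has to say how to evaluate itself on a GrandCanonical. The base class supplies the routing and the per-instance caching, and it needs no () to evaluate.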