AmpersandTV / pymc3-hmm

Hidden Markov models in PyMC3
Other
94 stars 13 forks source link

modeling a wind park availability #84

Closed sdementen closed 3 years ago

sdementen commented 3 years ago

I am interested to detect when a wind park is partially available (for instance, when only 50% of its wind turbines are running, the others being in maintenance).

I have as input two time series:

  1. a yearly hourly sampled approximate windspeed (m/s)
  2. a yearly hourly sampled wind park production (MWh) as well as a power curve, ie a function to convert a windspeed into a wind park production (that can take numpy.array or pandas.Series as inputs).

I was thinking to use a HMM to model the unobserved % availability of the wind park and to start with just two states : 50% availability or 100% availability.

I came with the following as model but it fails miserably, most probably because I am not familiar at all with pymc3(-hmm). I would be happy to get some help (or get some example that is close to my situation).

power_curve_th = pm.theano.compile.ops.as_op(itypes=[tt.dvector],otypes=[tt.dvector])(power_curve)

def create_hmm(observed, power_curve, p_0_a=np.r_[5, 1], p_1_a=np.r_[3, 5]):
    """observed = dataframe with 2 columns: windspeed_approx and prod_real and index=hourly datetimeindex on a year"""
    T, _ = observed.shape # T = 8760 = 365d * 24h/d

    p_0_rv = pm.Dirichlet("p_0", p_0_a)
    p_1_rv = pm.Dirichlet("p_1", p_1_a)

    P_tt = tt.stack([p_0_rv, p_1_rv])
    P_rv = pm.Deterministic("P_t", tt.shape_padleft(P_tt))

    pi_0_tt = compute_steady_state(P_rv)

    # S_rv represents the availability of the wind park (100% or 50%)
    S_rv = DiscreteMarkovChain("S_t", P_rv, pi_0_tt, shape=T)
#     S_rv.tag.test_value = np.array(observed > 0, dtype=np.int32)

    # the approximate/model windspeed is observed and follows a Weibull distribution
    Wapprox_rv = pm.Weibull("WindSpeed_model", alpha=1, beta=1.5, observed=observed.windspeed_approx)
    # the real windspeed is not observed
    W_rv = Wapprox_rv + pm.Normal("WindSpeed_delta",mu=0,sigma=1)
    # the real power production is observed
    P_rv = SwitchingProcess(
        "Power_t", [pm.Deterministic('Power_t_100%', power_curve(W_rv)), 
                    pm.Deterministic('Power_t_50%', power_curve(W_rv)*0.5)], 
        S_rv, observed=observed.prod_real
    )

    return P_rv

with pm.Model() as test_model:
    _ = create_hmm(df, power_curve_th, p_0_a=np.r_[1, 1], p_1_a=np.r_[1, 1])

Please provide the full traceback of any errors.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-22-3310bec9eec0> in <module>
     17     mu_rv = pm.Gamma("mu", E_mu ** 2 / Var_mu, E_mu / Var_mu)
     18 
---> 19     _ = create_hmm(df, power_curve_th, p_0_a=np.r_[1, 1], p_1_a=np.r_[1, 1])

<ipython-input-21-2de9b242ec73> in create_hmm(observed, power_curve, p_0_a, p_1_a)
     20     W_rv = Wapprox_rv + pm.Normal("WindSpeed_delta",mu=0,sigma=1)
     21 
---> 22     P_rv = SwitchingProcess(
     23         "Power_t", [pm.Deterministic('Power_t_100%', power_curve(W_rv)), pm.Deterministic('Power_t_50%', power_curve(W_rv)*0.5)], S_rv, observed=observed.prod_real
     24     )

...\lib\site-packages\pymc3\distributions\distribution.py in __new__(cls, name, *args, **kwargs)
    119             dist = cls.dist(*args, **kwargs, shape=shape)
    120         else:
--> 121             dist = cls.dist(*args, **kwargs)
    122         return model.Var(name, dist, data, total_size, dims=dims)
    123 

...\lib\site-packages\pymc3\distributions\distribution.py in dist(cls, *args, **kwargs)
    128     def dist(cls, *args, **kwargs):
    129         dist = object.__new__(cls)
--> 130         dist.__init__(*args, **kwargs)
    131         return dist
    132 

...\lib\site-packages\pymc3_hmm\distributions.py in __init__(self, comp_dists, states, *args, **kwargs)
    144         states_tv = get_test_value(self.states)
    145         bcast_comps = np.broadcast(
--> 146             states_tv, *[get_and_check_comp_value(x) for x in comp_dists[:31]]
    147         )
    148         shape = bcast_comps.shape

...\lib\site-packages\pymc3_hmm\distributions.py in <listcomp>(.0)
    144         states_tv = get_test_value(self.states)
    145         bcast_comps = np.broadcast(
--> 146             states_tv, *[get_and_check_comp_value(x) for x in comp_dists[:31]]
    147         )
    148         shape = bcast_comps.shape

...\lib\site-packages\pymc3_hmm\distributions.py in get_and_check_comp_value(x)
    100         return x.random()
    101     else:
--> 102         raise TypeError(
    103             "Component distributions must be PyMC3 Distributions. "
    104             "Got {}".format(type(x))

TypeError: Component distributions must be PyMC3 Distributions. Got <class 'pymc3.model.DeterministicWrapper'>
brandonwillard commented 3 years ago

The pm.Deterministics you provided to SwitchingModel aren't pm.Distributions; you'll need to create a pm.Distribution based on your power_curve Op.

Generally speaking, Distributions are the means by which log-likelihoods are specified in PyMC3, and SwitchingProcess needs to know the log-likelihoods for each state.

Unfortunately, PyMC3 cannot automatically determine the log-likelihood of power_curve(W_rv), so you need to specify it manually. Fortunately, PyMC3 version 4 starts to introduce this capability (e.g. see this PR).

Here's a template for such a Distribution:

class PowerCurveDist(pm.Continuous):
    def __init__(self, var, scale=1.0, **kwargs):
        self.var = tt.as_tensor_variable(var)
        self.scale = tt.as_tensor_variable(scale)
        # We need a default/test value for the distribution, so we specify the
        # mean, which will be used to determine such a value
        # TODO: Specify the _actual_ mean for this distribution
        self.mean = self.var
        super().__init__(**kwargs)

    def logp(self, obs):
        # TODO: Formulate the actual log-likelihood using `obs`, `self.var`, and
        # `self.scale`...
        loglik = (obs - self.var) * self.scale
        return loglik.sum()

    def random(self, *args):
        # If you want to implement a random sampler for this, you
        # can, but it shouldn't be necessary
        raise NotImplementedError()

You can use it as follows:

def create_hmm(observed, power_curve, p_0_a=np.r_[5, 1], p_1_a=np.r_[3, 5]):
    """observed = dataframe with 2 columns: windspeed_approx and prod_real and index=hourly datetimeindex on a year"""
    T, _ = observed.shape  # T = 8760 = 365d * 24h/d

    p_0_rv = pm.Dirichlet("p_0", p_0_a)
    p_1_rv = pm.Dirichlet("p_1", p_1_a)

    P_tt = tt.stack([p_0_rv, p_1_rv])
    P_rv = pm.Deterministic("P_t", tt.shape_padleft(P_tt))

    pi_0_tt = compute_steady_state(P_rv)

    # S_rv represents the availability of the wind park (100% or 50%)
    S_rv = DiscreteMarkovChain("S_t", P_rv, pi_0_tt, shape=T)

    # the approximate/model windspeed is observed and follows a Weibull distribution
    W_approx_rv = pm.Weibull(
        "WindSpeed_model", alpha=1, beta=1.5, observed=observed.windspeed_approx
    )

    # the real windspeed is not observed
    W_delta = pm.Normal("WindSpeed_delta", mu=0, sigma=1)
    W_rv = pm.Deterministic("W", W_approx_rv + W_delta)

    pm.Deterministic("Power_t_100%", power_curve(W_rv))
    pm.Deterministic("Power_t_50%", power_curve(W_rv) * 0.5)

    # the real power production is observed
    P_rv = SwitchingProcess(
        "Power_t",
        [PowerCurveDist.dist(W_rv, shape=T), PowerCurveDist.dist(W_rv, 0.5, shape=T)],
        S_rv,
        observed=observed.prod_real,
    )

    return P_rv

You can repost this question on the PyMC Discourse for more assistance.