Closed sdementen closed 3 years ago
The pm.Deterministic
s you provided to SwitchingModel
aren't pm.Distribution
s; you'll need to create a pm.Distribution
based on your power_curve
Op
.
Generally speaking, Distribution
s are the means by which log-likelihoods are specified in PyMC3, and SwitchingProcess
needs to know the log-likelihoods for each state.
Unfortunately, PyMC3 cannot automatically determine the log-likelihood of power_curve(W_rv)
, so you need to specify it manually. Fortunately, PyMC3 version 4 starts to introduce this capability (e.g. see this PR).
Here's a template for such a Distribution
:
class PowerCurveDist(pm.Continuous):
def __init__(self, var, scale=1.0, **kwargs):
self.var = tt.as_tensor_variable(var)
self.scale = tt.as_tensor_variable(scale)
# We need a default/test value for the distribution, so we specify the
# mean, which will be used to determine such a value
# TODO: Specify the _actual_ mean for this distribution
self.mean = self.var
super().__init__(**kwargs)
def logp(self, obs):
# TODO: Formulate the actual log-likelihood using `obs`, `self.var`, and
# `self.scale`...
loglik = (obs - self.var) * self.scale
return loglik.sum()
def random(self, *args):
# If you want to implement a random sampler for this, you
# can, but it shouldn't be necessary
raise NotImplementedError()
You can use it as follows:
def create_hmm(observed, power_curve, p_0_a=np.r_[5, 1], p_1_a=np.r_[3, 5]):
"""observed = dataframe with 2 columns: windspeed_approx and prod_real and index=hourly datetimeindex on a year"""
T, _ = observed.shape # T = 8760 = 365d * 24h/d
p_0_rv = pm.Dirichlet("p_0", p_0_a)
p_1_rv = pm.Dirichlet("p_1", p_1_a)
P_tt = tt.stack([p_0_rv, p_1_rv])
P_rv = pm.Deterministic("P_t", tt.shape_padleft(P_tt))
pi_0_tt = compute_steady_state(P_rv)
# S_rv represents the availability of the wind park (100% or 50%)
S_rv = DiscreteMarkovChain("S_t", P_rv, pi_0_tt, shape=T)
# the approximate/model windspeed is observed and follows a Weibull distribution
W_approx_rv = pm.Weibull(
"WindSpeed_model", alpha=1, beta=1.5, observed=observed.windspeed_approx
)
# the real windspeed is not observed
W_delta = pm.Normal("WindSpeed_delta", mu=0, sigma=1)
W_rv = pm.Deterministic("W", W_approx_rv + W_delta)
pm.Deterministic("Power_t_100%", power_curve(W_rv))
pm.Deterministic("Power_t_50%", power_curve(W_rv) * 0.5)
# the real power production is observed
P_rv = SwitchingProcess(
"Power_t",
[PowerCurveDist.dist(W_rv, shape=T), PowerCurveDist.dist(W_rv, 0.5, shape=T)],
S_rv,
observed=observed.prod_real,
)
return P_rv
You can repost this question on the PyMC Discourse for more assistance.
I am interested to detect when a wind park is partially available (for instance, when only 50% of its wind turbines are running, the others being in maintenance).
I have as input two time series:
I was thinking to use a HMM to model the unobserved % availability of the wind park and to start with just two states : 50% availability or 100% availability.
I came with the following as model but it fails miserably, most probably because I am not familiar at all with pymc3(-hmm). I would be happy to get some help (or get some example that is close to my situation).
Please provide the full traceback of any errors.