Closed jduerholt closed 9 months ago
I think without a call method we also do not need a jacobian, right? :D But a call method for batch constraints / interpoint constraints would not be less well-defined than for the NChooseK constraint (and there we also have a call method). In fact, I think there is a relatively natural way to define a `__call__` and `jacobian` method for batch constraints (I have not thought about other types of interpoint constraints): for each batch we evaluate all `batch_size - 1` linear constraints assigned to the batch. Per batch we compute some norm (or other suitable function) of these linear constraint evaluations. The call function then returns this number for each batch. The jacobian could be defined as the derivative of the function I just described with respect to each variable of each experiment contained in the batch.
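The batchwise evaluation described above can be sketched with plain NumPy (a toy example with made-up feature values and batch size, not the BoFire API):

```python
import numpy as np

batch_size = 3
# toy feature values for two complete batches of size 3
feature_values = np.array([1.0, 1.0, 1.0,   # batch 0: all values equal
                           2.0, 2.0, 5.0])  # batch 1: one deviating value

# reshape so that each column holds one batch
x = feature_values.reshape(-1, batch_size).T

# batch_size - 1 linear constraints per batch:
# first value of the batch minus each of the other values
A = np.zeros((batch_size - 1, batch_size))
A[:, 0] = 1.0
A[:, 1:] = -np.eye(batch_size - 1)

# squared 2-norm of the constraint residuals, one value per batch
values = np.linalg.norm(A @ x, axis=0) ** 2
# batch 0 evaluates to 0.0, batch 1 to (2 - 5)**2 = 9.0
```

A batch fulfills the constraint exactly when its value is (numerically) zero.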
An implementation (jacobian still missing) could look like this (it does not yet respect the changes from this PR; I wrote this piece of code down before you opened the PR, sorry):
```python
from typing import Literal

import numpy as np
import pandas as pd
from pydantic import root_validator

# `Constraint` is the BoFire constraint base class.


class BatchConstraint(Constraint):
    """Batch constraint which is fulfilled iff batch_size subsequent
    experiments have the same value for the feature specified in feature.

    Attributes:
        feature (str): feature key the constraint operates on.
        batch_size (int): size of the batch.
    """

    type: Literal["BatchConstraint"] = "BatchConstraint"
    feature: str
    batch_size: int

    @root_validator(pre=False, skip_on_failure=True)
    def validate_batch_size(cls, values):
        """Validate that batch_size is larger than 1."""
        if values["batch_size"] < 2:
            raise ValueError("batch_size must be larger than 1")
        return values

    def is_fulfilled(self, experiments: pd.DataFrame, tol: float = 1e-6) -> pd.Series:
        values = self(experiments)
        return pd.Series(np.isclose(values, 0, atol=tol), index=values.index)

    def __call__(self, experiments: pd.DataFrame) -> pd.Series:
        """Numerically evaluates the constraint. Returns the distance to
        constraint fulfillment for each batch of size batch_size.

        Args:
            experiments (pd.DataFrame): Dataframe to evaluate the constraint on.

        Returns:
            pd.Series: Distance to reach constraint fulfillment.
        """
        n_batches = (experiments.shape[0] + self.batch_size - 1) // self.batch_size
        # pad a possibly incomplete last batch with that batch's first value,
        # so the padding does not contribute to the constraint violation
        feature_values = np.zeros(n_batches * self.batch_size)
        feature_values[: experiments.shape[0]] = experiments[self.feature].values
        feature_values[experiments.shape[0] :] = feature_values[-self.batch_size]
        feature_values = feature_values.reshape(n_batches, self.batch_size).T
        # batch_size - 1 linear constraints: first value minus each other value
        batchwise_constraint_matrix = np.zeros(shape=(self.batch_size - 1, self.batch_size))
        batchwise_constraint_matrix[:, 0] = 1.0
        batchwise_constraint_matrix[:, 1:] = -np.eye(self.batch_size - 1)
        return pd.Series(
            np.linalg.norm(batchwise_constraint_matrix @ feature_values, axis=0, ord=2) ** 2,
            index=[f"batch_{i}" for i in range(n_batches)],
        )
```
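The missing `jacobian` could follow from the chain rule for this squared-norm formulation: with per-batch residuals r = A x, the gradient with respect to the batch's feature values is 2 AᵀA x. A hedged standalone sketch (the function name and the returned row/column labels are made up for illustration; for brevity it assumes the number of experiments is a multiple of `batch_size`, i.e. no padding):

```python
import numpy as np
import pandas as pd


def batch_constraint_jacobian(
    experiments: pd.DataFrame, feature: str, batch_size: int
) -> pd.DataFrame:
    """Gradient of the per-batch squared-norm constraint value with respect to
    each experiment's feature value. Assumes len(experiments) is a multiple
    of batch_size (no padding handling)."""
    # one column per batch
    x = experiments[feature].values.reshape(-1, batch_size).T
    # batch_size - 1 linear constraints: first value minus each other value
    A = np.zeros((batch_size - 1, batch_size))
    A[:, 0] = 1.0
    A[:, 1:] = -np.eye(batch_size - 1)
    # d/dx ||A x||^2 = 2 A^T A x, evaluated per batch (per column)
    grad = 2.0 * A.T @ A @ x
    n_batches = x.shape[1]
    # one row per batch, one column per position within the batch
    return pd.DataFrame(
        grad.T,
        index=[f"batch_{i}" for i in range(n_batches)],
        columns=[f"dfdx_{j}" for j in range(batch_size)],
    )
```

For a batch [2, 2, 5] this gives the gradient [-6, 0, 6], matching a manual differentiation of (x0-x1)² + (x0-x2)².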
On the other hand, if we don't really need it, we can also just drop it ... :D What do you think @jduerholt @dlinzner-bcs @KappatC ?
@Osburg: thanks, I do not think that we actually need it, but as you have it here already, let us add it, or? I would finalize this PR here without the implementation and just raise `NotImplementedError` for `__call__` and `jacobian`, and then you add your implementation in a separate PR, ok for you?
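The stubbed version suggested above could look roughly like this (a sketch; the class name `InterpointConstraint` and the method signatures are assumptions based on this thread, not the final BoFire code):

```python
import pandas as pd


class InterpointConstraint:  # hypothetical class name for illustration
    """Interpoint constraint data model; evaluation deferred to a later PR."""

    def __call__(self, experiments: pd.DataFrame) -> pd.Series:
        # evaluation to be added in a follow-up PR
        raise NotImplementedError(
            "Evaluation of interpoint constraints is not yet implemented."
        )

    def jacobian(self, experiments: pd.DataFrame) -> pd.DataFrame:
        # jacobian to be added in a follow-up PR
        raise NotImplementedError(
            "Jacobian of interpoint constraints is not yet implemented."
        )
```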
Sure, this is fine! :)
@dlinzner-bcs: should be fine now from my side and ready for review.
@dlinzner-bcs @KappatC: Polytope sampler should now also work, I just need to fix some edge cases ;) I'll let you know once it is finished ...
I'll just close this PR and reopen it, as there are some GitHub problems ...
As discussed here, this adds the first interpoint constraint data model to BoFire.
From my perspective, a call method on interpoint constraints is somehow ill-defined and not needed. What do you think about the `jacobian` method? The integration into the `Constraints` data model is not yet complete, as the return value of the `is_fulfilled` method is now a bool and not a series of bools as in the case of the intrapoint constraints. Will fix this later. @Osburg @dlinzner-bcs @KappatC