feos-org / feos

FeOs - A Framework for Equations of State and Classical Density Functional Theory

New design for `DataSet` and `Estimator` #69

Open g-bauer opened 1 year ago

g-bauer commented 1 year ago

The prediction of a target property for a given model and a set of thermodynamic states is a problem that can easily be parallelized. Our current `DataSet` trait's `predict` method is very flexible in that a user can freely decide how to iterate over data and possibly calculate properties that are needed during that iteration. The downside of the current design is that we cannot implement the parallelization in a generic way.

This issue discusses ways to change the `DataSet` trait so that a generic parallel evaluation of data points can be provided.

In- and output of the prediction of a single data point

We could add a method to the trait that returns the prediction for a single data point.

```rust
// in trait DataSet
fn predict_datapoint(&self, eos: &Arc<E>, input: SINumber) -> Result<SINumber, EstimatorError>;
```

The prediction method for all data points can then loop over this method, and the same per-datapoint method can be reused for both a sequential and a parallel implementation.
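The idea can be sketched with toy types (a placeholder `Eos` struct and plain `f64` data instead of `SINumber`; scoped std threads stand in for rayon's `par_iter`): both prediction variants reuse the same per-datapoint method.

```rust
use std::sync::Arc;

// Placeholder for the equation of state; the real trait is generic over it.
struct Eos;

trait DataSet: Send + Sync {
    /// Prediction for a single data point (toy version: plain f64 in and out).
    fn predict_datapoint(&self, eos: &Arc<Eos>, input: f64) -> f64;

    /// Inputs of all stored data points.
    fn input(&self) -> &[f64];

    /// Sequential prediction: just loop over the per-datapoint method.
    fn predict(&self, eos: &Arc<Eos>) -> Vec<f64> {
        self.input()
            .iter()
            .map(|&x| self.predict_datapoint(eos, x))
            .collect()
    }

    /// Parallel prediction reusing the very same per-datapoint method.
    fn predict_par(&self, eos: &Arc<Eos>) -> Vec<f64> {
        let (a, b) = self.input().split_at(self.input().len() / 2);
        std::thread::scope(|s| {
            let ha = s.spawn(|| {
                a.iter().map(|&x| self.predict_datapoint(eos, x)).collect::<Vec<_>>()
            });
            let hb = s.spawn(|| {
                b.iter().map(|&x| self.predict_datapoint(eos, x)).collect::<Vec<_>>()
            });
            let mut out = ha.join().unwrap();
            out.extend(hb.join().unwrap());
            out
        })
    }
}

// Hypothetical dataset whose "model" just squares the input.
struct Squares(Vec<f64>);
impl DataSet for Squares {
    fn predict_datapoint(&self, _eos: &Arc<Eos>, input: f64) -> f64 {
        input * input
    }
    fn input(&self) -> &[f64] {
        &self.0
    }
}
```

Because the trait requires `Send + Sync`, the default methods can dispatch data points to worker threads without any per-implementor code.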

However, different properties require different inputs, both in the number of inputs and in their data type(s). E.g., the vapor pressure takes a single temperature (`SINumber`), while the chemical potential for a binary VLE takes two mole fractions (`f64`), a temperature, and a pressure.

The above `predict_datapoint` method would therefore work for the vapor pressure but not for the chemical potential.

Option 1: Unify in- and output of the prediction method for a data point

We could get rid of the units in the in- and output and be generic over the length of the in- and output data.

Implementation:

```rust
pub trait DataSet<E>: Send + Sync {
    /// This function is called prior to calculation of predictions.
    ///
    /// Can be used to calculate an internal state needed for the predictions,
    /// such as critical temperature or Antoine parameters for vapor pressure
    /// extrapolation.
    fn prepare(&mut self, eos: &Arc<E>) -> Result<(), EstimatorError>;

    /// Prediction of the target given the model and a single data point.
    fn predict_datapoint(&self, eos: &Arc<E>, input: &[f64]) -> Vec<Result<f64, EstimatorError>>;

    /// Return the input elements.
    fn input(&self) -> &[Vec<f64>];

    /// Prediction of the target given the model for all stored data points.
    fn predict(&mut self, eos: &Arc<E>) -> Result<Vec<f64>, EstimatorError> {
        self.prepare(eos)?;
        self.input()
            .par_iter()
            .flat_map(|input| self.predict_datapoint(eos, input))
            .collect()
    }
}
```
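To make the unification concrete, here is a minimal sketch (hypothetical toy datasets, no units, no error handling): a one-input/one-output property and a four-input/two-output property both fit the same `&[f64]`-in, `Vec<f64>`-out interface, so the default `predict` stays generic.

```rust
// Toy version of the unified Option 1 interface.
trait DataSet: Send + Sync {
    fn predict_datapoint(&self, input: &[f64]) -> Vec<f64>;
    fn input(&self) -> &[Vec<f64>];

    /// Generic over the number of inputs and outputs per data point.
    fn predict(&self) -> Vec<f64> {
        self.input()
            .iter()
            .flat_map(|input| self.predict_datapoint(input))
            .collect()
    }
}

// One input (temperature), one output: vapor-pressure-like.
struct OneInOneOut(Vec<Vec<f64>>);
impl DataSet for OneInOneOut {
    fn predict_datapoint(&self, input: &[f64]) -> Vec<f64> {
        vec![2.0 * input[0]] // placeholder model
    }
    fn input(&self) -> &[Vec<f64>] {
        &self.0
    }
}

// Four inputs (x, y, T, p), two outputs: chemical-potential-like.
struct FourInTwoOut(Vec<Vec<f64>>);
impl DataSet for FourInTwoOut {
    fn predict_datapoint(&self, input: &[f64]) -> Vec<f64> {
        vec![input[0] + input[2], input[1] + input[3]] // placeholder model
    }
    fn input(&self) -> &[Vec<f64>] {
        &self.0
    }
}
```

The cost of this flexibility is that every implementor has to pack and unpack its natural data types into untyped `f64` slices.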

Option 2: Add associated types to the DataSet trait

The interface can be written more cleanly using associated types and constants. However, the trait is then no longer object safe: we'd have to build an enum over all implementors of `DataSet` so that they can be stored in a `Vec` inside an `Estimator`.

Implementation:

```rust
pub trait DataSet<E, const NTARGET: usize>: Send + Sync {
    type Input: Send + Sync;   // data type of input
    type Target: Send + Sync;  // data type of output

    fn target(&self) -> &[Self::Target];

    /// This function is called prior to calculation of predictions.
    ///
    /// Can be used to calculate an internal state needed for the predictions,
    /// such as critical temperature or Antoine parameters for vapor pressure
    /// extrapolation.
    fn prepare(&mut self, eos: &Arc<E>) -> Result<(), EstimatorError>;

    /// Prediction of the target given the model and a single data point.
    fn predict_datapoint(
        &self,
        eos: &Arc<E>,
        input: &Self::Input,
    ) -> Result<Self::Target, EstimatorError>;

    /// Return the input elements.
    fn input(&self) -> &[Self::Input];

    /// Calculate the relative difference for a single data point.
    fn relative_difference_datapoint(
        &self,
        eos: &Arc<E>,
        input: &Self::Input,
        target: &Self::Target,
    ) -> Result<[f64; NTARGET], EstimatorError>;

    /// Prediction of the target given the model for all stored data points.
    fn predict(&mut self, eos: &Arc<E>) -> Result<Vec<Self::Target>, EstimatorError> {
        self.prepare(eos)?;
        self.input()
            .par_iter()
            .map(|input| self.predict_datapoint(eos, input))
            .collect()
    }

    fn relative_difference(&mut self, eos: &Arc<E>) -> Result<Vec<f64>, EstimatorError> {
        self.prepare(eos)?;
        // can be done without collecting twice with par_extend?
        let rel_dif = self
            .input()
            .par_iter()
            .zip(self.target())
            .map(|(input, target)| self.relative_difference_datapoint(eos, input, target))
            .collect::<Result<Vec<[f64; NTARGET]>, EstimatorError>>()?;
        Ok(rel_dif.par_iter().cloned().flatten().collect())
    }
}
```

An implementation for the vapor pressure could look like this:

Implementation:

```rust
pub struct VaporPressure {
    pub target: Vec<SINumber>,
    temperature: Vec<SINumber>,
    max_temperature: SINumber,
    antoine: Option<Antoine>,
    solver_options: SolverOptions,
}

impl VaporPressure {
    /// Create a new data set for vapor pressure.
    ///
    /// If the equation of state fails to compute the vapor pressure
    /// (e.g. when it underestimates the critical point) the vapor
    /// pressure can be estimated.
    /// If `extrapolate` is `true`, the vapor pressure is estimated by
    /// calculating the slope of ln(p) over 1/T.
    /// If `extrapolate` is `false`, it is set to `NAN`.
    pub fn new(
        target: SIArray1,
        temperature: SIArray1,
        extrapolate: bool,
        solver_options: Option<SolverOptions>,
    ) -> Result<Self, EstimatorError> {
        let max_temperature = temperature
            .to_reduced(KELVIN)?
            .into_iter()
            .reduce(|a, b| a.max(b))
            .unwrap()
            * KELVIN;

        let antoine = if extrapolate {
            Some(Antoine::default())
        } else {
            None
        };

        Ok(Self {
            target: target.into_iter().collect(),
            temperature: temperature.into_iter().collect(),
            max_temperature,
            antoine,
            solver_options: solver_options.unwrap_or_default(),
        })
    }
}

impl<E: EquationOfState> DataSet<E, 1> for VaporPressure {
    type Input = SINumber;
    type Target = SINumber;

    fn target(&self) -> &[Self::Target] {
        &self.target
    }

    fn prepare(&mut self, eos: &Arc<E>) -> Result<(), EstimatorError> {
        if self.antoine.is_some() {
            // `max_temperature` already carries its unit.
            let critical_point = State::critical_point(
                eos,
                None,
                Some(self.max_temperature),
                self.solver_options,
            )?;
            let tc = critical_point.temperature;
            let pc = critical_point.pressure(Contributions::Total);

            let t0 = 0.9 * tc;
            let p0 = PhaseEquilibrium::pure(eos, t0, None, self.solver_options)?
                .vapor()
                .pressure(Contributions::Total);
            self.antoine = Some(Antoine::new(pc, p0, tc, t0)?);
        }
        Ok(())
    }

    fn input(&self) -> &[Self::Input] {
        &self.temperature
    }

    fn predict_datapoint(
        &self,
        eos: &Arc<E>,
        input: &Self::Input,
    ) -> Result<Self::Target, EstimatorError> {
        let vle = PhaseEquilibrium::pure(eos, *input, None, self.solver_options);
        if let Ok(vle) = vle {
            Ok(vle.vapor().pressure(Contributions::Total))
        } else if let Some(antoine) = &self.antoine {
            antoine.pressure(*input)
        } else {
            Ok(f64::NAN * PASCAL)
        }
    }

    fn relative_difference_datapoint(
        &self,
        eos: &Arc<E>,
        input: &Self::Input,
        target: &Self::Target,
    ) -> Result<[f64; 1], EstimatorError> {
        self.predict_datapoint(eos, input)
            .and_then(|prediction| Ok([((target - prediction) / target).into_value()?]))
    }
}
```
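The doc comment of `new` mentions estimating the vapor pressure from the slope of ln(p) over 1/T. A minimal sketch of such a two-point fit with plain `f64` values (the actual `Antoine` type works with `SINumber`s and may differ; this is an assumed implementation for illustration):

```rust
// Hypothetical two-point Antoine-style extrapolation: ln p = a - b / T,
// fitted through (t0, p0) and (tc, pc).
struct Antoine {
    a: f64,
    b: f64,
}

impl Antoine {
    fn new(pc: f64, p0: f64, tc: f64, t0: f64) -> Self {
        // slope of ln(p) over 1/T between the two support points
        let b = (pc.ln() - p0.ln()) / (1.0 / t0 - 1.0 / tc);
        let a = pc.ln() + b / tc;
        Self { a, b }
    }

    /// Extrapolated vapor pressure at temperature `t`.
    fn pressure(&self, t: f64) -> f64 {
        (self.a - self.b / t).exp()
    }
}
```

By construction the fit reproduces both support points exactly, so it is continuous with the equation of state at the critical point.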

For more complex predictions, e.g. `BinaryVleChemicalPotential`, the implementation could look like this:

Implementation:

```rust
pub struct BinaryVleChemicalPotential {
    input: Vec<(f64, f64, SINumber, SINumber)>,
    target: Vec<(SINumber, SINumber)>,
}

impl BinaryVleChemicalPotential {
    pub fn new(
        temperature: SIArray1,
        pressure: SIArray1,
        liquid_molefracs: Array1<f64>,
        vapor_molefracs: Array1<f64>,
    ) -> Self {
        let target: Vec<(SINumber, SINumber)> = (0..temperature.len())
            .map(|_| (500.0 * JOULE / MOL, 500.0 * JOULE / MOL))
            .collect();
        let mut input = Vec::new();
        for (((&xi, &yi), t), p) in liquid_molefracs
            .iter()
            .zip(vapor_molefracs.iter())
            .zip(temperature.into_iter())
            .zip(pressure.into_iter())
        {
            input.push((xi, yi, t, p));
        }
        assert_eq!(input.len(), target.len());
        Self { input, target }
    }
}

impl<E: EquationOfState> DataSet<E, 2> for BinaryVleChemicalPotential {
    type Input = (f64, f64, SINumber, SINumber);
    type Target = (SINumber, SINumber);

    fn target(&self) -> &[Self::Target] {
        &self.target
    }

    fn input(&self) -> &[Self::Input] {
        &self.input
    }

    #[inline]
    fn prepare(&mut self, eos: &Arc<E>) -> Result<(), EstimatorError> {
        Ok(())
    }

    fn predict_datapoint(
        &self,
        eos: &Arc<E>,
        input: &Self::Input,
    ) -> Result<Self::Target, EstimatorError> {
        let xi = input.0;
        let yi = input.1;
        let t = input.2;
        let p = input.3;
        let liquid_moles = arr1(&[xi, 1.0 - xi]) * MOL;
        let liquid = State::new_npt(eos, t, p, &liquid_moles, DensityInitialization::Liquid)?;
        let mu_liquid = liquid.chemical_potential(Contributions::Total);
        let vapor_moles = arr1(&[yi, 1.0 - yi]) * MOL;
        let vapor = State::new_npt(eos, t, p, &vapor_moles, DensityInitialization::Vapor)?;
        let mu_vapor = vapor.chemical_potential(Contributions::Total);
        Ok((
            mu_liquid.get(0) - mu_vapor.get(0) + 500.0 * JOULE / MOL,
            mu_liquid.get(1) - mu_vapor.get(1) + 500.0 * JOULE / MOL,
        ))
    }

    fn relative_difference_datapoint(
        &self,
        eos: &Arc<E>,
        input: &Self::Input,
        target: &Self::Target,
    ) -> Result<[f64; 2], EstimatorError> {
        self.predict_datapoint(eos, input).and_then(|prediction| {
            Ok([
                ((target.0 - prediction.0) / target.0).into_value()?,
                ((target.1 - prediction.1) / target.1).into_value()?,
            ])
        })
    }
}
```

Option 3: Add a parallel version of `predict`, leaving the implementation to the user

prehner commented 1 year ago

Very nice overview!

Two quick comments without having gone through all the code:

g-bauer commented 1 year ago

> But having different values for the associated types would break that concept, wouldn't it?

Mh, not sure if I understand correctly, but we could do essentially what we currently do with `EosVariant`:

Or we define the enum without implementing the trait and just write the match statements within the methods of the `Estimator` that stores a `Vec<DataSetVariant>`. Both should work.
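A minimal sketch of that `DataSetVariant` idea (toy trait, placeholder predictions, illustrative names): the trait with associated types stays non-object-safe, and the enum's methods erase the associated types inside the match arms.

```rust
// Non-object-safe trait: each implementor picks its own in/out types.
trait DataSet {
    type Input;
    type Target;
    fn predict_datapoint(&self, input: &Self::Input) -> Self::Target;
}

struct VaporPressure;
impl DataSet for VaporPressure {
    type Input = f64;  // temperature
    type Target = f64; // pressure
    fn predict_datapoint(&self, t: &f64) -> f64 {
        2.0 * t // placeholder model
    }
}

struct ChemicalPotential;
impl DataSet for ChemicalPotential {
    type Input = (f64, f64);  // e.g. (x, T)
    type Target = (f64, f64); // one residual per component
    fn predict_datapoint(&self, i: &(f64, f64)) -> (f64, f64) {
        (i.0, i.1) // placeholder model
    }
}

// Enum dispatch: each variant carries a concrete implementor and its inputs.
enum DataSetVariant {
    VaporPressure(VaporPressure, Vec<f64>),
    ChemicalPotential(ChemicalPotential, Vec<(f64, f64)>),
}

impl DataSetVariant {
    /// The estimator-facing method flattens everything into `Vec<f64>`
    /// inside the match arms, hiding the differing associated types.
    fn predict(&self) -> Vec<f64> {
        match self {
            DataSetVariant::VaporPressure(ds, input) => {
                input.iter().map(|t| ds.predict_datapoint(t)).collect()
            }
            DataSetVariant::ChemicalPotential(ds, input) => input
                .iter()
                .flat_map(|i| {
                    let (a, b) = ds.predict_datapoint(i);
                    [a, b]
                })
                .collect(),
        }
    }
}
```

An `Estimator` could then hold a `Vec<DataSetVariant>` and call `predict` on each element without any trait objects.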

g-bauer commented 1 year ago

> If having a mixture of `SINumber` and `f64` is an issue, we can circumvent that by multiplying, e.g., with `MOL`/`MOL`. A bit hacky but could be worth it.

That is part of the problem IMO. The other problem is that the number of return values of a single `predict` call can differ, i.e. one value for the vapor pressure but two values for the chemical potentials when binary VLEs are considered.
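The varying number of targets is exactly what the const generic `NTARGET` in Option 2 absorbs. A stripped-down sketch (plain `f64`, illustrative names) showing how `[f64; NTARGET]` per data point flattens into one `Vec<f64>` regardless of the property:

```rust
// Each implementor declares its number of targets per data point
// via the const generic parameter.
trait DataSet<const NTARGET: usize> {
    fn relative_difference_datapoint(&self, input: &[f64]) -> [f64; NTARGET];

    /// One residual per data point and target, flattened into a single
    /// `Vec<f64>` independent of NTARGET.
    fn relative_difference(&self, inputs: &[Vec<f64>]) -> Vec<f64> {
        inputs
            .iter()
            .flat_map(|i| self.relative_difference_datapoint(i))
            .collect()
    }
}

struct VaporPressure; // one target per point
impl DataSet<1> for VaporPressure {
    fn relative_difference_datapoint(&self, input: &[f64]) -> [f64; 1] {
        [input[0] - 1.0] // placeholder residual
    }
}

struct BinaryVle; // two targets per point (one chemical potential per component)
impl DataSet<2> for BinaryVle {
    fn relative_difference_datapoint(&self, input: &[f64]) -> [f64; 2] {
        [input[0] - 1.0, input[1] - 1.0] // placeholder residuals
    }
}
```

The fixed-size arrays keep the per-datapoint signature statically typed, while the flattened `Vec<f64>` gives the estimator a uniform shape to minimize over.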