Open g-bauer opened 2 years ago
Very nice overview!
Two quick comments without having gone through all the code:
SINumber
and f64
is an issue, we can circumvent that by multiplying, e.g., with MOL/MOL
. A bit hacky but could be worth it.DataSet
s. We "established" the approach with the equations of state and functionals, and in Python it is already basically treated as enum. But having different values for the associated types would break that concept, wouldn't it?But having different values for the associated types would break that concept, wouldn't it?
Mh not sure if I understand correctly but we could do essentially what we currently do with EosVariant
:
DataSetVariant
, that contains all possible DataSet
implementationsDataSet
for that enumOr we define the enum without implementing the trait and just write the match statements within the methods of the Estimator
that stores a Vec<DataSetVariant>
. Both should work.
If having a mixture of SINumber and f64 is an issue, we can circumvent that by multiplying, e.g., with MOL/MOL. A bit hacky but could be worth it.
That is part of the problem IMO. The other problem is that the number of return values for a single predict
call can be different, i.e. one value for vapor pressure but two values for the chemical potentials when binary VLE's are considered.
The prediction of a target property for a given model and a set of thermodynamic states is a problem that can easily be parallelized. Our current
DataSet
trait'spredict
method is very flexible in that a user can freely decide how to iterate over data and possibly calculate properties that are needed during that iteration. The downside of the current design is that we cannot implement the parallelization in a generic way.This issue discusses ways to change the
DataSet
trait so that a generic parallel evaluation of data points can be provided.In- and ouput of the prediction of a single data point
We could add a method into the trait that returns the prediction for a single data point.
The prediction method for all data can then loop over this method. The same method could be reused for a parallel and a sequential prediction method.
However, different properties require different inputs both in terms of the number of inputs as well as the data type(s). E.g.
SINumber
SINumber
f64
, t, p asSINumber
SINumber
)The above
predict_datapoint
method would work for the vapor pressure but not for the chemical potential.Option 1: Unify in- and output of the
prediction
method for a data pointWe could get rid of the unit in the in- and output and be generic in the length of in- and output data.
Implementation (click)
```rust pub trait DataSetrelative_difference
(and thus the cost function) can be generically implemented since the result of predict is just aVec<f64>
.DataSet
s can be stored in aVec<Arc<dyn DataSet<E>>>
in theEstimator
which then can be used as is.predict
are always the same which is nice to check/plot how a model's parameters perform.Vec
s have to be allocated during the iteration (most of the time with a single value)Option 2: Add associated types to the
DataSet
traitThe interface can be written in a nicer way using associated types and constants. However, the trait is no longer object safe. We'd have to build an enum with all possible implementors of
DataSet
so that those can be used within an Estimator inside aVec
.Implementation (click)
```rust pub trait DataSetEstimator
.An implementation for the vapor pressure could look like this:
Implementation (click)
For more complex predictions, e.g.
BinaryVleChemicalPotential
this could look like so:click
```rust pub struct BinaryVleChemicalPotential { input: Vec<(f64, f64, SINumber, SINumber)>, target: Vec<(SINumber, SINumber)>, } impl BinaryVleChemicalPotential { pub fn new( temperature: SIArray1, pressure: SIArray1, liquid_molefracs: Array1Option 3: Add parallel version of
predict
leaving the implementation to the userDataSet
trait