Closed kylecarow closed 1 month ago
Done with converting unwraps to results! Also switched to binary search (short-circuiting if the point is the last element of the grid), for convenience and because it should scale better.
Finished just before my flight boarded, lol.
@robfitzgerald @nreinicke I've added additional checks to validation to prevent panics on bad inputs, as well as corresponding tests. I think it would be impractical, bordering on impossible, to replace every use of `[]` for indexing, since things are indexed quite a lot. I have replaced it where possible, and left comments in-code where direct indexing is okay because the data is checked by validation methods. Most of the remaining indexing uses outputs from `find_nearest_index`, so if there are panics with indexing in the future, it likely means there is a bug in that function, which should already be pretty robust.
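For context, here is a minimal sketch of what a binary-search-based `find_nearest_index` could look like. The name matches the function discussed above, but the body is my illustration, not the actual implementation; it assumes a sorted grid of at least two points.

```rust
/// Hypothetical sketch of a binary-search grid lookup. Returns the index of
/// the left endpoint of the interval containing `target`, short-circuiting
/// when `target` equals the last element of the grid so that `index + 1`
/// always stays in bounds. Assumes `arr` is sorted and has length >= 2.
fn find_nearest_index(arr: &[f64], target: f64) -> usize {
    // Short circuit: the last grid point maps to the final interval's
    // left index rather than running the search.
    if target == arr[arr.len() - 1] {
        return arr.len() - 2;
    }
    // partition_point returns the index of the first element >= target.
    let idx = arr.partition_point(|&x| x < target);
    if idx == 0 {
        0
    } else {
        idx - 1
    }
}
```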
Replacing `[]` with `get`s would be pretty onerous in spots like this:
https://github.com/NREL/routee-compass/blob/e48f30e237d315e78dfb435c04f9e701fbdec446/rust/routee-compass-powertrain/src/routee/prediction/interpolation/interp.rs#L434-L438
Nice, thanks for adding the additional checks in. I'm on board with your assessment about not modifying every index as long as we have the appropriate validation check at load time. I'll leave the floor open to @robfitzgerald for any other comments but overall this looks good to me.
@kylecarow i'm sorry i gotta be a stickler on this since this is in the core lib, so i would prefer we don't pass these potential panics onto users if they do something unexpected. but i think this could be a quick fix, it's a good opportunity for writing helper functions that can encapsulate the error handling. that should split the difference for us wrt conciseness and performance. i may have this wrong but here's a guess on what those signatures would look like:
```rust
fn interp1(f: Vec<f64>, x_diff: f64, l: usize, u: usize) -> Result<f64, String> { todo!() }
fn interp2(f: Vec<Vec<f64>>, x_diff: f64, l: (usize, usize), u: (usize, usize)) -> Result<f64, String> { todo!() }
fn interp3(f: Vec<Vec<Vec<f64>>>, x_diff: f64, l: (usize, usize, usize), u: (usize, usize, usize)) -> Result<f64, String> { todo!() }
fn interpN(f: ArrayD<f64>, x_diff: f64, l: Vec<usize>, u: Vec<usize>) -> Result<f64, String> { todo!() }
```
```rust
let c00 = interp3(self.f_xyz, x_diff, (x_l, y_l, z_l), (x_u, y_l, z_l))?;
```
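For illustration, one possible body for the proposed `interp1` helper, which turns an out-of-bounds index into an `Err` instead of a panic. This is a sketch of the suggestion above, not the actual crate API; it borrows a slice rather than taking the `Vec` by value, which avoids cloning the table on every call.

```rust
/// Hypothetical 1D interpolation helper: fetch the two bracketing function
/// values with `get`, returning an error rather than panicking if either
/// index is out of bounds.
fn interp1(f: &[f64], x_diff: f64, l: usize, u: usize) -> Result<f64, String> {
    let f_l = f.get(l).ok_or_else(|| format!("index {} out of bounds", l))?;
    let f_u = f.get(u).ok_or_else(|| format!("index {} out of bounds", u))?;
    // Standard linear blend between the lower and upper grid values.
    Ok(f_l * (1.0 - x_diff) + f_u * x_diff)
}
```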
or if you don't like that, i could also imagine just array lookup helpers like this:
```rust
fn arr3(f: Vec<Vec<Vec<f64>>>, x: usize, y: usize, z: usize) -> Result<f64, String> { todo!() }
```
that could be used like this:
```rust
let c00 = arr3(self.f_xyz, x_l, y_l, z_l)? * (1.0 - x_diff) + arr3(self.f_xyz, x_u, y_l, z_l)? * x_diff;
```
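A sketch of what that `arr3` lookup helper could look like, chaining `get` calls so that any out-of-bounds index becomes a `Result` instead of a panic. Again this is an illustration of the suggested signature (borrowing instead of taking ownership), not the actual implementation:

```rust
/// Hypothetical fallible 3D lookup: each level of nesting is accessed with
/// `get`, and a miss at any level collapses to a single error.
fn arr3(f: &[Vec<Vec<f64>>], x: usize, y: usize, z: usize) -> Result<f64, String> {
    f.get(x)
        .and_then(|fx| fx.get(y))
        .and_then(|fxy| fxy.get(z))
        .copied()
        .ok_or_else(|| format!("index ({}, {}, {}) out of bounds", x, y, z))
}
```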
or anything else you think could make this not be super annoying to implement!
No, this is good discussion, you're definitely not being a nuisance! I think the concept is that the `validate` method gets called just once when the interpolator is being built, which gives us our guarantees about the index bounds.
Well, in that case, I would love it if we had an `Interpolator::new` method that calls validate internally. This gets us out of the OO-ish framing of "there's a method you need to know gets called first to use this." But I also recognize that the `new` method would need to take an `ArrayD` and then do the work to specialize that down to 1/2/3D arrays when appropriate. Could we make an issue to come back to that?
> validate is called every time

while that does sound expensive
@robfitzgerald No worries, here's a description of the data validation that Nick and I had discussed in-person at PyCon:
There are two validation methods:

1. Validation of the supplied data, which happens in `InterpolationSpeedGradeModel::new()`, I imagine only once.
2. Validation of the inputs to each `interpolate` call, i.e. the interpolant point, which happens whenever `InterpolationSpeedGradeModel::predict` from the `PredictionModel` impl is called.

Does that help clear up the data validation process? If this still seems expensive I'm open to suggestions for improvements :)
It looks like we're all in a comment data race!
@robfitzgerald I had wanted an `Interpolator::new` method that calls `validate()`, but struggled with it due to the way the interpolator variants need to be instantiated; it would be hard to create a function signature that is clear, concise, and allows passage of the data needed for any variant. Your `ArrayD` approach is an interesting way around that!
A thought that just occurred to me is to have a `new` method for the variant structs, rather than the `Interpolator` enum itself. For example, creating a 2D interpolator would change like this:
```diff
- let interp = Interpolator::Interp2D(Interp2D {
-     x: vec![0.05, 0.10, 0.15],
-     y: vec![0.10, 0.20, 0.30],
-     f_xy: vec![vec![0., 1., 2.], vec![3., 4., 5.], vec![6., 7., 8.]],
- });
- interp.validate()?;
+ let interp = Interpolator::Interp2D(Interp2D::new(
+     vec![0.05, 0.10, 0.15],
+     vec![0.10, 0.20, 0.30],
+     vec![vec![0., 1., 2.], vec![3., 4., 5.], vec![6., 7., 8.]],
+ )?);
```
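A hypothetical sketch of what such an `Interp2D::new` could look like: the constructor runs validation itself, so a successfully built value is already known to be valid. The shape checks below are stand-ins, not the crate's actual validation logic.

```rust
pub struct Interp2D {
    pub x: Vec<f64>,
    pub y: Vec<f64>,
    pub f_xy: Vec<Vec<f64>>,
}

impl Interp2D {
    /// Construct and validate in one step, so callers never hold an
    /// unvalidated interpolator.
    pub fn new(x: Vec<f64>, y: Vec<f64>, f_xy: Vec<Vec<f64>>) -> Result<Self, String> {
        let interp = Self { x, y, f_xy };
        interp.validate()?;
        Ok(interp)
    }

    /// Stand-in for the real validation: check that the data grid matches
    /// the coordinate vectors in both dimensions.
    fn validate(&self) -> Result<(), String> {
        if self.f_xy.len() != self.x.len()
            || self.f_xy.iter().any(|row| row.len() != self.y.len())
        {
            return Err("grid and data dimensions do not match".to_string());
        }
        Ok(())
    }
}
```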
We could then restrict users from creating the variant structs directly (i.e. via writing `Interp2D { ... }` instead of calling `Interp2D::new(...)`) by adding a phantom non-`pub` field to the variant structs, as described in number 2 here: https://stackoverflow.com/a/77683321/11278044
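A sketch of that phantom non-`pub` field trick (field name and module layout are my illustration): because `_private` is not public, code outside the module cannot use struct-literal syntax and must go through `new`, which is where validation would live.

```rust
mod interp {
    use std::marker::PhantomData;

    pub struct Interp2D {
        pub x: Vec<f64>,
        pub y: Vec<f64>,
        pub f_xy: Vec<Vec<f64>>,
        // Zero-sized private field: blocks `Interp2D { ... }` construction
        // from outside this module.
        _private: PhantomData<()>,
    }

    impl Interp2D {
        pub fn new(x: Vec<f64>, y: Vec<f64>, f_xy: Vec<Vec<f64>>) -> Result<Self, String> {
            // validation would run here before returning Ok
            Ok(Self { x, y, f_xy, _private: PhantomData })
        }
    }
}
```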
If we make it impossible to create an `Interpolator` without `Interpolator::validate()` being called, there are some `ok_or_else` calls that could be changed back to `unwrap`s (they should never panic), but I don't know if we would want to do that, or even if we care about doing that.
nice work, @kylecarow. while using `PhantomData` feels a little spooky :ghost: i think it's the smoothest way to hack these default enum constructors to be private and enforce our validation at construction time. nice work! and we can always add the more general `Interpolator::new` method which takes an arbitrary `ArrayD` and specializes down to 1/2/3D at a later time, and it would just use the constructors you wrote. thanks!
i was trying to think why we haven't used this solution before, given that we rely heavily on patterns using rust enums. i realized that, when an enum requires a custom constructor, i have created a matching enum that has the constructor assets, which then builds the "runtime" enum at some initialization phase. so, for example, here `Config::A` builds a `Runtime::A` holding a `Vec<Vec<f64>>`:
```rust
pub enum Config {
    A { file: String },
    B,
}

impl Config {
    pub fn build(&self) -> Result<Runtime, String> {
        match self {
            Config::A { file } => Ok(Runtime::A { table: read_table(file)? }),
            Config::B => Ok(Runtime::B),
        }
    }
}

pub enum Runtime {
    A { table: Vec<Vec<f64>> },
    B,
}
```
this doesn't prevent users from directly building Runtimes, but it's a pretty clear separation of concerns, representing the changes in state due to initialization via the type system, at least removing ambiguity about whether the runtime ~is ever "valid"~ has ever been validated.
i guess this is where i still part ways from this implementation. the user could still directly build a runtime, and, that's up to them. and that's why i would still never assume i can index a vector directly. but i think we can save that kind of nitpickery for a later date, i still think this is good to go!
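For completeness, a compilable version of the config/runtime split sketched above, with a stubbed `read_table` (the stub and its signature are hypothetical, standing in for real file I/O):

```rust
pub enum Config {
    A { file: String },
    B,
}

pub enum Runtime {
    A { table: Vec<Vec<f64>> },
    B,
}

/// Hypothetical stub: a real implementation would parse the file contents.
fn read_table(_file: &str) -> Result<Vec<Vec<f64>>, String> {
    Ok(vec![vec![1.0, 2.0]])
}

impl Config {
    /// Initialization phase: fallibly build the runtime enum from config,
    /// so I/O errors surface here rather than at lookup time.
    pub fn build(&self) -> Result<Runtime, String> {
        match self {
            Config::A { file } => Ok(Runtime::A { table: read_table(file)? }),
            Config::B => Ok(Runtime::B),
        }
    }
}
```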
Just pushed a minor commit to remove a superfluous case in output handling from `find_nearest_index`, plus a very minor formatting tweak. It should be good to merge whenever, so long as you both still approve.
And the notes above about the need to rework some code higher up in the call chain, to actually be able to use features other than speed and grade, still apply.
edit:
I'll also note that Nick floated the idea of some sort of implementation for 'clipping' within interpolation itself, which we currently do here: https://github.com/NREL/routee-compass/blob/4f8b3b6e5c5e3e9c16b87873d11dd050425dc696/rust/routee-compass-powertrain/src/routee/prediction/interpolation/interpolation_speed_grade_model.rs#L36-L67
I've plumbed up the 0/1/2/3/N-dimensional interpolation code that I developed for FASTSim to the `routee-compass-powertrain` crate's `ModelType::Interpolate`. This is capable of taking data of any dimension, with `Interpolator` enum variants that allow hardcoded interpolation for performance benefits (there are benchmarks in FASTSim 3 for interpolation). It supports multiple interpolation 'strategies' for 1D interpolation, but only linear for other dimensionalities.

There are more improvements to be made, as documented in #135, but this passes all tests in Rust and the single test in Python. Notably, the 2D speed/grade stuff in `InterpolationSpeedGradeModel::new()` is still hardcoded, so more work and thought is required to allow more dimensions and proper mapping of dimensions (and any unit considerations).

See the tests in `interp.rs` for examples on usage. Be sure to call `.validate()` after instantiating to run checks on the supplied data.