felix-clark / ndarray-glm

Rust library for linear, logistic, and generalized linear model regression
MIT License
22 stars 0 forks source link

Return a `Fit` from a function #26

Closed multimeric closed 3 years ago

multimeric commented 3 years ago

I encountered an annoying case where I wanted to write a function that was basically fn(data: Array2<T>) -> Fit<Linear, T>. Unfortunately the Fit struct holds a reference to the Model used to create it, so it's basically impossible to return the Fit at all. The best you could do is use owning_ref to return both together, but this is certainly awkward. I could of course return just the model, but then I have to call fit() and therefor re-run the substantial processing involved each time I want access to the Fit. Can you think of a way to make this use case a bit nicer? Maybe removing the Model reference from Fit, and instead make it an argument to the functions that require it?

felix-clark commented 3 years ago

I agree that the Fit's owning of a reference to the originating Model is a bit awkward, although it's not obvious to me how to best de-couple it (or if it should in fact be completely de-coupled). I can imagine an application that runs fits with different settings to compare the results; for such a situation I don't think we'd want to take ownership of the data and force the user to clone a potentially large dataset. I also don't think a Fit is sensibly defined without the Model it originates from, so I think that forcing the user to pass a Model into every call would permit nonsensical use (not to mention it's un-ergonomic).

I can think of two suggestions, depending on your use-case:

  1. Make your interface take a reference to the data Array rather than a value. The Fit you get will store a reference to that data anyway. (Do you run into any issues with this approach?)
  2. Write your own FitSummary class that holds all the quantities you're interested in and return that instead. Then you will no longer need the data to be in memory and it will be dropped when your function returns.
multimeric commented 3 years ago

Both good suggestions, and I take your point about it being nonsensical to allow for the wrong model to be passed in. Considering this, the current functionality is probably fine.