AUTProgram closed 11 months ago
On a different note, it's not clear to me whether the option of passing an int into Fitter.evaluate() as x_fit is actually helpful. I assume people will only rarely use this functionality and in those cases, they could also just do x_fit = np.linspace(np.min(fitter.x), np.max(fitter.x), n_steps) themselves and pass it to the function. Conceptually, the purpose of x_fit is currently very different depending on whether the user passes an array or an integer and it seems to be causing problems for type checking too.
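For reference, building the evaluation grid by hand is short. This is a minimal sketch, assuming the fitter exposes its x-data as a 1D array (fitter.x in the snippet above); x_data is a stand-in for that attribute and n_steps is whatever resolution the caller wants.

```python
import numpy as np

# Stand-in for fitter.x (assumption: a 1D array of the fitted x-values,
# not necessarily sorted).
x_data = np.array([0.5, 2.0, 1.0, 3.5])

# Number of points on the evaluation grid, chosen by the caller.
n_steps = 8

# Evenly spaced grid spanning the full range of the original x-data.
x_fit = np.linspace(np.min(x_data), np.max(x_data), n_steps)
```

The resulting array would then be passed as x_fit, so the argument only ever means "the x-values to evaluate at".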
-) Model functions with more than one y-channel now return y-data array with new shape
Is this true? Doesn't the sinusoid model return a 1D array? Or by "new shape" do you mean swapping num_samples and num_y_channels?
-) As a result, y-data passed from fitter to models for parameter estimation are 2D, even for models that only contain a single y-channel. Therefore, in estimate_parameters for all models with just a single y-channel, have added a line to squeeze the y-array to make sure it is 1D.
Is this needed? What would have broken if you didn't do it? Doesn't numpy broadcasting mean this is fine?
On a different note, it's not clear to me whether the option of passing an int into Fitter.evaluate() as x_fit is actually helpful. I assume people will only rarely use this functionality and in those cases, they could also just do x_fit = np.linspace(np.min(fitter.x), np.max(fitter.x), n_steps) themselves and pass it to the function. Conceptually, the purpose of x_fit is currently very different depending on whether the user passes an array or an integer and it seems to be causing problems for type checking too.
Yes, I agree with this. If you want to remove that option go ahead!
Is this true? Doesn't the sinusoid model return a 1D array? Or by "new shape" do you mean swapping num_samples and num_y_channels?
Yeah, I only meant swapping the order to (num_y_channels, num_samples). Models with a single y-channel still return a 1D array if the independent variable x is 1D (but note that the user can in principle also pass 2D arrays as x to models with a single y-channel, as long as one of the axes has length one).
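The shapes described here can be illustrated with two toy functions. These are hypothetical stand-ins, not the ionics_fits models: single_channel is an elementwise function whose output follows the shape of x, and two_channel stacks its channels along the first axis to give (num_y_channels, num_samples).

```python
import numpy as np


def single_channel(x, a=1.0, omega=2.0):
    """Hypothetical single-channel model: output has the same shape as x."""
    return a * np.sin(omega * np.asarray(x))


def two_channel(x, a=1.0, omega=2.0):
    """Hypothetical two-channel model: shape (num_y_channels, num_samples)."""
    x = np.asarray(x)
    # np.stack puts the new channel axis first by default (axis=0).
    return np.stack([a * np.sin(omega * x), a * np.cos(omega * x)])


x = np.linspace(0.0, 1.0, 5)

y_single = single_channel(x)                    # 1D x in, 1D y out
y_single_2d = single_channel(x[np.newaxis, :])  # (1, 5) x in, (1, 5) y out
y_multi = two_channel(x)                        # channels first: (2, 5)
```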
Is this needed? What would have broken if you didn't do it? Doesn't numpy broadcasting mean this is fine?
Yes, it is needed. I didn't add this at first, but then almost all tests failed at the estimate_parameters stage. The problem isn't so much the broadcasting; the problem is that some of the algorithms used to find initial estimates implicitly assume that the x- and y-data are 1D. For example, in get_spectrum, we do y_f = np.fft.fft(y, norm="ortho") / np.sqrt(n) and then afterwards y_f = y_f[1:]. But since y_f now has shape (1, num_samples), the second expression results in an empty array, y_f = []. I'm actually surprised it doesn't raise an exception because y_f[1] does throw an error. If y_f was always 2D, we could fix this by doing y_f = y_f[:, 1:]. But this will fail if the user passes a 1D array since then the second index would be out of bounds, because the second axis doesn't exist. As another example, in gaussian.estimate_parameters, we do model_parameters["y0"].heuristic = y[0]. But if y is 2D as passed by the fitter, the heuristic for y0 is no longer the desired scalar, but just the whole y-array! Similar problems occur for the other models.
In short, the problem is that even models that contain only a single y-channel, in particular their estimate_parameters methods, need to handle both 1D and 2D y-arrays, because the fitter internally always uses 2D arrays and calls methods of Model. There are two ways out of this: 1) promote everything to 2D arrays in models too, as we do in fitter or 2) internally squeeze any 2D arrays to 1D in estimate_parameters to get valid estimates. I prefer option 2) because models with a single y-channel conceptually just have one dependent variable and I find it more intuitive to think about this as a 1D array. Option 1) would also mean that if the user passes a 1D x-array to the model, they would get a 2D y-array of shape (1, num_samples) out, which I don't like. Besides, if we ever started supporting two independent variables, we would then have to internally promote everything to 3D arrays in models, which just seems unnecessary.
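Option 2) amounts to a one-line normalisation at the top of each single-channel estimator. A minimal sketch, where estimate_y0 is a hypothetical stand-in for one step of an estimate_parameters method rather than the code in this PR:

```python
import numpy as np


def estimate_y0(y):
    """Hypothetical estimator step that accepts (n,) or (1, n) y-data."""
    # Drop length-one axes; atleast_1d guards the single-sample case,
    # where squeeze alone would return a 0-d array.
    y = np.atleast_1d(np.squeeze(np.asarray(y)))
    if y.ndim != 1:
        raise ValueError(f"expected a single y-channel, got shape {y.shape}")
    # With y guaranteed 1D, indexing yields the intended scalar.
    return y[0]


from_user = estimate_y0(np.array([3.0, 4.0, 5.0]))      # 1D input
from_fitter = estimate_y0(np.array([[3.0, 4.0, 5.0]]))  # (1, 3) input
```

For the slicing case alone, indexing the last axis with an ellipsis (y_f[..., 1:]) works for both 1D and 2D input, but it would not help with expressions such as y[0], so squeezing is the more general fix.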
@hartytp Feel free to have another look after latest updates, otherwise I'll merge today after lunch
@AUTProgram modulo the small question about the arguments in evaluate, I'm very happy for this to be merged.
This PR makes a major breaking change to ionics_fits as it changes the ordering of axes in arrays for y-axis data. The new ordering is (num_y_channels, num_samples). This fixes the problems mentioned in https://github.com/OxIonics/ionics_fits/issues/97, see also that issue for a more general discussion.
The main code changes resulting from this decision are as follows:
-) Model functions with more than one y-channel now return y-data array with new shape
-) Fitter now requires observed y-data to be passed correctly with new shape and assumes that all models return data in the new shape
-) Function annotations for all models and fitter functions have been updated
-) Because only one independent variable is currently supported, the fitter always stores x-axis data as a 1D array. Even in the previous version, an error was thrown whenever the fitter was passed x-axis data that was not a 1D array. Since one or more y-channels (and corresponding sigma-channels) are supported, the fitter internally always uses and stores y-axis data (and sigma data) as 2D arrays.
-) As a result, y-data passed from fitter to models for parameter estimation are 2D, even for models that only contain a single y-channel. Therefore, in estimate_parameters for all models with just a single y-channel, have added a line to squeeze the y-array to make sure it is 1D.
-) In order to be able to use results from fitting for plotting, the Fitter.evaluate() method now takes an optional keyword argument called transpose_and_squeeze. This ensures that arrays returned are compatible with the plot() function of matplotlib. Squeezing is required because plot() requires data to be contained along columns and does not produce the desired result when using arrays of shape (1, n_samples).
Apart from these changes, I have also modified the Model.get_num_y_channels() method to raise NotImplementedError to ensure that subclasses define this method. I ran into a problem where MappedModel for a model with multiple y-channels did not return the correct number of channels because the mapped model did not call inner.get_num_y_channels() and instead silently took the default of 1 from Model.get_num_y_channels().
I have run poe test, all tests pass successfully.
One remaining issue is that strictly speaking, our function annotation for Fitter.__init__() is not quite correct. We currently allow the y-axis data for these models to be either 1D or 2D, but the annotation y: ArrayLike[("num_y_channels", "num_samples"), np.float64], says it should only be 2D. Similarly, the annotations for estimate_parameters methods incorrectly say they only expect 1D arrays.
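The effect of the transpose_and_squeeze option described above can be sketched with plain NumPy. The helper below is a hypothetical illustration of the array manipulation, not the implementation inside Fitter.evaluate(): it converts the new (num_y_channels, num_samples) layout into one column per channel and collapses the single-channel case to 1D.

```python
import numpy as np


def transpose_and_squeeze(y):
    """Hypothetical helper: channels-first array -> plot-friendly layout."""
    # Transpose to (num_samples, num_y_channels), then drop length-one axes
    # so that a single channel becomes a plain 1D array.
    return np.squeeze(np.asarray(y).T)


y_multi = np.arange(6.0).reshape(2, 3)   # 2 channels, 3 samples
y_single = np.arange(3.0).reshape(1, 3)  # 1 channel, 3 samples

plot_multi = transpose_and_squeeze(y_multi)    # (3, 2): one column per channel
plot_single = transpose_and_squeeze(y_single)  # (3,)
```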