Change ordering of array for y-axis data

AUTProgram commented 11 months ago

This PR makes a major breaking change to ionics_fits as it changes the ordering of axes in arrays for y-axis data. The new ordering is (num_y_channels, num_samples). This fixes the problems mentioned in https://github.com/OxIonics/ionics_fits/issues/97, see also that issue for a more general discussion.

The main code changes resulting from this decision are as follows: -) Model functions with more than one y-channel now return y-data array with new shape -) Fitter now requires observed y-data to be passed correctly with new shape and assumes that all models return data in the new shape -) Function annotations for all models and fitter functions have been updated -) Because only one independent variable is currently supported, the fitter always stores x-axis data as a 1D array. Even in the previous version, an error was thrown whenever the fitter was passed x-axis data that was not a 1D array. Since one or more y-channels (and corresponding sigma-channels) are supported, the fitter internally always uses and stores y-axis data (and sigma) data as 2D arrays. -) As a result, y-data passed from fitter to models for parameter estimation are 2D, even for models that only contain a single y-channel. Therefore, in estimate_parameters for all models with just a single y-channel, have added a line to squeeze the y-array to make sure it is 1D. -) In order to be able to use results from fitting for plotting, the Fitter.evaluate() method now takes an optional keyword argument called transpose_and_squeeze. This ensures that arrays returned are compatible with the plot() function of matplotlib. Squeezing is required because plot() requires data to be contained along columns and does not produce the desired result when using arrays of shape (1, n_samples).

Apart from these changes, I have also modified the Model.get_num_y_channels() method to raise NotImplementedError to ensure that subclasses define this method. I ran into a problem where MappedModel for a model with multiple y-channels did not return the correct number of channels because the mapped model did not call inner.get_num_y_channels() and instead silently took the default of 1 from Model.get_num_y_channels().

I have run poe test, all tests pass successfully.

One remaining issue is that strictly speaking, our function annotation for Fitter.__init__() is not quite correct. We currently allow the y-axis data for these models to be either 1D or 2D, but the annotation y: ArrayLike[("num_y_channels", "num_samples"), np.float64], says it should only be 2D. Similarly, the annotations for estimate_parameters methods incorrectly say they only expect 1D arrays.

AUTProgram commented 11 months ago

On a different note, it's not clear to me whether the option of passing an int into Fitter.evaluate() as x_fit is actually helpful. I assume people will only rarely use this functionality and in those cases, they could also just do x_fit = np.linspace(np.min(fitter.x), np.max(fitter.x), n_steps) themselves and pass it to the function. Conceptually, the purpose of x_fit is currently very different depending on whether the user passes an array or an integer and it seems to be causing problems for type checking too.

hartytp commented 11 months ago

-) Model functions with more than one y-channel now return y-data array with new shape

Is this true? Doesn't the sinusoid model return a 1D array? Or by "new shape" do you mean swapping num_samples and num_y_channels?

-) As a result, y-data passed from fitter to models for parameter estimation are 2D, even for models that only contain a single y-channel. Therefore, in estimate_parameters for all models with just a single y-channel, have added a line to squeeze the y-array to make sure it is 1D.

Is this needed? What would have broken if you didn't do it? Doesn't numpy broadcasting mean this is fine?

hartytp commented 11 months ago

On a different note, it's not clear to me whether the option of passing an int into Fitter.evaluate() as x_fit is actually helpful. I assume people will only rarely use this functionality and in those cases, they could also just do x_fit = np.linspace(np.min(fitter.x), np.max(fitter.x), n_steps) themselves and pass it to the function. Conceptually, the purpose of x_fit is currently very different depending on whether the user passes an array or an integer and it seems to be causing problems for type checking too.

Yes, I agree with this. If you want to remove that option go ahead!

AUTProgram commented 11 months ago

Is this true? Doesn't the sinusoid model return a 1D array? Or by "new shape" do you mean swapping num_samples and num_y_channels?

Yeah, I only meant swapping the order to (num_y_channels, num_samples). Models with a single y-channel still return a 1D array if the independent variable x is 1D (but note that the user can in principle also pass 2D arrays as x to models with a single y-channel, as long as one of the axes has length one).

AUTProgram commented 11 months ago

Is this needed? What would have broken if you didn't do it? Doesn't numpy broadcasting mean this is fine?

Yes, it is needed. I didn't add this at first, but then almost all tests fail at the estimate_parameters stage. The problem isn't so much the broadcasting, the problem is that some of the algorithms used to find initial estimates implicitly assume that the x- and y-data is 1D. For example, in get_spectrum, we do y_f = np.fft.fft(y, norm="ortho") / np.sqrt(n) and then afterwards y_f = y_f[1:]. But since y_f now has shape (1, num_samples), the second expression results in y_f = []. I'm actually surprised it doesn't raise an exception because y_f[1] does throw an error. If y_f was always 2D, we could fix this by doing y_f = y_f[:, 1:]. But this will fail if the user passes a 1D array since then the second index would be out of bounds, because the second axis doesn't exist. As another example, in gaussian.estimate_parameters, we do model_parameters["y0"].heuristic = y[0]. But if y is 2D as passed by the fitter, the heuristic for y0 is no longer the desired scalar, but just the whole y-array! Similar problems occcur for the other models.

In short, the problem is that even models that contain only a single y-channel, in particular their estimate_parameters methods, need to handle both 1D and 2D y-arrays, because the fitter internally always uses 2D arrays and calls methods of Model. There are two ways out of this: 1) promote everything to 2D arrays in models too, as we do in fitter or 2) internally squeeze any 2D arrays to 1D in estimate_parameters to get valid estimates. I prefer option 2) because models with a single y-channel conceptually just have one dependent variable and I find it more intuitive to think about this is as a 1D array. Option 1) would also mean that if the user passes a 1D x-array to the model, he would get a 2D y-array of shape (1, num_samples) out, which I don't like. Besides, if we ever started supporting two independent variables, we would we then have to internally promote everything to 3D arrays in models, which just seems unnecessary.

AUTProgram commented 11 months ago

@hartytp Feel free to have another look after latest updates, otherwise I'll merge today after lunch

hartytp commented 11 months ago

@AUTProgram modulo the small question about the arguments in evaluate I'm very happy for this to be merged.

OxIonics / ionics_fits

Change ordering of array for y-axis data #98