Data2Dynamics / d2d

a modeling environment tailored to parameter estimation in dynamical systems
https://github.com/Data2Dynamics/d2d
57 stars 29 forks source link

Best way to use known data as input #96

Closed t-schmitt closed 7 years ago

t-schmitt commented 7 years ago

Hi everyone,

I'm trying to use the D2D framework to assess the identifiability and observability of mechanical nonlinear systems. Therefore, I don't have simple input functions as steps, but rather arbitrary input data. Now, what is the best way to use this data as input? In the model.def, I sure can just call a "parameter" u_data, i. e. INPUTS u1 C "au" "conc." "u1_data" and insert a column "u1_data" in my data.csv. However, for two inputs and a data set of >1000, it requires tremendous computing capacity and the plots created by arPlot contain markers for every combination of two different u1_data and u2_data, which means up to 10e6 points.

So, is there any more elegant way to use data instead of an function as input?

Thanks in advance, Thomas

JoepVanlier commented 7 years ago

Hi Thomas,

Is this an algebraic system or a dynamic one that requires simulation?

If these are indeed separate dynamic experiments with different inputs, then I doubt there is a way around the computational load other than thinning the data in such a way that the important features of the experiments are still represented.

For issue 2, what you could do to reduce the number of plots is define one of the independents as a dose response curve. Look at PREDICTOR-DOSERESPONSE in https://github.com/Data2Dynamics/d2d/wiki/Setting up models . This will at least group them by one of the inputs.

Best, Joep.

t-schmitt commented 7 years ago

Hi Joep,

thanks for your response. It is a dynamic system which requires simulation. I've tried thinning the data, but obviously it results in a less accurate simulation.

My current solution is describing the input by step-functions between each data point, i. e. for t = [0 1 2]; u = [10 12 14]; I define the input as "step2(t, 0, 0, 10, 1, 0) + step2(t, 0, 1, 12, 2, 0) +step1(t, 0, 2, 14)", which reduces the computational costs reasonably. However, I guess it is only valid for small enough time steps and my computer always crashed when using 3000 or more data points (for two inputs).

I'm thinking about defining my own function which should only result in a linear slope between two given points. According to 'Setting up models', it has to be defined in arInputFunctionsC.c. Plus, it seems like one has to define the derivative of the function, too. Unfortunately, I couldn't figure out yet where the additional input, e. g. int p_indes for dstep1, comes from?

JoepVanlier commented 7 years ago

Hey Thomas,

No prob. So from your answer it sounds like you have a time dependent input rather than N by N conditions. Is this correct?

In that case, I would recommend against the stepwise approach, since the discontinuities might introduce a lot of stiffness. If the input is reasonably smooth, maybe you can approximate the input by a spline? You could treat the input as data. Define a data def file which fits the coefficients of the spline to the input data and estimate the parameters of the input spline along with the other data. Or estimate the spline first (by fixing the model parameters), then fix the spline and estimate the model parameters. You'd have to consider where to place the knots yourself though.

There's an example of the use of spline function in the Examples directory: Swameye_PNAS2003. The example uses cubic splines, but there are multiple types of splines available. If cubic splines over- and undershoot too much, you may want to try monotonic splines, which guarantee that between knots, it either goes up or down, but doesn't change sign of the derivative. Monotonic splines would be pretty similar to the linear interpolation you describe. See Examples/ToyModels/Splines for some examples.

They currently only go up to 10 knots though, so you'd have to maybe string a few of them together if you want more than that. The monotonic splines should stay at their start / end value past their range, so you can combine several splines in one input function. Let me know if that's performant enough for your needs.

Best, Joep.

t-schmitt commented 7 years ago

Hey Joep,

thanks again for the quick response! It's great there is support for unexperienced users like me. Yes, that's correct. I do have a time independent input, but only have its values over time and not a mathematical expression.

I do get your point. I like the idea of using splines. However, I'd want to interpet every data point as a knot - so I have to combine several splines, just as you say.

The problem: The example in Examples/ToyModels/Splines shows that the monotinic splines unfortunately do not stay at their start / end value past their range. And if they did, I'd have to somehow respect that end value for the next spline, otherwise they would sum up incorrectly. (E. g. by adding negative step functions.)

It would be much easier if they had a value of 0 for every t > t_knot_last, then I could just put them together without discontinuities (at least none of first order).

So, I tried to define my own spline function mymonospline10 (and the derivative Dmymonospline10) in arInputFunctionsC.c (and .h) by copying and modifying the original monospline10. Actually, I only changed uout = seval(10, t, ts, us, b, c, d); to if (t < ts[0] || t > ts[9]) uout = 0; else uout = seval(10, t, ts, us, b, c, d); return(uout); (I did so for both mymonospline10 and Dmymonospline10.)

After figuring out that I need to delete all files in the 'Compiled' folder after adding c-code, it finally compiled without error. However, the result is the same as before: The spline does not "end" after its last knot, which should be at t = 37:

grafik

Any idea how to change that?

Thanks in advance, Thomas

JoepVanlier commented 7 years ago

Hey Thomas,

Ah yes, sorry! I forgot that in order for this to happen, you have to use the same parameter value for the first two knots, and the same parameter value for the last two knots like so: "monospline10(t, 0.0, knot1, 1.0, knot1, 2.0, knot2, 4.0, knot3, 6.0, knot4, 9.0, knot5, 13.0, knot6, 20.0, knot7, 26.0, knot8, 37.0, knot8)"

With the monotonic spline, I am pretty sure that guarantees that remain constant for the rest of the simulation (unless there are numerical issues). The spacing between the first two and last two shouldn't matter, as long as the time points are not the same. I think this would be nicer than the 0 if t < t_start and t > t_end option, since it would still guarantee the first order continuity.

I think that for the next spline, you could simply substract the last knot parameter of the previous spline. The only issue I foresee that this may give you is that certain knot parameters will need to be negative, but this is easily mitigated by subtracting an offset from the spline. This offset is preferable to simply letting the parameters go negative, since one can still fit in logarithmic space then (which usually has better performance).

I think that while a little bit hacky, this is a much easier solution than adding another spline definition (since you also need to propagate the sensitivities for this to work correctly in arCompileAll.m).

I added an example case which I think does what you need in /Examples/ToyModels/LongSplines. Have a look at commit 3beca24c86d19db7ce0ce250b81782fd2ae69318 . I think this does what you want?

I hope this solves the issue, but let me know whether performance is bearable. If not, there is another thing regarding splines that could be optimized in our framework itself, but I will likely not get round to this until mid August.

Best, Joep.

JoepVanlier commented 7 years ago

Just fyi, I just implemented something that should speed up the use of splines considerably.

If you set ar.config.turboSplines to 1 after arInit, but prior to arCompileAll, it should result in the splines being evaluated a lot faster. Consider it a beta feature for now though.

t-schmitt commented 7 years ago

Hey Joep,

thanks again for your support! Your approach does work if you let Matlab identify the necessary parameter knots. However, at least my Matlab/Computer didn't want to identify 1100 parameters :-) I think it's also possible to use the data points directly, but you'd need to substract the last point value of a monospline of all the knots of the following monospline (not only of the first two knots). I'm not entirely sure why, but this way my code did not compile. I thought it may be due to negative knots, but adjusting the knots didn't work either.

Anyway, I came up with a different approach: I multiply every monospline with a step2-function which is 0 outside of the spline-range and 1 for the spline-range, e. g. monospline10(t, 0, 10, 1, 11, 2, 12, 3, 13, 4, 14, 5, 15, 6, 16, 7, 17, 8, 18, 9, 19) * step2(t, 0, 0, 1, 9, 0) + monospline3(t, 9, 19, 10, 20, 11, 21) * step2(t, 0, 9, 1, 11, 0)

I wrote a script that generates the corresponding string for two vector inputs (t_data, u_data) automatically. At the end, I add a step1-function with the latest u-value to hold the input. I'm happy to share the code if anyone is interested.

Your turbo-spline adjustment sounds great. However, I can't observe an acceleration in compiling time. Is there anything else I have to initialize?

JoepVanlier commented 7 years ago

Hey Thomas,

No prob! Glad to help.

Just out of interest, do you still have the error/warning you got when you tried compiling?

If your approach works, then great! If you could share the code, I can include it in the LongSplines example in case someone else needs it at some point :)

I'm afraid the turbosplines would only affect run time speeds, not compile time speed. The difference is that they cache the coefficients. They should be about a factor of six faster. The compile time is probably dominated by the symbolic math involved in generating the derivatives for the sensitivity equations.

Is the compile time workable? If not, one thing that might be worth considering is to not compute the symbolic derivatives of the inputs (but that would mean you'd be unable to fit them). There is currently no way to do this in the current version, but I could implement a flag for it if necessary.

Best, Joep.

t-schmitt commented 7 years ago

Hi Joep,

no, I don't have the error I had before.

I uploaded the matlab function as a zip-File, you're very welcome to include it in an example.

I see. The compile time strongly depends on the amount of data points I use. For n = 500, it's about 20 s. For n = 4000, it's more like 200 s. Still, if I restrict the data points for first tests, it's workable. I am not fitting the inputs at the moment, but I'm not sure whether I'll have to do so later on in my work. So a flag for this option would be nice-to-have, but not really necessary.

By the way, for n = 500 the turbosplines work fine, i. e. there is no compiling problem. However, as soon as I increase the data points, e. g. n = 2000, I always get an error:

Unable to complete successfully.

Error in arCompile (line 431)
                mex('-c',verbose{:},mexopt{:},'-outdir',['./Compiled/'
                c_version_code '/'
                mexext '/'], ...

But since the turbosplines don't show a significant acceleration in my case, I probably just won't use them.

writeSplines.zip

JoepVanlier commented 7 years ago

Alright. I cobbled together a function after our meeting today.

You can try: inputSpline(t, 4, [0,1,2,3], [0,1,2,3]) where the first number indicates the number of time points (in this case 4), the first vector indicates time points and the second vector indicates y values. Only numeric values are allowed. Note that there is a cap at 10.000 points for now, but if you want this higher, you can chaange the value for MAX_LONG_SPLINE in monotone.c to something that suits your problem.

Currently, the caching for this function is not in place, so use it only with turboSplines off. I will reply once more once I add that too.

Best, Joep.

JoepVanlier commented 7 years ago

Alright, I renamed it to inputspline (no more capitalization of the s) and added two examples which show a 5000 knot spline (inputTest and inputTest_fast) in /Examples/ToyModels/Splines. Note that the fast version uses turboSplines and is a factor of 3 faster at runtime and during fitting (compile time being roughly the same) on my pc.

JoepVanlier commented 7 years ago

With the implementation of the new inputSpline, I'm marking this one as solved.