JuliaRheology / RHEOS.jl

RHEOS - Open Source Rheology data analysis software

Weights for modelfit could be more intuitive #149

Open · akabla opened this issue 3 years ago

akabla commented 3 years ago

The current format of the weights array is not very intuitive, as pointed out by @fr293. It is not an array of weights as such, and it is difficult to use without the helper function.

Why not have a proper array of weights, integer or floating point? It should not be too difficult to implement, and it would probably be more efficient. This line in particular reallocates a whole array for each evaluation of the cost function: cost = sum((measured_weighted - convolved[weights]).^2)

A dot product of the weights vector and the squared differences would work well, possibly implemented as a for loop so that no intermediate array is produced.
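As a rough sketch of what I mean (variable names are only illustrative, not the actual internals of the cost function):

    # Current style: weights is a vector of indices (possibly repeated), and
    # indexing with it allocates a new array on every cost evaluation.
    cost_indexed(measured_weighted, convolved, weights) =
        sum((measured_weighted - convolved[weights]).^2)

    # Proposed style: weights is a vector of floats with one entry per sample;
    # the loop accumulates weighted squared residuals without any temporary array.
    function cost_weighted(measured, convolved, weights)
        c = zero(eltype(convolved))
        @inbounds for i in eachindex(measured, convolved, weights)
            c += weights[i] * (measured[i] - convolved[i])^2
        end
        return c
    end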

moustachio-belvedere commented 3 years ago

It is not an array of weights

Not in the traditional sense, but its behaviour is functionally equivalent to weights.
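As a toy illustration of that equivalence (made-up numbers): an index that appears k times in the index vector contributes to the cost exactly as an integer weight of k would.

    resid2 = [1.0, 4.0, 9.0]       # squared residuals at three sample points
    idx    = [1, 2, 2, 3, 3, 3]    # index-style weighting: point 2 counted twice, point 3 three times
    w      = [1.0, 2.0, 3.0]       # the same weighting written as explicit weights

    sum(resid2[idx])               # 1 + 2*4 + 3*9 = 36.0
    sum(w .* resid2)               # also 36.0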

difficult to use without the helper function.

It's not intended for use without the helper function. In fact, I wouldn't even call it a helper function: it is the function intended for anyone who wants to weight their cost. This may be an issue of documentation rather than implementation. The only documentation related to this, apart from the API reference, is the 'in progress' page at https://juliarheology.github.io/RHEOS.jl/stable/numerical/

Is the indexweight function itself unintuitive, or just the format of the weights?

A dot product of the weights vector and the squared difference would work well

Sounds good to me.

I'm not convinced the speed-up will be significant without further benchmarking, but if an array of floats is (subjectively) more intuitive then I can't think of any drawbacks right now. However, bear in mind the history of why I ended up adding this approach: it was a direct workaround for the previous approach, which was to resample. We discussed this in issue https://github.com/JuliaRheology/RHEOS.jl/issues/14 - it would also be good to check with @ab2425 whether there are reasons to keep the current way instead of the cost-float-multiply method you suggested.

akabla commented 3 years ago

Thanks @moustachio-belvedere. I did discuss it briefly with @ab2425 this morning. It was indeed useful to be reminded of the background.

The speed-up will of course depend a lot on the function being evaluated, and the difference may not be visible at all in practical situations. But considering the summation alone for calculating the cost, I'd expect a significant speed-up from a dot product and fewer allocations. We could even reuse convolved as a buffer for the cost data (even when not using the weights):

    # In-place update: overwrite convolved with the weighted squared residuals,
    # then reduce; no new array is allocated.
    convolved .= weights .* (measured_weighted .- convolved).^2
    cost = sum(convolved)

For sums, there are also higher-precision methods if needed. Just placing this here for the record: https://github.com/JuliaMath/AccurateArithmetic.jl
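If I remember the API correctly (the exported names are worth double-checking against the package README), usage would be along these lines:

    using AccurateArithmetic

    # Function names assumed from the package README, not verified here.
    x = rand(10_000)
    sum_kbn(x)   # compensated (Kahan-Babuska-Neumaier) summation
    sum_oro(x)   # another compensated summation variant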

I appreciate that the weights were intended to be used with the helper function, and I generally like the principle of it. But it's also nice to have something simple enough that people can design a weight array without too much background knowledge. I think we can get the best of both worlds! :-)

moustachio-belvedere commented 3 years ago

Yes, sounds good to me.

Since you mentioned the higher-accuracy sum, it's worth pointing to a thread I started a while back. One of the interesting parts of it for me was learning that the default sum in Julia is pairwise, which should cut down floating-point error a fair amount vs. a standard + accumulation. A number of alternatives were suggested in the thread as well:

https://discourse.julialang.org/t/julia-equivalent-of-pythons-fsum-for-floating-point-summation/17785/12
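A quick way to see the effect (toy example in Float32 so the rounding error is visible):

    x = fill(0.1f0, 10_000_000)

    sum(x)             # pairwise summation (Julia's default) - error stays small
    foldl(+, x)        # strict left-to-right accumulation - noticeably less accurate
    sum(Float64.(x))   # promote to Float64 for a more accurate reference value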

akabla commented 3 years ago

Thanks for sharing the link - interesting discussion. Yes, I found out earlier today (in the AccurateArithmetic.jl docs) that the default sum is already quite good. I'll never sum with a for loop again!

moustachio-belvedere commented 1 year ago

Adding a link to the page where the current implementation is documented, so that any changes we make to the implementation are also reflected there: https://juliarheology.github.io/RHEOS.jl/dev/numerical/

moustachio-belvedere commented 1 year ago

Although I don't expect any speed changes to be significant compared to things like Mittag-Leffler evaluations, it would be interesting to benchmark the custom indexing pattern vs. N floating-point multiplications. I would guess the multiplications will be faster, but it would be nice to confirm this.
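Something along these lines should settle it (BenchmarkTools, made-up sizes; resid is a stand-in for measured .- convolved):

    using BenchmarkTools

    n     = 10_000
    resid = rand(n)          # stand-in for the residuals measured .- convolved
    idx   = rand(1:n, n)     # index-style weights (current approach)
    w     = rand(n)          # float weights (proposed approach)

    @btime sum(abs2, $resid[$idx])        # indexing allocates a temporary array
    @btime sum($w .* abs2.($resid))       # the broadcast also allocates a temporary
    @btime mapreduce((wi, ri) -> wi * ri^2, +, $w, $resid)  # allocation-free accumulation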

Picking up something you mentioned in the opening comment:

Why not have a proper array of weights, integer or floating point?

If we do explore this option, I think floating point will be best. The data type of the weights should match the data type of the experimental data (RheoFloat) that they multiply, to avoid excess type conversion.
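For example, when the weights are first set up, something like this (hypothetical names; data stands for whatever RheoFloat vector the weights will multiply):

    # Coerce user-supplied weights once, so the cost loop/broadcast does not
    # promote or convert element by element. `raw_weights` is whatever the user passes in.
    weights = convert.(eltype(data), raw_weights)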