JuliaRheology / RHEOS.jl

RHEOS - Open Source Rheology data analysis software
MIT License
39 stars 9 forks source link

dev #148

Closed fr293 closed 2 years ago

fr293 commented 3 years ago

modified the input type for weights in modelfit to accept Int64, which Python can produce. N.b. I'm unsure if this is the best way to go about fixing this, but it works for me.

fr293 commented 3 years ago
def weight_function(data_smooth, time_period):

    # takes the smoothed data array and returns a weighting array of equal length

    tail_proportion = 0.1

    weight_slope = 1


    measured_time = rh.gettime(data_smooth)

    time_size = measured_time.size

    # calculate the base proportion excluding the initial 10 second experiment startup period

    startup_length = 10.0 / time_period

    base_proportion = np.int(np.ceil(((time_size - startup_length)*(1-tail_proportion)) + startup_length))

    addendum_proportion = time_size - base_proportion

    #julia uses 1 indexing

    weights_base = np.arange(base_proportion) + 1

    addendum_multiplier = np.round((weight_slope * np.arange(addendum_proportion)) + 1.0)

    addendum_multiplier = addendum_multiplier.astype(int)

    index = base_proportion + 1

    for i in addendum_multiplier:

        weights_addendum = np.ones(i) * index

        weights_base = np.concatenate((weights_base, weights_addendum))

        index = index + 1

    weights_int = weights_base.astype(int)

    weights_list = weights_int.tolist()


    return weights_list
fr293 commented 3 years ago

The above is currently the method that is used to generate the weighting, but you can use anything, including something as simple as weights = rh.indexweight(data, elperiods = [...], time_boundaries = [...]) and it will still throw an error for the input type of the weights. I'm not completely sure how the Integer type works in Julia, but from my understanding, it should accept anything that is an integer subtype, such as Int64, so it's not clear why my bodge works.

akabla commented 3 years ago

Thanks Fergus. There is a problem with the way the type was defined in the function. Arrays should not really be defined as we did as the type hierarchy is not passed implicitly to the containers (see below). I think I fixed the problem with the second commit. Please try it out. I don't think this had anything to do with python, it's just that you picked it up.

julia> Vector{Int64} <: Vector{Integer}
false

julia> Int64 <: Integer
true
fr293 commented 3 years ago

This version fixes the earlier merge error and includes a fix for the type error encountered when passing a vector of integers to the weight input of the function.

akabla commented 3 years ago

Please check it works for you. I'd be happy to merge with the repos' dev branch afterwards.

fr293 commented 3 years ago

everything appears to work as intended for me; weight arrays are accepted and timeouts work correctly. I think the best thing to do before merging would be to convert modelstepfit to the same timeout behaviour as well, as they are twin functions.

moustachio-belvedere commented 3 years ago

The functionality is tested (see units tests) and works fine in Julia-only world. However, yes it needs to be written in a more generic way to fully enable multiple dispatch and so work with Python, for the reason Alexandre mentioned in the code example.

If you want timeouts on modelfit, and as you mentioned @fr293, on modelstepfit, you might want to add them on the frequency domain fitting functions as well.

fr293 commented 3 years ago

@moustachio-belvedere that's a good spot! I'll replicate the implementation of timeouts to all four fitting functions.

akabla commented 3 years ago

Thanks @moustachio-belvedere @ab2425 mentioned that she used maxeval rather than maxtime to forcefully stop the optimiser. It may be best to ensure that the process is reproducible. Otherwise different machines and different runs would give different outcomes. I'd suggest to implement both, and use maxeval where possible.

fr293 commented 3 years ago

@akabla we can implement both, but naturally only the first to be satisfied will have any effect on the computation!

From my perspective, when I'm setting a maxtime, I'm looking at reliability of process, rather than reproducibility of outcome. I think these are slightly different use cases; I would only use the outcome of a time-out optimisation as the input for further optimisation. By construction, something has gone wrong in the optimisation of a time-out optimisation in a way that is not true for a maximum evaluation terminated optimisation.

akabla commented 3 years ago

The functionality is tested (see units tests) and works fine in Julia-only world. However, yes it needs to be written in a more generic way to fully enable multiple dispatch and so work with Python, for the reason Alexandre mentioned in the code example.

Hi @moustachio-belvedere I'm not sure what was causing the problem, but even within Julia alone there were problems looming. I can't really understand how the Integer works.

I think tests are OK because we cast the weights to the type Integer: modelout = modelfit(data0, model, stress_imposed, p0=init_params, lo=(α=0.5, β=0.2), hi=(α=1.5, β=1.5), weights=collect(Integer, 1:length(t)))

The function indexweight also forces the type of the output to Integer: indices = Integer[] That is not intuitive, and it does matter for some reason:

julia> collect(Integer, 1:3)
3-element Vector{Integer}:
 1
 2
 3

julia> collect(1:3)
3-element Vector{Int64}:
 1
 2
 3

Functions that take Vector{Integer} don't accept Vector{Int64}:

julia> f(x::Vector{Integer}) = sum(x)
f (generic function with 1 method)

julia> f([1,2])
ERROR: MethodError: no method matching f(::Vector{Int64})
Closest candidates are:
  f(::Vector{Integer}) at REPL[7]:1

What is really confusing is that scalars are behaving differently.

julia> a=1::Integer
1

julia> typeof(a)
Int64

julia> g(x::Integer)=2x
g (generic function with 1 method)

julia> g(2)
4

Weird.

moustachio-belvedere commented 3 years ago

even within Julia alone there were problems looming

No, because the functionality is designed specifically to be used with indexweight which returns a Vector{Integer}, as you mentioned. The tests that do the casting are meant to test specific bits of functionality unconnected to indexweight which is why the casts were necessary. See other tests that actually use indexweight for comparison - those tests do not do any casting.

I agree with you that's not best practice, and the reason for that best practice has now clearly raised its head with the Python interface. As you have done, parameterising the input signature weights::Vector{T} where T<:Integer for modelfit is the way to go.

As for indexweight line https://github.com/JuliaRheology/RHEOS.jl/blob/master/src/processing.jl#L88 : Integer[] could be changed to Int[], or even UInt[] - that was the behaviour I intended - to let Julia decide on the appropriate integer type for a given system. See https://docs.julialang.org/en/v1/manual/integers-and-floating-point-numbers/#Integers , with particular reference to:

Julia also defines the types Int and UInt, which are aliases for the system's signed and unsigned native integer types respectively

@fr293 regarding maxtime vs maxeval, bear in mind that some fractional models take an order of magnitude more time per iteration than non-fractional models, so if you are fitting both fractional and non-fractional models to data programatically, it may not be useful to specify a global maximum fitting time (though a maximum number of iterations would be in that case, edited in case it wasn't clear)

fr293 commented 3 years ago

@moustachio-belvedere I'm only using a maxtime parameter to put a hard limit on the optimisation that corresponds to how long I'm actually willing to wait for it to resolve before it moves on (safely). That timescale is determined independently of what is being optimised; it's much more to do with me than the function! It's a useful parameter to have where I am batch automating many optimisations, and one happens to have some aspects that the optimiser gets stuck on. It's not convenient for me to estimate how many function evaluations will elapse during my 'boredom' timescale, and as you point out, that value may depend strongly on the details of the objective function. I agree that it makes more sense to use maxeval in the normal way you might use a termination condition, but this case is more of an automated forced stop of an anomalous process to allow the rest of the program to proceed.

fr293 commented 2 years ago

Bar the immediate emergence of bugs, or any objections to my solutions, this feature extension is finished!

fr293 commented 2 years ago

uh-oh...

JULIA: UndefVarError: rel_tol not defined Stacktrace:[1] leastsquares_init(params_init::Vector{Float64}, low_bounds::Vector{Float64}, hi_bounds::Vector{Float64}, modulus::FunctionWrappers.FunctionWrapper{Vector{Float64}, Tuple{Vector{Float64}, Vector{Float64}}}, time_series::Vector{Float64}, dt::Float64, prescribed_dot::Vector{Float64}, measured::Vector{Float64}; insight::Bool, constant_sampling::Bool, singularity::Bool, _rel_tol::Nothing, _rel_tol_f::Float64, indweights::Nothing, optmethod::Symbol, opttimeout::Int64, optmaxeval::Nothing)

@akabla are you sure that this is correct for the intended behaviour? rel_tol_f::Union{Real,Nothing} = nothing, rel_tol::Union{Real,Nothing} = isnothing(rel_tol_f) ? 1e-4 : nothing

fr293 commented 2 years ago

I've corrected the typos that just emerged, but this leaves a wider issue: there's now a lot of name shadowing going on between model(step)fit and their helper functions leastsquares(step)init. This was dealt with using the underscore convention to denote the private variable name _tol_rel in the helper functions, but it was only done for that variable. I've stripped them all off until a consistent style to avoid shadowing can be decided.

akabla commented 2 years ago

Variable shadowing should not be too much of an issue since they mean the same thing and have the same value. They are simply passed from one function to the next. As you said, we did not do it for the other variables, so we'd rather avoid it for these two for now. Either way is fine though.

fr293 commented 2 years ago

The code is now working, though not as expected: ftol is set as intended, but sometimes the optimiser finishes on an xtol flag when no argument is passed to the xtol kwarg, so something funny is going on.

I'm too fried to work out what this is at the moment; it's probably something trivial

moustachio-belvedere commented 2 years ago

Is this OK to merge? Test failure - more specifically:

https://github.com/JuliaRheology/RHEOS.jl/runs/3701445556#step:7:271

got unsupported keyword argument "rel_tol"

akabla commented 2 years ago

I know. I prefered to fix this directly on the rheos repo's dev branch. Thanks for checking!