JuliaDynamics / ComplexityMeasures.jl

Estimators for probabilities, entropies, and other complexity measures derived from data in the context of nonlinear dynamics and complex systems
MIT License
54 stars 12 forks source link

Encoding for binnings #177

Closed Datseris closed 1 year ago

Datseris commented 1 year ago

This PR achieves:

I have not written explicit tests for encodings yet. But the current tests pass. To consider also: should we expose the n_eps keyword to the API, and if so, maybe its time to give it a better name.

Datseris commented 1 year ago

cc @kahaaga in this PR I will change ValueHistogram to have the encoding directly as its field. This means that now the rectangular binnings are only used as "instructions". An instance of a bbinning is never used/stored but only to generate the RectangularBinEncoding. This will simplify source code even more. In fact, the ValueHistogram will become a three liner call to encode and decode.

kahaaga commented 1 year ago

cc @kahaaga in this PR I will change ValueHistogram to have the encoding directly as its field. This means that now the rectangular binnings are only used as "instructions". An instance of a bbinning is never used/stored but only to generate the RectangularBinEncoding. This will simplify source code even more. In fact, the ValueHistogram will become a three liner call to encode and decode.

Excellent.

Datseris commented 1 year ago

@kahaaga this PR will also remove all near-duplicate code that handles explictly input x that is a timeseries Vector{<:Real}. It will treat it as a 1-dimensional dataset. all related outcome space will be 1-element static vectors. You can re0interpret them to vector of numbers like so:

y = rand(SVector{1, Float64}, 100)
x = reinterpret(Float64, y) # behaves identically to Vector{Float64}
Datseris commented 1 year ago

This PR will also make outcome_space work for any binning. it will make the histogram size part of the bin encoding struct.

At the encoding level we have full knowledge: we know the limits of the histogram as we have already extracted them from data. I'll also put a notice to the docstirngs that it is strongly preferred to use Fixed binnings as their outcome space is not dependent on the input data.

kahaaga commented 1 year ago

@kahaaga this PR will also remove all near-duplicate code that handles explictly input x that is a timeseries Vector{<:Real}. It will treat it as a 1-dimensional dataset. all related outcome space will be 1-element static vectors. You can re0interpret them to vector of numbers like so:

That makes sense. Then, before merging this, we need to make sure that

Datseris commented 1 year ago

no no I think you misunderstood: You can still use timeserries as input. We just don't need duplicate source code. The outcome space for timeseries input will be a vector of 1-length svectors instead of a vector of Floats. From the user input perspective nothing changes. So functions that accepted Vector_or_Dataset, still do so.

kahaaga commented 1 year ago

We just don't need duplicate source code. The outcome space for timeseries input will be a vector of 1-length svectors instead of a vector of Floats. From the user input perspective nothing changes. So functions that accepted Vector_or_Dataset, still do so.

Ah, yes, I misunderstood. Then these sound like reasonable changes.

kahaaga commented 1 year ago

This PR will also make outcome_space work for any binning. it will make the histogram size part of the bin encoding struct. At the encoding level we have full knowledge: we know the limits of the histogram as we have already extracted them from data.

Just to ensure that I understood this comment:

Is that what you meant?

I'll also put a notice to the docstirngs that it is strongly preferred to use Fixed binnings as their outcome space is not dependent on the input data.

I agree on recommending this.

Datseris commented 1 year ago

Now ValueHistogram() signature requires x if the bininng isn't FixedBinning so that all outcome spaces are knwon. Its similar to how we do it for e.g. the spectral entropy.

Datseris commented 1 year ago

@kahaaga I've updated the PR top level comment. This is ready for review and merge. I'll now be gonne for a fair bit of time, so go ahead and merge/modify as you see fit.

kahaaga commented 1 year ago

I've updated the PR top level comment. This is ready for review and merge. I'll now be gonne for a fair bit of time, so go ahead and merge/modify as you see fit.

Will do! Thanks for the effort.

Datseris commented 1 year ago

@kahaaga I'll slowly try to catch up in the coming days. What's the status here? This wasn't merged in, but is there any blockers?

kahaaga commented 1 year ago

@Datseris Welcome back!

I've been majorly busy too, so I just haven't had time to look at this PR properly yet. Will try to do so tonight or tomorrow morning.

The status here is:

In general:

However, I haven't prepared any of the two points above for review yet, because I want us to decide on the argument order, multiscale and spatiotemporal API before finalizing anything.

TLDR: 1) I will try to look at this PR tonight. 2) The outstanding PRs need to be resolved to ready the other packages.

kahaaga commented 1 year ago

should we expose the n_eps keyword to the API, and if so, maybe its time to give it a better name.

Maybe just call it tolerance or tol, and explain in the docstring what it means?

kahaaga commented 1 year ago

Okay, I've tried to have a look, but I'll need some more time to wrap my head around these changes. The tests do not pass with the current version of main. This happens because encode_as_bin and decode_as_bin can't straight-forwardly be replaced with the new encode (which gives an integer) in the source code for TransferOperator. I think.

My mind is a bit clouded at the moment. I'll have another go at it tomorrow again with a blank slate.

Datseris commented 1 year ago

@kahaaga would you be up for a video call as I still can't read many letters in one sitting and the reply is large here

kahaaga commented 1 year ago

@Datseris 13:30 CET, today?

Datseris commented 1 year ago

just saw this now; I'm free from 14:30 CET

kahaaga commented 1 year ago

just saw this now; I'm free from 14:30 CET

@Datseris I'm busy with other things from 14:30 CET and onwards today. I can either do a call after 20 CET tonight, or any time from 10-17 CET tomorrow. Do any of those alternatives work for you?

Datseris commented 1 year ago

Tomorrow is fine for me, I'll post a message here when I'm free, probably around 2-3pm!

kahaaga commented 1 year ago

Tomorrow is fine for me, I'll post a message here when I'm free, probably around 2-3pm!

Sounds good. Just tag me explicitly here when you're ready, so I'll get an e-mail notification.

kahaaga commented 1 year ago

@Datseris I'm available now. Just post the link when you're ready.

Datseris commented 1 year ago

@kahaaga me too. I don't have any pro zoom stuff. I can post a google meets link.

kahaaga commented 1 year ago

@Datseris I have the pro version. Here's a Zoom meeting:

https://uib.zoom.us/j/66488837829?pwd=c3FCUFhZdjFjN0V5SkZ4UWJSd0o5dz09

Datseris commented 1 year ago

@kahaaga shall we merge this in? Shall we merge in first #168 and then this? Transfer operator update can be done in a separate PR. I have about 3 hours of worktime per day and I want to finish as many tickboxes as possible and (maybe) get a stable Entropies release out before the end of the year. (but very low likelyhood that this will happen)

Datseris commented 1 year ago

A lol #168 is already merged. Okay. I will update this branch to master by changing the signatures for value histogram stuff and merge. Transfer Operator fixes shoujld be done in a different PR.

kahaaga commented 1 year ago

@kahaaga shall we merge this in? Shall we merge in first https://github.com/JuliaDynamics/Entropies.jl/pull/168 and then this? Transfer operator update can be done in a separate PR. I have about 3 hours of worktime per day and I want to finish as many tickboxes as possible and (maybe) get a stable Entropies release out before the end of the year. (but very low likelyhood that this will happen)

Yes, we should try to get a stable release out before the end of the year. I aim to do the same for CausalityTools (which I'm currently working on during my available work hours), so that we'll have it ready to prepare workshop material.

A lol https://github.com/JuliaDynamics/Entropies.jl/pull/168 is already merged. Okay. I will update this branch to master by changing the signatures for value histogram stuff and merge. Transfer Operator fixes shoujld be done in a different PR.

Yes, do that! I have started fixing the transfer operator in a separate branch, but it isn't pushed to remote yet.

Datseris commented 1 year ago

Test are passing here (for ValueHistogram). I am merging this in before my brain explodes due to the amount of PRs open with concurrent changes!