Closed Datseris closed 1 year ago
cc @kahaaga in this PR I will change ValueHistogram to have the encoding
directly as its field. This means that now the rectangular binnings are only used as "instructions". An instance of a bbinning is never used/stored but only to generate the RectangularBinEncoding. This will simplify source code even more. In fact, the ValueHistogram will become a three liner call to encode and decode.
cc @kahaaga in this PR I will change ValueHistogram to have the encoding directly as its field. This means that now the rectangular binnings are only used as "instructions". An instance of a bbinning is never used/stored but only to generate the RectangularBinEncoding. This will simplify source code even more. In fact, the ValueHistogram will become a three liner call to encode and decode.
Excellent.
@kahaaga this PR will also remove all near-duplicate code that handles explictly input x
that is a timeseries Vector{<:Real}
. It will treat it as a 1-dimensional dataset. all related outcome space will be 1-element static vectors. You can re0interpret them to vector of numbers like so:
y = rand(SVector{1, Float64}, 100)
x = reinterpret(Float64, y) # behaves identically to Vector{Float64}
This PR will also make outcome_space
work for any binning. it will make the histogram size part of the bin encoding struct.
At the encoding level we have full knowledge: we know the limits of the histogram as we have already extracted them from data. I'll also put a notice to the docstirngs that it is strongly preferred to use Fixed binnings as their outcome space is not dependent on the input data.
@kahaaga this PR will also remove all near-duplicate code that handles explictly input x that is a timeseries Vector{<:Real}. It will treat it as a 1-dimensional dataset. all related outcome space will be 1-element static vectors. You can re0interpret them to vector of numbers like so:
That makes sense. Then, before merging this, we need to make sure that
Vector_or_Dataset
no no I think you misunderstood: You can still use timeserries as input. We just don't need duplicate source code. The outcome space for timeseries input will be a vector of 1-length svectors instead of a vector of Floats. From the user input perspective nothing changes. So functions that accepted Vector_or_Dataset, still do so.
We just don't need duplicate source code. The outcome space for timeseries input will be a vector of 1-length svectors instead of a vector of Floats. From the user input perspective nothing changes. So functions that accepted Vector_or_Dataset, still do so.
Ah, yes, I misunderstood. Then these sound like reasonable changes.
This PR will also make outcome_space work for any binning. it will make the histogram size part of the bin encoding struct. At the encoding level we have full knowledge: we know the limits of the histogram as we have already extracted them from data.
Just to ensure that I understood this comment:
RectangularBinning
cannot be known in advance without providing input data. I.e. outcome_space(::RectangularBinning)
is not defined.RectangularBinning
is known when providing input data, because it is precisely the set of bins visited by the input data. I.e. outcome_space(x, ::RectangularBinning)
is well-defined.Is that what you meant?
I'll also put a notice to the docstirngs that it is strongly preferred to use Fixed binnings as their outcome space is not dependent on the input data.
I agree on recommending this.
Now ValueHistogram()
signature requires x
if the bininng isn't FixedBinning so that all outcome spaces are knwon. Its similar to how we do it for e.g. the spectral entropy.
@kahaaga I've updated the PR top level comment. This is ready for review and merge. I'll now be gonne for a fair bit of time, so go ahead and merge/modify as you see fit.
I've updated the PR top level comment. This is ready for review and merge. I'll now be gonne for a fair bit of time, so go ahead and merge/modify as you see fit.
Will do! Thanks for the effort.
@kahaaga I'll slowly try to catch up in the coming days. What's the status here? This wasn't merged in, but is there any blockers?
@Datseris Welcome back!
I've been majorly busy too, so I just haven't had time to look at this PR properly yet. Will try to do so tonight or tomorrow morning.
The status here is:
entropy_signature
branch (#168) (just switching argument orders of probabilities
and entropy
+ friends) is ready-ish. I think we both decided on the design there, so probably it can just be merged? I'm basing everything CausalityTools v2 on that branch, so would be nice to merge it sooner rather than later.main
. There's been quite a few changes to main
since I first made those PRs. In general:
EntropyEstimator
and ProbabilitiesEstimator
can be used to estimate higher-level things like (conditional) mutual information. However, I haven't prepared any of the two points above for review yet, because I want us to decide on the argument order, multiscale and spatiotemporal API before finalizing anything.
TLDR: 1) I will try to look at this PR tonight. 2) The outstanding PRs need to be resolved to ready the other packages.
should we expose the n_eps keyword to the API, and if so, maybe its time to give it a better name.
Maybe just call it tolerance
or tol
, and explain in the docstring what it means?
Okay, I've tried to have a look, but I'll need some more time to wrap my head around these changes. The tests do not pass with the current version of main
. This happens because encode_as_bin
and decode_as_bin
can't straight-forwardly be replaced with the new encode
(which gives an integer) in the source code for TransferOperator
. I think.
My mind is a bit clouded at the moment. I'll have another go at it tomorrow again with a blank slate.
@kahaaga would you be up for a video call as I still can't read many letters in one sitting and the reply is large here
@Datseris 13:30 CET, today?
just saw this now; I'm free from 14:30 CET
just saw this now; I'm free from 14:30 CET
@Datseris I'm busy with other things from 14:30 CET and onwards today. I can either do a call after 20 CET tonight, or any time from 10-17 CET tomorrow. Do any of those alternatives work for you?
Tomorrow is fine for me, I'll post a message here when I'm free, probably around 2-3pm!
Tomorrow is fine for me, I'll post a message here when I'm free, probably around 2-3pm!
Sounds good. Just tag me explicitly here when you're ready, so I'll get an e-mail notification.
@Datseris I'm available now. Just post the link when you're ready.
@kahaaga me too. I don't have any pro zoom stuff. I can post a google meets link.
@Datseris I have the pro version. Here's a Zoom meeting:
https://uib.zoom.us/j/66488837829?pwd=c3FCUFhZdjFjN0V5SkZ4UWJSd0o5dz09
@kahaaga shall we merge this in? Shall we merge in first #168 and then this? Transfer operator update can be done in a separate PR. I have about 3 hours of worktime per day and I want to finish as many tickboxes as possible and (maybe) get a stable Entropies release out before the end of the year. (but very low likelyhood that this will happen)
A lol #168 is already merged. Okay. I will update this branch to master by changing the signatures for value histogram stuff and merge. Transfer Operator fixes shoujld be done in a different PR.
@kahaaga shall we merge this in? Shall we merge in first https://github.com/JuliaDynamics/Entropies.jl/pull/168 and then this? Transfer operator update can be done in a separate PR. I have about 3 hours of worktime per day and I want to finish as many tickboxes as possible and (maybe) get a stable Entropies release out before the end of the year. (but very low likelyhood that this will happen)
Yes, we should try to get a stable release out before the end of the year. I aim to do the same for CausalityTools (which I'm currently working on during my available work hours), so that we'll have it ready to prepare workshop material.
A lol https://github.com/JuliaDynamics/Entropies.jl/pull/168 is already merged. Okay. I will update this branch to master by changing the signatures for value histogram stuff and merge. Transfer Operator fixes shoujld be done in a different PR.
Yes, do that! I have started fixing the transfer operator in a separate branch, but it isn't pushed to remote yet.
Test are passing here (for ValueHistogram). I am merging this in before my brain explodes due to the amount of PRs open with concurrent changes!
This PR achieves:
ValueHistogram
now requires that inputx
is provided if the binning is not fixed, so that outcome space is always well defined.Dataset, dimension
.I have not written explicit tests for encodings yet. But the current tests pass. To consider also: should we expose the
n_eps
keyword to the API, and if so, maybe its time to give it a better name.