JuliaDynamics / ComplexityMeasures.jl

Estimators for probabilities, entropies, and other complexity measures derived from data in the context of nonlinear dynamics and complex systems

Our new fluctuation complexity generalization is incorrect. #410

Open Datseris opened 3 weeks ago

Datseris commented 3 weeks ago

In the new fluctuation complexity, #409, we have the equation:

$$\sigma_I = \sqrt{\sum_{i=1}^{N} p_i \,(I_i - H)^2}, \qquad I_i = -\log_b(p_i), \qquad H = \sum_{i=1}^{N} p_i I_i$$

This equation is only compatible with the Shannon entropy. Only the Shannon entropy defines -log(p) as the "unit" of information, and the Shannon entropy is just the weighted average of that information. With other information measures, the fluctuation complexity equation simply doesn't make as much sense, because they do not define -log(p) as the unit of information.
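For concreteness, here is a minimal sketch of what the equation computes in the Shannon case, written in plain Julia over a PMF rather than through the package API (the function name is hypothetical):

```julia
# Minimal sketch: Shannon-type fluctuation complexity of a PMF,
# computed directly from the definition above (not the package API).
function shannon_fluctuation(probs; base = 2)
    I = -log.(base, probs)           # information content I_i = -log_b(p_i)
    H = sum(probs .* I)              # Shannon entropy = weighted mean of I_i
    return sqrt(sum(probs .* (I .- H) .^ 2))
end

shannon_fluctuation([0.5, 0.25, 0.25])  # = 0.5 bits
```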

I argue we should revert the measure to be a complexity measure instead, and keep the note at the end that this can be generalized, but one needs to come up with or provide appropriate ways for I_i to make sense in the fluctuation complexity equation.

kahaaga commented 3 weeks ago

> I argue we should revert the measure to be a complexity measure instead, and keep the note at the end that this can be generalized, but one needs to come up with or provide appropriate ways for I_i to make sense in the fluctuation complexity equation.

Generalized or not, FluctuationComplexity should remain an information measure, not a complexity measure, because it is still just a functional of a PMF, and it can be estimated just like any of the other information measures using outcome spaces, probability estimators, and generic information estimators.

> This equation is only compatible with the Shannon entropy. Only the Shannon entropy defines -log(p) as the "unit" of information, and the Shannon entropy is just the weighted average of that information. With other information measures, the fluctuation complexity equation simply doesn't make as much sense.

The equation is compatible with anything. If you use anything other than the Shannon entropy, it is the deviation of the Shannon information around some other summary statistic. That is just as valid an information statistic as any other.
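For instance (an illustration of the point, not taken from the package docs): taking a Rényi entropy $H_q$ as the summary statistic gives

$$\sigma = \sqrt{\sum_{i=1}^{N} p_i \,\bigl(-\log_b(p_i) - H_q\bigr)^2},$$

which is still a well-defined functional of the PMF; it just measures the spread of the Shannon information content around the Rényi entropy instead of around the Shannon entropy.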

If one insists that a fluctuation measure must, in general, compare X-type information to X-type weighted averages, then sure, the generalization does not make sense. But neither the measure description nor the implementation makes any such demand. The docs are also explicit that the default inputs give you the original Shannon-type measure.

I'll think about it a bit and see if there are any obvious ways of generalizing, though, because it is a good point that one should match the "unit of information" to the selected measure in order for the measure to precisely respect the original intention.

kahaaga commented 3 weeks ago

Just quickly did some calculations. We can easily define, e.g., a Tsallis-type "self-information" or "information content", analogous to the Shannon-type information content. The same goes for many of the other entropy types.

Perhaps a good middle ground here is just to explicitly find the "self-information" expressions for each of the entropies, then use dispatch to produce a "correct"/measure-specific deviation, depending on whether one picks Tsallis/Renyi/Shannon or something else?
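As a concrete sketch of this dispatch idea (the struct and function names below are hypothetical, not the package API): the Tsallis-type information content can be defined via the q-logarithm, I_q(p_i) = ln_q(1/p_i) = (p_i^(q-1) - 1)/(1 - q), which recovers -ln(p_i) as q → 1 and whose weighted average is the Tsallis entropy.

```julia
# Hedged sketch of measure-specific "self-information" via dispatch.
# The type and function names are hypothetical, not the package API.
abstract type InfoMeasure end
struct ShannonLike <: InfoMeasure end
struct TsallisLike <: InfoMeasure
    q::Float64
end

# Shannon-type information content: I_i = -ln(p_i).
self_information(::ShannonLike, p) = -log(p)
# Tsallis-type information content via the q-logarithm:
# ln_q(1/p) = (p^(q-1) - 1)/(1 - q), tending to -ln(p) as q -> 1.
self_information(m::TsallisLike, p) = (p^(m.q - 1) - 1) / (1 - m.q)

# Measure-specific fluctuation: standard deviation of the information
# content around its weighted mean, which is the corresponding entropy.
function fluctuation(m::InfoMeasure, probs)
    I = [self_information(m, p) for p in probs]
    H = sum(probs .* I)
    return sqrt(sum(probs .* (I .- H) .^ 2))
end

fluctuation(ShannonLike(), [0.5, 0.25, 0.25])    # Shannon-type (nats)
fluctuation(TsallisLike(2.0), [0.5, 0.25, 0.25]) # Tsallis-type, q = 2
```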

Datseris commented 3 weeks ago

> Perhaps a good middle ground here is just to explicitly find the "self-information" expressions for each of the entropies, then use dispatch to produce a "correct"/measure-specific deviation, depending on whether one picks Tsallis/Renyi/Shannon or something else?

Yes, but this sounds like a research paper to me. If someone published this Shannon-type fluctuation complexity, someone can publish the generalization.

kahaaga commented 3 weeks ago

> Yes, but this sounds like a research paper to me. If someone published this Shannon-type fluctuation complexity, someone can publish the generalization.

But do we restrict measures implemented here to measures that have already been published? The package already contains a plethora of methods that do not appear in any journal.

Datseris commented 3 weeks ago

Yeah, we don't, and it probably isn't too complex to extract the unit of information for each measure. I'm just saying that if you do, you might as well publish a short paper on it. Maybe we can get a BSc student to write a small paper about this; it looks like a low-risk, high-reward project for a BSc student.

kahaaga commented 3 weeks ago

> Yeah, we don't, and it probably isn't too complex to extract the unit of information for each measure. I'm just saying that if you do, you might as well publish a short paper on it. Maybe we can get a BSc student to write a small paper about this; it looks like a low-risk, high-reward project for a BSc student.

I totally agree; in fact, I already started a paper draft on Overleaf to keep my notes in one place 😁 Do you have any bachelor students in mind who may be interested? This is something that shouldn't take too much time: a few simple derivations, a few example applications, and the corresponding dispatch in the code here for each generalized variant of the fluctuation complexity.

Datseris commented 3 weeks ago

I don't have any students yet, but I hope to find some soon. I will keep bugging the Exeter people to see how I can find more students. In the meantime, I'll promote more such projects on my website.

kahaaga commented 3 weeks ago

Ok, then I'll probably just write up the paper myself as soon as possible. If you want to have a read, give me a nod here, and I'll send you a link to the paper.