abstractqqq / polars_ds_extension

Polars extension for general data science use cases
MIT License

NumExt : add Transfer Entropy and related measures #84

Closed · remiadon closed this 3 months ago

remiadon commented 4 months ago
abstractqqq commented 3 months ago

How are you doing? If you run into any environment problems, let me know. I did change the Polars version in recent updates.

remiadon commented 3 months ago

@abstractqqq my PR is ready apart from one bug I cannot get rid of: my copula_entropy yields different results from its equivalent in the reference. I found out that the rank method I use within this function gives different results. The only solution I could find requires 2 calls to select, while the ideal solution should only require one.

For example this

df.select(-knn_entropy(pl.col.log_return_vix.rank() / pl.len(), pl.col.log_return_sp500.rank() / pl.len(), k=3))

yields

shape: (1, 1)
┌─────────┐
│ len     │
│ ---     │
│ f64     │
╞═════════╡
│ 4.03966 │
└─────────┘

But this yields the desired result

df.select(pl.all().rank() / pl.len()).select(-knn_entropy(pl.col.log_return_vix, pl.col.log_return_sp500, k=3))

shape: (1, 1)
┌──────────┐
│ len      │
│ ---      │
│ f64      │
╞══════════╡
│ 0.390728 │
└──────────┘
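For reference, the second snippet first maps each column to pseudo-observations (rank divided by sample size) and only then estimates entropy, which is the standard copula transform. A minimal pure-Python sketch of that transform (the helper name is hypothetical, not part of polars_ds; ties are broken by input order, i.e. ordinal ranks, which is a simplification):

```python
def pseudo_observations(values):
    """Map a sample into (0, 1] via ordinal rank / n (copula transform)."""
    n = len(values)
    # indices sorted by value; ties keep their original order
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r  # 1-based rank of element i
    return [r / n for r in ranks]

u = pseudo_observations([0.3, -1.2, 0.8, 0.0])
# ranks are 3, 1, 4, 2, so u == [0.75, 0.25, 1.0, 0.5]
```

Applying this to each column before the entropy call mirrors what the two-step `select` does.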

Do you have any idea why ? If not I think we should remove copula_entropy and transfer_entropyand keep knn_entropywhich is still very useful

abstractqqq commented 3 months ago

What rank method does the algorithm use? (E.g. does it skip ranks? Do equal values get the same rank?) Does it match Polars's default rank behavior?
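To make the two questions concrete, here is a toy pure-Python comparison of two common tie policies (the function is illustrative only; scipy's `rankdata` defaults to "average", and I believe Polars's `rank` does too):

```python
def ranks(values, method="average"):
    """Toy ranking with two tie policies.

    'average': tied values share the mean of their ordinal ranks,
               and the following rank is skipped.
    'dense':   tied values share one rank and no rank is skipped.
    """
    sorted_vals = sorted(values)
    if method == "average":
        # mean of the first and last 1-based positions of v in sorted order
        return [
            (sorted_vals.index(v) + 1
             + (len(sorted_vals) - sorted_vals[::-1].index(v))) / 2
            for v in values
        ]
    if method == "dense":
        uniq = sorted(set(values))
        return [uniq.index(v) + 1 for v in values]
    raise ValueError(method)

data = [10, 20, 20, 30]
# average: [1.0, 2.5, 2.5, 4.0]   dense: [1, 2, 2, 3]
```

If the reference and the Polars expression disagree on either policy, the pseudo-observations (and hence the entropy) will differ.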

remiadon commented 3 months ago

> What rank method does the algorithm use? (E.g. does it skip ranks? Do equal values get the same rank?) Does it match Polars's default rank behavior?

@abstractqqq I don't think the difference lies in the rank method used by the reference (scipy's rank), but in the way the Polars rank expression is submitted to the Polars engine (here in a select context, but it could be a different context). If you look at my examples, the second snippet finds the correct value while still using Polars's rank method.

abstractqqq commented 3 months ago

I was not able to merge your branch because of conflicts, and resolving them was a headache, so I created a separate branch based on the latest main.

After reviewing the paper, I also don't think this line is entirely correct. The gather statement seems to be the problem.

pl.sum_horizontal(pl.exclude('_idx').sub(pl.exclude('_idx').gather(neighbor)).pow(2)).sqrt()

I think the code in pl_knn_entropy is correct in my new branch; please give it your review. It lives in entrophies.rs. I copied the main logic from pycopent, but it now uses the implementation that saves more space, as I described earlier. It passes the simple test you provided.

Can you try to move the rest of your code to that branch? Thanks a lot!

https://github.com/abstractqqq/polars_ds_extension/tree/knn_entropy
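For reviewers unfamiliar with the estimator: the kNN entropy in pycopent follows the Kozachenko-Leonenko form, H ≈ ψ(n) − ψ(k) + log V_d + (d/n) Σᵢ log εᵢ, where εᵢ is the distance from point i to its k-th nearest neighbour and V_d is the volume of the d-dimensional unit ball. A brute-force pure-Python sketch of the same idea (not the Rust code in entrophies.rs; Euclidean metric assumed, and digamma is only needed at integer arguments here):

```python
import math

EULER_GAMMA = 0.5772156649015329

def digamma_int(m: int) -> float:
    """psi(m) for a positive integer m: -gamma + H_{m-1}."""
    return -EULER_GAMMA + sum(1.0 / i for i in range(1, m))

def knn_entropy(points, k=3):
    """Kozachenko-Leonenko differential-entropy estimate.

    points: list of equal-length tuples, one sample per tuple.
    H ~ psi(n) - psi(k) + log V_d + (d / n) * sum_i log eps_i
    """
    n, d = len(points), len(points[0])
    # log-volume of the d-dimensional Euclidean unit ball
    log_vd = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    log_eps = 0.0
    for i, p in enumerate(points):
        # O(n^2) brute force: sorted distances to all other points
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        log_eps += math.log(dists[k - 1])  # distance to k-th neighbour
    return digamma_int(n) - digamma_int(k) + log_vd + (d / n) * log_eps
```

For a large uniform sample on [0, 1] the estimate should land near 0, the true differential entropy, which is one quick sanity check before comparing against pycopent.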

abstractqqq commented 3 months ago

knn_entropy is merged into main. Feel free to branch and PR to add the other entropies, and digamma, to the project.
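Since digamma is mentioned: a standard way to implement it is the upward recurrence ψ(x) = ψ(x + 1) − 1/x to push the argument into a region where the asymptotic series is accurate. A hedged pure-Python sketch (the project would do this in Rust; positive arguments only, accuracy roughly 1e-8):

```python
import math

def digamma(x: float) -> float:
    """Digamma via recurrence plus an asymptotic series.

    psi(x) = psi(x + 1) - 1/x is applied until x >= 6, then
    psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6).
    """
    if x <= 0:
        raise ValueError("positive arguments only in this sketch")
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x  # shift the argument upward
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = math.log(x) - 0.5 * inv - inv2 * (1/12 - inv2 * (1/120 - inv2 / 252))
    return acc + series

# psi(1) = -EulerMascheroni ~ -0.5772156649
```

Two handy reference values for tests: ψ(1) = −γ and ψ(1/2) = −γ − 2 ln 2.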

remiadon commented 3 months ago

@abstractqqq I've updated my branch, reusing your knn_entropy implementation. Now the results of all of my calls (knn_entropy, copula_entropy, conditional_independance, ...) match the reference :rocket: :rocket: :rocket:

From my POV this is ready to merge.