Closed remiadon closed 3 months ago
How are you doing? If you run into any environment problem let me know. I did change the version of Polars in recent updates
@abstractqqq my PR is ready appart from one bug I cannot get rid of ...
my copula_entropy
yields different results from its equivalent from the ref
I found out the rank
method I use within this function is giving different results. The only solution I could find requires 2 calls to select
, while the ideal solution should only require one
For example this
df.select(-knn_entropy(pl.col.log_return_vix.rank() / pl.len(), pl.col.log_return_sp500.rank() / pl.len(), k=3))
yields shape: (1, 1)
len |
---|
f64 |
4.03966 |
But this yields the desired result
df.select(pl.all().rank() / pl.len()).select(-knn_entropy(pl.col.log_return_vix, pl.col.log_return_sp500, k=3))
shape: (1, 1)
len |
---|
f64 |
0.390728 |
Do you have any idea why ? If not I think we should remove copula_entropy
and transfer_entropy
and keep knn_entropy
which is still very useful
What rank method does the algorithm use? (E.g does it skip ranks? Do same values have the same rank?) Does it match Polar's rank's default behavior?
What rank method does the algorithm use? (E.g does it skip ranks? Do same values have the same rank?) Does it match Polar's rank's default behavior?
@abstractqqq I don't think the difference lies in the rank method used by the ref (scipy's rank), but from the way the rank method from polars is submitted to the polars engine (in this case in a select
context, but it could be a different context)
If you look at my examples, the second snippet succeeds in finding the correct value while using polars's rank method
I am not able to merge your branch because of some conflicts and the conflicts were a headache.. I created a separate branch based on latest main.
After reviewing the paper, I also don't think this line is entirely correct. The gather statement seems to be the problem.
pl.sum_horizontal(pl.exclude('_idx').sub(pl.exclude('_idx').gather(neighbor)).pow(2)).sqrt()
I think the code in pl_knn_entropy is correct in my new branch. Please give it your review. It is saved in entrophies.rs. I copied the main logic from pycopent, but it now uses the implementation that saves more space as I described earlier. It passed the simple test you provided.
Can u try to move the rest of your code to that branch? Thanks a lot!
https://github.com/abstractqqq/polars_ds_extension/tree/knn_entropy
knn_entropy is merged into main. Feel free branch and PR to add the other entropies, and digamma to the project.
@abstractqqq I've updated my branch, reusing your knn_entropy implementation. Now all of my calls (knn_entropy, copula_entropy, conditional_independance, ...) results match with the reference :rocket: :rocket: :rocket:
From my POV this is ready for a merge