liuyxpp opened this issue 5 months ago
Hey, @liuyxpp! Thanks for the report. Can you report back the output of `Pkg.status()` in the same session as you're testing, so I know which versions of the packages you're using?
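For reference, running this in the same session you benchmark in and pasting the output is all that's needed (just the standard Pkg API, nothing CausalityTools-specific):

```julia
using Pkg
Pkg.status()   # prints the exact versions of the packages in the active environment
```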
The `@time` macro in Julia isn't the optimal way of measuring real-world performance for snippets of code. To make sure that the code actually is slower than the Python implementation, can you please check again using the `@btime` macro from BenchmarkTools.jl? Make sure to run your code once before timing, so that compilation overhead is excluded. Like so:

```julia
using CausalityTools, BenchmarkTools, Random

rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000)

mutualinfo(KSG1(), x, y)            # run once so compilation is not included in the timing
@btime mutualinfo(KSG1(), $x, $y)   # now time it
```
If the performance difference is still evident after benchmarking properly with `@btime`, then it might be due to implementation differences.
@liuyxpp Also: are you using any sort of multithreading for the Python code?
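It might also be worth posting the thread counts of the Julia session, just as a sanity check (these are standard Julia calls, nothing package-specific):

```julia
using LinearAlgebra

Threads.nthreads()       # number of Julia threads this session was started with
BLAS.get_num_threads()   # number of BLAS threads, in case any linear algebra is involved
```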
It would also be nice to see whether this apparent performance difference also holds for higher-dimensional data. In the Ross paper which is linked in the Python docs, they use a specialized fast implementation for 1D data. Could you try it out with two large 3D datasets, for example?
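Something along these lines should work, assuming `Dataset` (re-exported by CausalityTools) is accepted by `mutualinfo` for multivariate input in the version you have installed:

```julia
using CausalityTools, BenchmarkTools, Random

rng = MersenneTwister(1234)
X = Dataset(rand(rng, 10_000, 3))   # 10_000 samples, 3 dimensions each
Y = Dataset(rand(rng, 10_000, 3))

mutualinfo(KSG1(), X, Y)            # run once to compile
@btime mutualinfo(KSG1(), $X, $Y)   # then benchmark
```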
In Python + scikit-learn
In Julia + CausalityTools
The results are clearly different. In addition, scikit-learn reports 0 whenever the estimated MI is negative.
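If one wants to mimic that convention on the Julia side when comparing numbers, clamping the estimate is enough (a sketch; the clamp is not something CausalityTools does itself):

```julia
mi = mutualinfo(KSG1(), x, y)
mi = max(mi, 0.0)   # report 0 for negative estimates, as scikit-learn does
```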
For the running times, I get:
In Python
In Julia
So the Julia implementation is much slower than the Python implementation.