Closed lorentzenchr closed 10 months ago
Do you think something like this in Plotly, just to get a zero-copy to pandas @lorentzenchr?
I see you have a Polars rewrite as a branch @jbogaardt - is converting to polars what you consider the future of the package?
For some reason I didn't see this issue when @lorentzenchr created it. It is a worthwhile enhancement, since that's the direction other tools seem to be going.
@johalnes, I did start the pl_tri branch to experiment with polars over a numpy/sparse backend. It is much simpler code, has fewer dependencies, is generally faster than the main branch, and can possibly be extended to other languages where the polars API has been implemented (R, node.js). There are some computations that are slower though. I think where arrays make more sense, polars might be slower than the current implementation, but for data manipulation it is much faster. Overall, I've been impressed with the speed + simplification polars brings. If I can get it to a point where there is minimal impact to the end user API and performance, then it probably is the right move.
Totally agree with you @jbogaardt! I do think the code gets easier and more elegant with Polars. But it would seem to be quite a lot of work, even if it looks like you have already done a lot! Added an attempt at the interchange protocol, and luckily the pandas dev team has done most of the work for us.
Are there any more fundamental changes you have considered? Since the package still isn't at 1.0, you could make some breaking changes and no one can arrest you for doing it. Given more performant, future-proof and readable code, I would think most users would be quite happy!
@lorentzenchr - give it a try if you have the time!
To be clear, my proposal is just to support dataframes that support the dataframe interchange protocol in cl.Triangle.
Replacing pandas as the internal computation backend is a totally different story that is better discussed separately.
I know @lorentzenchr, sorry for going off topic. Just got excited!
I think the pull request merged closes this issue? Or have I missed something?
Yes, we can close. Thanks for the fast PR.
Is your feature request related to a problem? Please describe. I would like to preprocess my data in other data containers than pandas, e.g. pyarrow and polars, and then apply Triangle to it.
Is your feature request at odds with the scope of the package?
results in
Describe the solution you'd like Supporting data via the Python dataframe interchange protocol might be an optimal approach.
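For illustration, a minimal sketch of what consuming the interchange protocol could look like, assuming pandas >= 1.5 (which provides pandas.api.interchange.from_dataframe). The helper names supports_interchange and to_pandas are hypothetical and not part of the package; how Triangle would actually call such a helper is left open here.

```python
def supports_interchange(data) -> bool:
    """Return True if `data` exposes the dataframe interchange protocol."""
    # The protocol is detected by duck typing: a callable __dataframe__ method.
    return callable(getattr(data, "__dataframe__", None))


def to_pandas(data):
    """Hypothetical helper: coerce an interchange-capable frame to pandas."""
    # Imported lazily so the detection helper above has no hard dependency.
    import pandas as pd  # assumption: pandas >= 1.5
    from pandas.api.interchange import from_dataframe

    if isinstance(data, pd.DataFrame):
        return data  # already pandas, nothing to convert
    if supports_interchange(data):
        # pandas builds a DataFrame from the producer's interchange object.
        return from_dataframe(data)
    raise TypeError(
        f"{type(data).__name__} does not implement the dataframe "
        "interchange protocol (__dataframe__)"
    )
```

With something like this in place, a polars or pyarrow object could be passed through to_pandas before the existing pandas-based logic runs, without any library-specific branches.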
Describe alternatives you've considered One could also consider a polars-specific code path. But this might result in more maintenance burden: Where to stop? Add pyarrow, too?
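To make the maintenance concern concrete, here is a rough sketch of what a per-library code path might look like. The function name to_pandas_per_library is hypothetical; the only library calls assumed are polars.DataFrame.to_pandas and pyarrow.Table.to_pandas. Every additional container would need its own branch.

```python
def to_pandas_per_library(data):
    """Hypothetical per-library dispatch, keyed on the defining package."""
    # Top-level package of the object's class, e.g. "polars" or "pyarrow".
    root = type(data).__module__.split(".")[0]
    if root == "pandas":
        return data  # already pandas
    if root in ("polars", "pyarrow"):
        # Both polars.DataFrame and pyarrow.Table provide a to_pandas() method.
        return data.to_pandas()
    # Any other library would require yet another explicit branch here.
    raise TypeError(f"unsupported container from library {root!r}")
```

The list of supported libraries lives in the package itself, so it has to be extended and tested for each new container.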
Additional context A similar feature request is https://github.com/scikit-learn/scikit-learn/issues/25896 which is currently being worked on.