Status: Closed (abstractqqq closed 7 months ago)
I couldn't find the repo for dsds (pre-alpha). That said, I think most people will focus on the data front-end (at the level of sklearn, etc.) rather than the foundational algorithms, so ...
+1 for merging.
I could try to help a bit with the refactoring too, although I have mostly worked on the python-polars side.
DSDS is one of my older repos, and I do not plan to maintain it any longer. Its functionality will be merged into polars-ds bit by bit. My plan is to start doing so after v.4.0. I think I know how to write almost all traditional ML transformers in pure Polars, and I have the infrastructure ready to support Polars-native dependency detection, data cleaning, data drift detection, and other EDA tasks.
Should this package include features/functionalities that may not be directly related to Polars Extensions? Feel free to leave a comment.
E.g. functions that take in a Polars DataFrame and return its SVD or PCA-related data, functions that return eigenvalues, etc.
The main issue is that I have two Polars-based packages for data science: polars-ds (alpha) and dsds (pre-alpha). DSDS provides data screening, data problem detection, feature selection, and transformers, whereas polars-ds currently provides only Polars extensions. Both packages have their own Rust modules, which leads to code duplication and makes Rust code harder to share between them.
Pro:
Such an algorithm will involve a kd-tree. But the same kd-tree implementation can be used for other tasks as well, some of which may not be DataFrame-based. If we keep separate Rust modules for polars-ds and dsds, we will have to repeat the kd-tree implementation for every use case.
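To make the kd-tree point concrete, here is a minimal sketch in plain Python. It is not the polars-ds or dsds code, and every name in it (`Node`, `build`, `nearest`, `count_within`) is hypothetical. It only illustrates how one tree implementation could back two different tasks, neither of which needs a DataFrame.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence

Point = Sequence[float]


@dataclass
class Node:
    # Hypothetical node type; not taken from polars-ds or dsds.
    point: Point
    axis: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points: list[Point], depth: int = 0) -> Optional[Node]:
    """Build a kd-tree by splitting on the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(
        pts[mid],
        axis,
        build(pts[:mid], depth + 1),
        build(pts[mid + 1:], depth + 1),
    )


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node: Optional[Node], target: Point, best: Optional[Point] = None) -> Optional[Point]:
    """Task 1: nearest-neighbor lookup."""
    if node is None:
        return best
    if best is None or sq_dist(node.point, target) < sq_dist(best, target):
        best = node.point
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Only cross the splitting plane if a closer point could lie beyond it.
    if diff ** 2 < sq_dist(best, target):
        best = nearest(far, target, best)
    return best


def count_within(node: Optional[Node], target: Point, radius: float) -> int:
    """Task 2: count points within `radius` of `target`, on the same tree."""
    if node is None:
        return 0
    count = 1 if sq_dist(node.point, target) <= radius ** 2 else 0
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    count += count_within(near, target, radius)
    if abs(diff) <= radius:
        count += count_within(far, target, radius)
    return count


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(points)
print(nearest(tree, (9, 2)))            # (8, 1)
print(count_within(tree, (6, 3), 2.0))  # 2
```

Both queries share `build` and the same traversal logic, which is the part that would otherwise be duplicated across two separate Rust modules.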
Cons: