Closed julio-34727 closed 1 month ago
Thank you for the request!
@CangyuanLi is working on something like this for this package, and the changes will be merged in gradually. This is the first PR in that direction: https://github.com/abstractqqq/polars_ds_extension/pull/150
For snake case, polars_ds already has a function called to_snake_case.
import polars_ds as pds
df.select(pds.to_snake_case("column_name"))
should work.
For URL-related functionality, I have a second project called polars_istr (Identification String parsing), which aims to help with parsing strings in common standard formats. Take a look here
v0.4.5 should have the changes @CangyuanLi added, which partially address this issue.
Thank you for your excellent plugin.
Is it possible to add an expression in the str namespace to clean up text? I already do this in Python by combining polars expressions, duckdb (for accents) and pyarrow (for normalization), but it would be interesting to have it in Rust without going through several libraries. The idea is to remove, for example, emojis (if emoji=True) and accents (if accent=True), apply fill_na (replace empty strings and strings matching r"\s+"), and so on.
case = "snake" is not necessary (bonus). Example of mappings:
mappings = [(r"[\x00\u200d]+", ""), (r"[\xa0\x0b\u200e\n\r\t\f]+", " "), (r"\s\s+", " ")]
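To make the requested behaviour concrete, here is a minimal plain-Python sketch of the cleaning steps described above. It is not the polars_ds API: the clean_text name, its parameters, and the emoji ranges are assumptions made for illustration, and fill_na is interpreted as "return a fill value when the cleaned string is empty or whitespace-only". Only the mappings list is taken from the request.

```python
import re
import unicodedata

# Mappings quoted from the request: (regex pattern, replacement), applied in order.
MAPPINGS = [
    (r"[\x00\u200d]+", ""),
    (r"[\xa0\x0b\u200e\n\r\t\f]+", " "),
    (r"\s\s+", " "),
]

# Assumption: a rough emoji range, not an exhaustive definition of "emoji".
EMOJI_RE = re.compile("[\U0001F000-\U0001FAFF\u2600-\u27BF\uFE0F]+")


def clean_text(text, emoji=True, accent=True, fill_na=None):
    """Hypothetical helper illustrating the requested clean-up steps."""
    if text is None:
        return fill_na
    if emoji:
        # Drop emoji first so the whitespace mappings can collapse the gaps.
        text = EMOJI_RE.sub("", text)
    for pattern, replacement in MAPPINGS:
        text = re.sub(pattern, replacement, text)
    if accent:
        # Decompose characters, then drop the combining marks (accents).
        decomposed = unicodedata.normalize("NFKD", text)
        text = "".join(c for c in decomposed if not unicodedata.combining(c))
    text = text.strip()
    # Assumed fill_na semantics: empty results become the fill value.
    return text if text else fill_na


print(clean_text("h\u00e9llo\u200d  w\u00f6rld\n"))
```

A Rust implementation inside the plugin could apply the same steps in one pass per string, which is the motivation given in the request for not chaining several libraries.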