abstractqqq
/
polars_ds_extension
Polars extension for general data science use cases
MIT License
String Cleaning
#152
Closed
CangyuanLi
closed
1 month ago
CangyuanLi
commented
1 month ago
Partially addresses #151. Adds:
remove_diacritics: strip diacritics (e.g. è -> e)
normalize_string: apply Unicode normalization
map_words: replace whole words with mapped values. This is faster than matching with regex word boundaries (\b).
normalize_whitespace: collapse runs of whitespace into a single space (e.g. "a   b" -> "a b")
replace_digits: replace digits with a specified value
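
To make the intended behavior of these helpers concrete, here is a rough pure-Python sketch using only the standard library. It is not the PR's implementation and does not use the extension's API. The function names mirror the list above, but the signatures, the default normalization form, and the whitespace-splitting rule in map_words are assumptions made for illustration.

```python
import re
import unicodedata


def remove_diacritics(text: str) -> str:
    # Decompose to NFD so accents become separate combining marks, then drop them.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_string(text: str, form: str = "NFC") -> str:
    # form is one of "NFC", "NFD", "NFKC", "NFKD" (the default here is an assumption).
    return unicodedata.normalize(form, text)


def map_words(text: str, mapping: dict[str, str]) -> str:
    # Assumption: words are separated by single spaces. Each token is looked up
    # in a dict, so no regex word-boundary matching is needed.
    return " ".join(mapping.get(word, word) for word in text.split(" "))


def normalize_whitespace(text: str) -> str:
    # str.split() with no arguments splits on any whitespace run; note that it
    # also drops leading and trailing whitespace.
    return " ".join(text.split())


def replace_digits(text: str, value: str) -> str:
    # A callable replacement keeps backslashes in value from being read as escapes.
    return re.sub(r"\d", lambda _match: value, text)


print(remove_diacritics("crème brûlée"))
print(map_words("stone st", {"st": "street"}))
print(normalize_whitespace("a   b"))
print(replace_digits("abc123", "#"))
```

The dictionary lookup in map_words only replaces whole tokens, so "stone" is left alone while "st" is mapped, which is the effect that \b anchors would otherwise provide.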