Open AlexCatarino opened 2 months ago
Hi! I came across this issue due to the cuDF reference. I work on cuDF and other RAPIDS projects at NVIDIA.
In addition to being a GPU library, cuDF can provide zero-code-change GPU acceleration for pandas and (as of yesterday) Polars.
# Load the extension in a notebook, or enable it via the command line for Python scripts
%load_ext cudf.pandas

import pandas as pd

# filepath: path to your parquet file
df = pd.read_parquet(filepath)
(df[["Registration State", "Violation Description"]]
    .value_counts()
    .groupby("Registration State")
    .head()
    .sort_index()
)
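For a plain Python script rather than a notebook, the same pipeline can be sketched as below. This is only an illustration under a few assumptions: the toy DataFrame is made up to stand in for the parquet file, and the fallback to CPU pandas when cuDF cannot be loaded is my own addition, not something cudf.pandas does for you. The accelerator can also be enabled without touching the script by running it as `python -m cudf.pandas script.py`.

```python
# Sketch: enable cudf.pandas programmatically, falling back to CPU pandas.
try:
    import cudf.pandas  # needs a RAPIDS install and an NVIDIA GPU

    cudf.pandas.install()  # must be called before pandas is imported
    accelerated = True
except Exception:  # ImportError, or a CUDA error when no GPU is usable
    accelerated = False

import pandas as pd

# Hypothetical toy data standing in for the parquet file in the snippet above.
df = pd.DataFrame(
    {
        "Registration State": ["NY", "NY", "NJ", "NY", "NJ"],
        "Violation Description": ["A", "A", "B", "C", "B"],
    }
)

# Same chain as above: count (state, violation) pairs, keep the top rows
# per state, then order the result by index.
result = (
    df[["Registration State", "Violation Description"]]
    .value_counts()
    .groupby("Registration State")
    .head()
    .sort_index()
)

print("GPU accelerated:", accelerated)
print(result)
```

The pandas code itself is unchanged in both branches, which is the point of the "zero code change" mode: only the two lines that install the accelerator differ.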
import polars as pl

ldf = pl.LazyFrame({"a": [1.242, 1.535]})
print(
    ldf.select(
        pl.col("a").round(1)
    ).collect(engine="gpu")
)
Would love to see these capabilities available for LEAN users. Happy to try to help answer any questions that might come up if you or anyone else explores this.
cuDF (pronounced "KOO-dee-eff") is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data.
Test:
Gives us:
EDIT: We need to install RAPIDS too.