QuantConnect / Lean

Lean Algorithmic Trading Engine by QuantConnect (Python, C#)
https://lean.io
Apache License 2.0

Library Request: cuDF + RAPIDS #8318

Status: Open. Opened by AlexCatarino 2 weeks ago.

AlexCatarino commented 2 weeks ago

cuDF (pronounced "KOO-dee-eff") is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data.

Test:

import cudf

tips_df = cudf.read_csv("https://github.com/plotly/datasets/raw/master/tips.csv")
tips_df["tip_percentage"] = tips_df["tip"] / tips_df["total_bill"] * 100

# display average tip by dining party size
print(tips_df.groupby("size").tip_percentage.mean())

Running it fails with:

No module named 'cudf'

EDIT: We need to install RAPIDS too.
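Until cuDF is available, the same test can be kept runnable by falling back to pandas when `import cudf` raises the `No module named 'cudf'` error. This is a minimal sketch under a few assumptions. It uses only `DataFrame` construction, column arithmetic, and `groupby(...).mean()`, which both libraries provide. The small in-memory table is hypothetical sample data standing in for `tips.csv`, so no network fetch is needed. `xdf` is an illustrative alias, not a convention of either library.

```python
# Sketch: use cuDF when it is installed, otherwise fall back to CPU pandas.
# Assumption: only DataFrame construction, column arithmetic and
# groupby().mean() are needed, which both libraries support.
try:
    import cudf as xdf  # GPU DataFrame library (needs an NVIDIA GPU + RAPIDS)
except ImportError:
    import pandas as xdf  # CPU fallback with a compatible API for this example

# Hypothetical sample data standing in for tips.csv (no network fetch needed).
tips_df = xdf.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 3.0, 3.0, 8.0],
    "size": [2, 2, 3, 3],
})
tips_df["tip_percentage"] = tips_df["tip"] / tips_df["total_bill"] * 100

# Average tip percentage by dining party size.
avg_by_size = tips_df.groupby("size").tip_percentage.mean()
print(avg_by_size)
```

Aliasing the module keeps the rest of the script identical on both paths. It does not guarantee that every pandas call behaves identically under cuDF.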


beckernick commented 1 week ago

Hi! I came across this issue because of the cuDF reference. I work on cuDF and other RAPIDS projects at NVIDIA.

In addition to being a standalone GPU DataFrame library, cuDF can provide zero-code-change GPU acceleration for pandas and (as of yesterday) Polars.

%load_ext cudf.pandas
# (for Python scripts, run `python -m cudf.pandas script.py` instead)
import pandas as pd

df = pd.read_parquet(filepath)  # filepath: path to your Parquet file

(df[["Registration State", "Violation Description"]]
 .value_counts()
 .groupby("Registration State")
 .head()
 .sort_index()
)

import polars as pl

ldf = pl.LazyFrame({"a": [1.242, 1.535]})

print(
    ldf.select(
        pl.col("a").round(1)
    ).collect(engine="gpu")
)
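In LEAN the algorithm file is typically loaded by the engine rather than run from a notebook or the command line. That means neither the `%load_ext` magic nor `python -m cudf.pandas` may apply there. The cudf.pandas documentation also describes a programmatic `cudf.pandas.install()` call, which must run before pandas is imported. The following is a minimal sketch, assuming that call. It is guarded so the same file still runs on plain CPU pandas when cuDF is absent. The small DataFrame is hypothetical data standing in for the Parquet file.

```python
# Sketch for environments where neither `%load_ext cudf.pandas` nor
# `python -m cudf.pandas` is available (e.g. a script loaded by another program).
# Assumption: cudf.pandas.install() must run before pandas is first imported.
try:
    import cudf.pandas
    cudf.pandas.install()  # route supported pandas calls to the GPU
except ImportError:
    pass  # cuDF not installed: continue with ordinary CPU pandas

import pandas as pd

# Hypothetical stand-in for the Parquet data in the notebook example.
df = pd.DataFrame({
    "Registration State": ["NY", "NY", "NY", "NJ", "NJ", "PA"],
    "Violation Description": ["A", "A", "B", "A", "C", "B"],
})

# Same query as the notebook example: up to five most frequent
# violations per state, ordered by state and violation.
top_violations = (
    df[["Registration State", "Violation Description"]]
    .value_counts()
    .groupby("Registration State")
    .head()
    .sort_index()
)
print(top_violations)
```

The pandas code itself is unchanged. Only the guarded `install()` call at the top decides whether it runs on the GPU.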

Would love to see these capabilities available for LEAN users. Happy to try to help answer any questions that might come up if you or anyone else explores this.