NVIDIA / NeMo-Curator

Scalable data pre-processing and curation toolkit for LLMs
Apache License 2.0
589 stars 81 forks

Expand RMM options for Python API #260

Closed sarahyurick closed 1 month ago

sarahyurick commented 1 month ago

In https://github.com/NVIDIA/NeMo-Curator/pull/244, we suggest flags like --rmm-async and --rmm-release-threshold 50GB to users dealing with GPU OOM issues. Right now, users can only set these options if they initialize the Dask client themselves.

We should expand get_client to be more flexible and handle this logic.
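
A rough sketch of what this could look like, assuming the options are simply forwarded to dask_cuda.LocalCUDACluster (which accepts rmm_async, rmm_pool_size, and rmm_release_threshold). The helper names build_rmm_kwargs and start_gpu_client are hypothetical and are not the current get_client signature. The check that a release threshold requires the async allocator is this sketch's own choice, based on how the dask-cuda docs describe that option.

```python
from typing import Any, Dict, Optional, Union

MemorySize = Union[int, str]  # e.g. 50_000_000_000 or "50GB"


def build_rmm_kwargs(
    rmm_async: bool = False,
    rmm_pool_size: Optional[MemorySize] = None,
    rmm_release_threshold: Optional[MemorySize] = None,
) -> Dict[str, Any]:
    """Collect RMM-related keyword arguments for LocalCUDACluster (hypothetical helper)."""
    # The dask-cuda docs describe the release threshold as applying to the
    # async allocator, so this sketch rejects it when rmm_async is off.
    if rmm_release_threshold is not None and not rmm_async:
        raise ValueError("rmm_release_threshold requires rmm_async=True")

    kwargs: Dict[str, Any] = {"rmm_async": rmm_async}
    # Only pass optional sizes when the user set them, so the
    # cluster keeps its own defaults otherwise.
    if rmm_pool_size is not None:
        kwargs["rmm_pool_size"] = rmm_pool_size
    if rmm_release_threshold is not None:
        kwargs["rmm_release_threshold"] = rmm_release_threshold
    return kwargs


def start_gpu_client(**rmm_options: Any):
    """Start a local GPU cluster with the given RMM options (hypothetical helper)."""
    # Imported lazily so build_rmm_kwargs can be used without a GPU install.
    from dask.distributed import Client
    from dask_cuda import LocalCUDACluster

    cluster = LocalCUDACluster(**build_rmm_kwargs(**rmm_options))
    return Client(cluster)
```

With something like this, the suggestion from the PR would become start_gpu_client(rmm_async=True, rmm_release_threshold="50GB") instead of requiring users to build the cluster by hand.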

sarahyurick commented 1 month ago

API reference: https://docs.rapids.ai/api/dask-cuda/nightly/api/