NVIDIA / NeMo-Curator

Scalable toolkit for data curation
Apache License 2.0

[FEA] Add examples showing how to use both CPU & GPU modules together #65

Open · ayushdg opened this issue 1 month ago

ayushdg commented 1 month ago

Is your feature request related to a problem? Please describe.
The codebase has some tutorials/examples showcasing CPU-only or GPU-only modules, but none that use both. It would be good to have examples that combine the two and show how users can convert their dataset to move between CPU and GPU modules.

Describe the solution you'd like
This came up when trying to combine fuzzy deduplication with CPU modules, which raised a TypeError (expected data of type cudf).

Describe alternatives you've considered
Longer term, there should be a way to handle the conversion automatically, but for now an example showing how users can move between the two would be helpful.
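Below is a minimal sketch of what such an example might cover. It is not NeMo Curator's API: Stage, run_mixed_pipeline, and dask_to_backend are hypothetical names chosen for illustration. The sketch assumes each step declares the backend it expects ("pandas" for CPU, "cudf" for GPU) and inserts a conversion only where consecutive steps disagree, which is one way the automatic handling mentioned above could work. For Dask-backed data, the conversion itself could use Dask's DataFrame.to_backend("cudf") or to_backend("pandas"). Converting to cudf requires dask_cudf and a GPU.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


# Hypothetical description of one curation step: a callable plus the
# dataframe backend it expects ("pandas" for CPU, "cudf" for GPU).
@dataclass
class Stage:
    name: str
    fn: Callable[[Any], Any]
    backend: str


def dask_to_backend(ddf: Any, backend: str) -> Any:
    """Example converter for a Dask DataFrame (assumes dask is installed).

    Dask's DataFrame.to_backend moves the partitions to the named backend;
    converting to "cudf" additionally requires dask_cudf and a GPU.
    """
    return ddf.to_backend(backend)


def run_mixed_pipeline(
    data: Any,
    current_backend: str,
    stages: List[Stage],
    convert: Callable[[Any, str], Any],
) -> Tuple[Any, str, List[Tuple[str, str, str]]]:
    """Run stages in order, converting the data only when the backend changes.

    Returns the final data, its backend, and a log of
    (from_backend, to_backend, stage_name) conversions.
    """
    conversions: List[Tuple[str, str, str]] = []
    for stage in stages:
        if stage.backend != current_backend:
            # Convert before handing data to a stage with a different backend,
            # instead of letting that stage fail with a TypeError.
            data = convert(data, stage.backend)
            conversions.append((current_backend, stage.backend, stage.name))
            current_backend = stage.backend
        data = stage.fn(data)
    return data, current_backend, conversions
```

For a NeMo Curator DocumentDataset, whose underlying Dask dataframe is assumed to be available as dataset.df, a converter along the lines of lambda ds, b: DocumentDataset(dask_to_backend(ds.df, b)) could be passed as convert. Converting once per backend change, rather than before every step, keeps host/device transfers to a minimum.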

glam621 commented 1 month ago

Add this to the README and link directly to the examples that use both GPU and CPU modules. The recent tutorial uses both.