NVIDIA / NeMo-Curator

Scalable toolkit for data curation
Apache License 2.0

[FEA] Support dask query planning #73

Open ayushdg opened 1 month ago

ayushdg commented 1 month ago

Is your feature request related to a problem? Please describe.

Currently, many features and tests do not work when Dask query planning is enabled (the default Dask behavior).

This issue tracks the gaps that need to be closed for query planning to work with Curator.
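For context, a minimal sketch of how query planning can be switched off while those gaps remain. It assumes Dask maps the `DASK_DATAFRAME__QUERY_PLANNING` environment variable onto its `dataframe.query-planning` config key, and that the setting only takes effect if it is in place before `dask.dataframe` is first imported. The helper name `set_query_planning` is hypothetical and is not part of Dask or Curator.

```python
import os

# Assumption: Dask reads this variable into the "dataframe.query-planning"
# config key when dask is first imported.
QUERY_PLANNING_ENV = "DASK_DATAFRAME__QUERY_PLANNING"


def set_query_planning(enabled: bool) -> str:
    """Hypothetical helper: store the desired setting in the environment."""
    value = "True" if enabled else "False"
    os.environ[QUERY_PLANNING_ENV] = value
    return value


# Turn query planning off. This must run before `import dask.dataframe`.
set_query_planning(False)

# import dask.dataframe as dd  # import only after the setting is in place
```

Setting the variable in the shell before launching Python should have the same effect and avoids any dependence on import order inside the program.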

rjzamora commented 3 weeks ago

@ayushdg - I'd like to start working on this. Do you want to support both query-planning "on" and "off" moving forward? It may be a bit hard to do this without a bunch of compatibility code.

Also, note that melt should now be supported with cudf-2406.