NVIDIA-Merlin / core

Core Utilities for NVIDIA Merlin
Apache License 2.0

Offer methods for Dataset to cover the most common mechanisms for moving data between partitions #245

Open karlhigley opened 1 year ago

karlhigley commented 1 year ago

The proposed methods would be shuffle_by_keys, sort_by_keys, and group_by_keys. Right now, we only have shuffle_by_keys.
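
To make the difference between the three operations concrete, here is a rough, library-free sketch. It is not Merlin's implementation: a "dataset" is modelled as a plain list of partitions, each a list of row dicts, and the function bodies, the `npartitions` argument, and the semantics of `sort_by_keys` and `group_by_keys` are assumptions made for illustration only. Every function has to look at rows from all partitions, which is what makes these operations global.

```python
from collections import defaultdict


def _key_of(row, keys):
    """Build a hashable, sortable key tuple from the requested columns."""
    return tuple(row[k] for k in keys)


def shuffle_by_keys(partitions, keys, npartitions=None):
    """Toy shuffle: rows with equal key values end up in the same partition."""
    n = npartitions or len(partitions)
    out = [[] for _ in range(n)]
    for part in partitions:
        for row in part:
            # Hash partitioning: the target partition depends only on the key.
            out[hash(_key_of(row, keys)) % n].append(row)
    return out


def sort_by_keys(partitions, keys, npartitions=None):
    """Toy global sort (assumed semantics): order all rows, then re-split."""
    n = npartitions or len(partitions)
    rows = sorted(
        (row for part in partitions for row in part),
        key=lambda row: _key_of(row, keys),
    )
    size = -(-len(rows) // n) if rows else 0  # ceiling division
    return [rows[i * size:(i + 1) * size] for i in range(n)]


def group_by_keys(partitions, keys, npartitions=None):
    """Toy group-by (assumed semantics): shuffle, then collect rows per key."""
    grouped = []
    for part in shuffle_by_keys(partitions, keys, npartitions):
        groups = defaultdict(list)
        for row in part:
            groups[_key_of(row, keys)].append(row)
        grouped.append(dict(groups))
    return grouped


# Hypothetical sample data: two partitions, user 1 appears in both.
data = [
    [{"user": 1, "ts": 3}, {"user": 2, "ts": 1}],
    [{"user": 1, "ts": 2}, {"user": 3, "ts": 5}],
]

shuffled = shuffle_by_keys(data, ["user"])  # both user-1 rows now share a partition
ordered = sort_by_keys(data, ["ts"])        # rows ordered by ts across partitions
grouped = group_by_keys(data, ["user"])     # one {key: rows} dict per partition
```

In this sketch `sort_by_keys` can still split rows with the same key across a partition boundary, while `shuffle_by_keys` and `group_by_keys` cannot. That kind of distinction is what per-method documentation would need to spell out.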

@rjzamora says:

exposing a clear space for documentation is probably the best reason to add it. That documentation should also clarify that these global operations (requiring inter-partition data movement) should be avoided unless absolutely necessary 🙂