artificialwisdomai / origin

Artificial Wisdom™ Cloud Platform
Apache License 2.0

workloads/dataset-controller: Out-of-memory on large datasets #122

Open MostAwesomeDude opened 9 months ago

MostAwesomeDude commented 9 months ago

We currently use HF Datasets to load datasets from the HF Hub. Their recommended loading method requires the entire dataset to fit in memory. If a dataset does not fit, our dataset controller will likely run out of memory and crash.

This hasn't been observed yet, but we consider it inevitable.
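One possible mitigation, sketched here as an assumption and not as the project's agreed fix: HF Datasets' `load_dataset` accepts `streaming=True`, which returns an `IterableDataset` that yields examples lazily instead of materializing the whole dataset. The controller could then consume records in fixed-size batches, so peak memory is bounded by the batch size rather than the dataset size. The helper `iter_batches` and the batch size below are hypothetical and exist only for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Hypothetical helper: yield lists of at most `batch_size` records.

    Only the current batch is held in memory, so this works with any lazy
    iterable, e.g. an IterableDataset from load_dataset(..., streaming=True).
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(records)
    while True:
        # Pull up to batch_size items from the underlying iterator.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Self-contained demo: a generator stands in for a streamed dataset.
records = ({"id": i} for i in range(10))
sizes = [len(batch) for batch in iter_batches(records, 4)]
print(sizes)  # [4, 4, 2]
```

Because `iter_batches` only relies on the iterator protocol, the same loop body would work whether the source is an in-memory list or a streamed dataset; the trade-off is that streamed data can only be traversed once, in order.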