NVIDIA / NeMo-Curator

Scalable data preprocessing and curation toolkit for LLMs
Apache License 2.0
596 stars 81 forks

Deprecate `max_text_bytes_per_part` #331

Open sarahyurick opened 2 weeks ago

sarahyurick commented 2 weeks ago

cuDF now supports long strings, so we can deprecate the `max_text_bytes_per_part` parameter.
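
For illustration, here is a minimal sketch of one way the deprecation could look, using Python's standard `warnings` module. The function name `read_text_partitions` and its `files` argument are hypothetical placeholders, not part of the NeMo-Curator API. Only `max_text_bytes_per_part` comes from this issue.

```python
import warnings


def read_text_partitions(files, max_text_bytes_per_part=None):
    """Hypothetical reader used only to illustrate the deprecation pattern."""
    if max_text_bytes_per_part is not None:
        # Warn callers who still pass the parameter; stacklevel=2 points
        # the warning at the caller's line rather than at this function.
        warnings.warn(
            "max_text_bytes_per_part is deprecated and will be removed "
            "in a future release.",
            DeprecationWarning,
            stacklevel=2,
        )
    # Placeholder body (assumption): the real function would build partitions.
    return list(files)
```

Defaulting the parameter to `None` keeps existing call sites working while signalling that the argument should be dropped.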

Related:

sarahyurick commented 2 weeks ago

Will work on this after https://github.com/NVIDIA/NeMo-Curator/pull/316 is resolved.