NVIDIA / NeMo-Curator

Scalable data preprocessing and curation toolkit for LLMs
Apache License 2.0
478 stars, 57 forks

Fix indexing in PII Modifier #55

Closed: ryantwolf closed this 4 months ago

ryantwolf commented 4 months ago

Fixes an issue reported by @Maghoumi. When the PII Modifier receives a partition that has been partially filtered, it does not respect the subset of rows it was given. Instead, it overrides the partition's index, so the modifications get applied to elements that were not supposed to be modified.
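
For readers unfamiliar with this class of bug, here is a minimal sketch of how an overridden index can misapply modifications in pandas. It is not the NeMo Curator code: `redact`, `modify_ignoring_index`, and `modify_preserving_index` are hypothetical names made up for illustration, and the real modifier may differ in how it builds its output.

```python
import pandas as pd


def redact(text: str) -> str:
    # Hypothetical stand-in for a PII modifier; not the NeMo Curator implementation.
    return text.replace("Alice", "<NAME>").replace("Bob", "<NAME>")


def modify_ignoring_index(partition: pd.DataFrame) -> pd.DataFrame:
    out = partition.copy()
    # The new Series gets a default 0..n-1 index, so pandas aligns it
    # against the partition's original labels and rows end up mismatched.
    out["text"] = pd.Series([redact(t) for t in partition["text"]])
    return out


def modify_preserving_index(partition: pd.DataFrame) -> pd.DataFrame:
    out = partition.copy()
    # Reusing the partition's own index keeps each result on its source row.
    out["text"] = pd.Series(
        [redact(t) for t in partition["text"]], index=partition.index
    )
    return out


df = pd.DataFrame({"text": ["hi", "call Alice", "ok", "email Bob"]})
# A partially filtered partition: only the rows labelled 1 and 3 remain.
filtered = df[df["text"].str.contains("Alice|Bob")]

buggy = modify_ignoring_index(filtered)
fixed = modify_preserving_index(filtered)

print(buggy)
print(fixed)
```

In this sketch, the index-ignoring version puts the redaction of row 3 onto row 1 and leaves row 3 as a missing value, while the index-preserving version keeps each redacted string on the row it came from.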