Closed jianshu93 closed 7 months ago
Hello Team,
Where can I find the data before deduplication? I have a similar task and would like to test deduplication algorithms on it.
Thanks,
Jianshu
We ran the experiments in our paper on open-source datasets: Wiki-40B, C4, and LM1B. You can find these datasets in, e.g., TFDS or Hugging Face.
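For anyone landing here later, this is a minimal sketch of how the raw (not yet deduplicated) text could be pulled from TFDS. The catalog names `wiki40b/en`, `c4/en`, and `lm1b`, the `train` split, and the helper `load_raw_text` are assumptions for illustration; the thread does not say which configs or splits were used. Note that C4 in TFDS is very large and needs a separate preparation step before it can be loaded.

```python
# Assumed TFDS catalog names; the configs/splits actually used are not stated in this thread.
TFDS_NAMES = {
    "wiki-40b": "wiki40b/en",
    "c4": "c4/en",
    "lm1b": "lm1b",
}


def load_raw_text(dataset: str, split: str = "train"):
    """Return a tf.data.Dataset of raw examples for one of the datasets above.

    Hypothetical helper: it only wraps tfds.load with the assumed names.
    """
    # Imported lazily so the mapping can be inspected without TensorFlow installed.
    import tensorflow_datasets as tfds

    return tfds.load(TFDS_NAMES[dataset], split=split)


# Example usage (downloads data on first call):
#   ds = load_raw_text("lm1b")
#   for example in ds.take(3):
#       print(example["text"].numpy().decode("utf-8"))
```

Each example is a dictionary with a `text` field, which is what you would feed to a deduplication algorithm.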