This PR removes the nemo_curator/gpu_deduplication folder in favor of using all code from either the fuzzy_dedup module or fuzzy_dedup_utils. (A few methods were left behind.)
It also moves the fuzzy dedup scripts into a new subfolder with a readme on the order of execution and example usage.
It adds a caution to the gpu_deduplication Slurm example currently in examples; that example will be removed in a follow-up and replaced by a Python-only API example.
[X] Verified all tests passing
[X] Verified refactored scripts run w/ expected results locally