acheronw closed 1 year ago
@acheronw Please go through the unresolved threads and resolve them if you think you have addressed the associated comments. GitHub won't allow the merge if any remain unresolved.
The main outstanding issue is that we also need the script that actually runs the whole cumulative deduplication.
Wrote a script that deduplicates (using the minhashes) against every earlier batch (treating directory names as dates).
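For readers of this thread, here is a rough sketch of what a cumulative pass like this could look like. It is not the script from this PR. Every name in it (`minhash`, `similarity`, `cumulative_dedup`, the 0.8 threshold, 64 permutations, word 3-shingles) is a hypothetical choice for illustration. It assumes batch directory names are ISO-style dates, so that sorting the names gives chronological order, and it only compares each batch against earlier ones (duplicates within a batch are assumed to be handled separately).

```python
import hashlib
import random

NUM_PERM = 64                      # hypothetical signature length
_PRIME = (1 << 61) - 1             # modulus for the universal hash family
_rng = random.Random(0)            # fixed seed so signatures are reproducible
_COEFFS = [(_rng.randrange(1, _PRIME), _rng.randrange(0, _PRIME))
           for _ in range(NUM_PERM)]


def minhash(text, shingle_size=3):
    """Return a MinHash signature over word shingles of `text`."""
    words = text.lower().split()
    shingles = {" ".join(words[i:i + shingle_size])
                for i in range(max(1, len(words) - shingle_size + 1))}
    base = [int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8).digest(), "big")
            for s in shingles]
    # One minimum per (a, b) pair approximates one random permutation.
    return tuple(min((a * h + b) % _PRIME for h in base) for a, b in _COEFFS)


def similarity(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching positions."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def cumulative_dedup(batches, threshold=0.8):
    """Drop documents that are near-duplicates of any earlier batch.

    `batches` maps a directory name (assumed to sort chronologically)
    to a list of document texts.
    """
    seen = []                      # signatures from all earlier batches
    kept = {}
    for name in sorted(batches):   # directory names treated as dates
        sigs = [minhash(doc) for doc in batches[name]]
        kept[name] = [doc for doc, sig in zip(batches[name], sigs)
                      if all(similarity(sig, old) < threshold for old in seen)]
        seen.extend(sigs)          # later batches are checked against this one
    return kept
```

The linear scan over `seen` is quadratic in the number of documents. A real run over many batches would more likely use an LSH index over the stored signatures, but the control flow (sort directories, check against everything earlier, then add the current batch) stays the same.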