bigcode-project/selfcodealign
[NeurIPS'24] SelfCodeAlign: Self-Alignment for Code Generation
https://arxiv.org/abs/2410.24198
Apache License 2.0

Minihash #8 (Closed)
UniverseFly closed this 6 months ago

UniverseFly commented 6 months ago:
@natedingyifeng

Args:
    data_files: a list of JSONL data file paths (use datasets.load_dataset to load)
    output_file: the file to write deduplicated results to
    key: the JSONL key used to dedup
    hyperparameters for MinHash dedup?
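
For context, here is a minimal sketch of a function matching the docstring above, with the MinHash hyperparameters exposed as keyword arguments. Everything in it is an assumption for illustration, not the SelfCodeAlign implementation: the name `minhash_dedup` and the parameters `num_perm`, `ngram_size`, `threshold` and `bands` are hypothetical. It reads JSONL with the standard `json` module instead of `datasets.load_dataset` so it runs without extra dependencies. It uses seeded blake2b hashes as a stand-in for random permutations and LSH banding to find candidate pairs, and keeps the first occurrence of each near-duplicate.

```python
import hashlib
import json
import os
import re
import tempfile
from typing import Dict, List, Set, Tuple


def shingles(text: str, ngram_size: int) -> Set[str]:
    """Lower-case word n-grams; texts shorter than ngram_size become one shingle."""
    tokens = re.findall(r"\w+", text.lower())
    if len(tokens) < ngram_size:
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + ngram_size]) for i in range(len(tokens) - ngram_size + 1)}


def minhash_signature(items: Set[str], num_perm: int) -> List[int]:
    """Minimum 64-bit hash per seed; seeded blake2b stands in for random permutations."""
    return [
        min(
            int.from_bytes(hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest(), "big")
            for s in items
        )
        for seed in range(num_perm)
    ]


def estimated_jaccard(a: List[int], b: List[int]) -> float:
    """The fraction of agreeing signature slots estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def minhash_dedup(
    data_files: List[str],
    output_file: str,
    key: str,
    num_perm: int = 128,     # hypothetical: number of hash functions per signature
    ngram_size: int = 5,     # hypothetical: words per shingle
    threshold: float = 0.7,  # hypothetical: estimated similarity that marks a duplicate
    bands: int = 16,         # hypothetical: LSH bands, each num_perm // bands rows wide
) -> int:
    """Write the first occurrence of each near-duplicate group; return the kept count."""
    if num_perm % bands:
        raise ValueError("num_perm must be divisible by bands")
    rows = num_perm // bands
    buckets: Dict[Tuple[int, Tuple[int, ...]], List[int]] = {}
    kept_sigs: List[List[int]] = []
    with open(output_file, "w", encoding="utf-8") as out:
        for path in data_files:
            with open(path, encoding="utf-8") as f:
                for line in f:
                    if not line.strip():
                        continue
                    record = json.loads(line)
                    sig = minhash_signature(shingles(record[key], ngram_size), num_perm)
                    band_keys = [(b, tuple(sig[b * rows:(b + 1) * rows])) for b in range(bands)]
                    # Candidates share at least one identical band with a kept record.
                    candidates = {i for bk in band_keys for i in buckets.get(bk, [])}
                    if any(estimated_jaccard(sig, kept_sigs[i]) >= threshold for i in candidates):
                        continue  # near-duplicate of an already-kept record
                    for bk in band_keys:
                        buckets.setdefault(bk, []).append(len(kept_sigs))
                    kept_sigs.append(sig)
                    out.write(json.dumps(record) + "\n")
    return len(kept_sigs)


if __name__ == "__main__":
    # Tiny demo: two identical records and one distinct record.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.jsonl")
        dst = os.path.join(tmp, "out.jsonl")
        with open(src, "w", encoding="utf-8") as f:
            for code in ["def add(a, b): return a + b"] * 2 + ["def mul(x, y): return x * y"]:
                f.write(json.dumps({"content": code}) + "\n")
        print(minhash_dedup([src], dst, key="content"))  # prints 2
```

On the design choice: with `bands=16` and 8 rows per band, a pair is likely to become a candidate once its similarity exceeds roughly (1/16)^(1/8) ≈ 0.71. That is why the sketch pairs these defaults with `threshold=0.7`. Whatever values the real script uses, documenting them together in `Args` would make that coupling visible.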