cyzhh / MMOS

Mix of Minimal Optimal Sets (MMOS) is a dataset with two advantages for math reasoning: higher performance and lower construction costs.
69 stars 3 forks

Which code is used for Deduplication Algorithm? #6

Closed hanstong closed 3 months ago

hanstong commented 3 months ago

Hi, the dataset relies largely on the deduplication method. Which code is used for the Deduplication Algorithm 1? Thanks.

Zui-C commented 3 months ago

Hi, the main part is in nodup.py. I recommend referring to generate.sh, which runs three steps: 1. combine the results, 2. extract the true cases, 3. dedup.
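For readers who want a feel for the three steps before opening the scripts, here is a minimal sketch in Python. It is not the code from nodup.py or generate.sh: the record fields (`question`, `completion`, `pred`, `answer`), the function names, and the whitespace-based notion of a duplicate are all assumptions made for illustration, and the repository's own deduplication criterion may differ.

```python
import re


def combine(*result_sets):
    """Step 1: merge several lists of result records into one list."""
    combined = []
    for records in result_sets:
        combined.extend(records)
    return combined


def extract_true_cases(records):
    """Step 2: keep records whose predicted answer equals the reference answer."""
    return [r for r in records if r["pred"] == r["answer"]]


def normalize(text):
    """Collapse whitespace so trivially different completions compare equal."""
    return re.sub(r"\s+", " ", text).strip()


def dedup(records):
    """Step 3: keep the first record per (question, normalized completion) pair."""
    seen = set()
    unique = []
    for r in records:
        # Hypothetical duplicate key; the real algorithm may compare differently.
        key = (r["question"], normalize(r["completion"]))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def build_dataset(*result_sets):
    """Run the three steps in order: combine, extract true cases, dedup."""
    return dedup(extract_true_cases(combine(*result_sets)))
```

Calling `build_dataset(run_a, run_b)` on two lists of sampled solutions returns only the correct solutions, with completions that differ solely in whitespace collapsed to one entry per question.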