NVIDIA / RULER

This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
Apache License 2.0
743 stars 48 forks source link

Fix paul grahams order #55

Closed Wangmerlyn closed 3 months ago

Wangmerlyn commented 3 months ago

The original paul grahams essay dataset concatenate implementation uses glob.glob to list all files, which is not order-preserving, could lead to different dataset files on different machine or even differen directories. This fix uses sorted here to make sure the dataset file remains the same in different environments.

hsiehjackson commented 3 months ago

Thanks! I didn't notice this may affect the reproducibility.