allenai / dolma

Data and tools for generating and inspecting OLMo pre-training data.
https://allenai.github.io/dolma/
Apache License 2.0

Fixed issues and improved documentation in getting-started.md #216

Closed: aman-17 closed this 2 weeks ago

aman-17 commented 2 weeks ago

Updates:

  1. Added a note in getting-started.md to guide users on selecting wiki-dump dates.
  2. Resolved a "module not found" error in make_wikipedia.py by adding try-except handling.
  3. Added the wikipedia-mixer.json file and updated the documentation to improve clarity and ease of use.
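
For readers unfamiliar with the pattern in item 2, here is a minimal sketch of a try-except guard around an import. It is not the code from make_wikipedia.py: the helper `require` and the module name `hypothetical_wiki_dependency` are assumptions made up for illustration, and the actual change in this PR may guard a different module or handle the failure differently.

```python
import importlib


def require(module_name: str, install_hint: str):
    """Import module_name, or raise an ImportError that says how to install it.

    Hypothetical helper for illustration; not part of dolma.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # Replace the bare traceback with a message the user can act on.
        raise ImportError(
            f"{module_name!r} is required but not installed; try: {install_hint}"
        ) from exc


# A standard-library module: the import succeeds and the module is returned.
json = require("json", "(bundled with Python)")

# A made-up missing dependency: the failure carries the install hint.
try:
    require("hypothetical_wiki_dependency", "pip install hypothetical-wiki-dependency")
except ImportError as err:
    message = str(err)
    print(message)
```

Catching `ModuleNotFoundError` rather than a bare `except` keeps unrelated errors visible, and chaining with `from exc` preserves the original traceback for debugging.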