allenai / wimbd

What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets
Apache License 2.0
172 stars, 18 forks

search index for falcon-refinedweb #12

Closed WilliamsToTo closed 2 weeks ago

WilliamsToTo commented 4 months ago

Do you plan to build a search index for the falcon-refinedweb dataset? It is the pretraining dataset behind the Falcon series of LLMs, which are notable for openly releasing both the models and the dataset. A search index could improve our understanding of these LLMs.

yanaiela commented 2 weeks ago

Sorry, I missed that. Probably not.