Issue: Tokenizer.from_checkpoint assumes that tokenizers are on HF or in a path relative to the current directory. This assumption doesn't hold when OLMo is installed from pip as ai2-olmo. More generally, we don't have a clear mechanism for putting data files in our repo.
Fix: This PR creates an olmo_data package, whose subdirectories correspond to various types of data (e.g. tokenizers and hf_datasets). Tokenizers are moved under olmo_data, and Tokenizer.from_checkpoint is updated to look at local paths first, then olmo_data, then HF Hub.
This change sets up the foundations for adding HF datasets to our repo, so that we don't have to make network calls during training runs.
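The new fallback order can be sketched roughly as below. This is an illustrative outline, not the actual implementation: the function name, the `tokenizers` subdirectory layout, and the return values are assumptions based on the description above.

```python
from pathlib import Path

# importlib.resources.files (Python 3.9+) is the standard way to
# locate data files bundled inside an installed package.
from importlib.resources import files


def resolve_tokenizer(identifier: str):
    """Illustrative sketch of the lookup order: local path, then the
    olmo_data package, then fall back to the Hugging Face Hub."""
    # 1. A path relative to the current directory (or absolute).
    local_path = Path(identifier)
    if local_path.is_file():
        return ("local", local_path)

    # 2. A data file bundled under olmo_data (hypothetical layout:
    #    olmo_data/tokenizers/<identifier>).
    try:
        resource = files("olmo_data").joinpath("tokenizers", identifier)
        if resource.is_file():
            return ("olmo_data", resource)
    except ModuleNotFoundError:
        # olmo_data is not installed; skip this step.
        pass

    # 3. Otherwise, treat the identifier as a HF Hub repo name.
    return ("hf_hub", identifier)
```

Because the local check runs first, existing workflows that pass a relative path keep working, while pip installs of ai2-olmo pick up the bundled files without any network access.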
Fixes #633