Fix dataset download for tokenization

The script for tokenizing datasets from Hugging Face currently uses a function that downloads the stories dataset from the 'delphi-suite' namespace. It downloads only one split (validation) and uploads it as the 'train' split.

[ ] Use the native Hugging Face dataset download function instead of the 'delphi-suite/stories'-specific data downloader
[ ] Download all dataset splits, tokenize each split, upload each tokenized split
[ ] Optional: save tokenized dataset locally
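
The tasks above could be sketched roughly as follows. This is only an illustration, not the existing script: the function names `tokenize_splits` and `tokenize_and_upload`, the input column name `story`, and the output column name `tokens` are all assumptions. It relies on `datasets.load_dataset` returning every split when no `split=` argument is passed.

```python
from typing import Callable, Dict, Iterable, List, Mapping, Optional


def tokenize_splits(
    splits: Mapping[str, Iterable[str]],
    tokenize: Callable[[str], List[int]],
) -> Dict[str, List[List[int]]]:
    """Tokenize every split and keep each one under its original name."""
    return {
        name: [tokenize(text) for text in texts]
        for name, texts in splits.items()
    }


def tokenize_and_upload(
    input_repo: str,
    output_repo: str,
    tokenizer_name: str,
    column: str = "story",  # assumed name of the text column
    save_dir: Optional[str] = None,
) -> None:
    # Imported lazily so tokenize_splits stays usable without these packages.
    from datasets import Dataset, DatasetDict, load_dataset
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)

    # Without `split=`, load_dataset returns a DatasetDict with all splits.
    raw = load_dataset(input_repo)

    tokenized = tokenize_splits(
        {name: ds[column] for name, ds in raw.items()},
        lambda text: tokenizer(text)["input_ids"],
    )

    # Rebuild one Dataset per split; "tokens" is an assumed column name.
    out = DatasetDict(
        {name: Dataset.from_dict({"tokens": rows}) for name, rows in tokenized.items()}
    )

    if save_dir is not None:
        out.save_to_disk(save_dir)  # optional local copy
    out.push_to_hub(output_repo)  # uploads each split under its own name
```

Keeping the per-split loop in `tokenize_splits` separate from the download and upload calls means it can be unit-tested with plain dictionaries and a dummy tokenizer, without network access.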

@siwei-li I would like to ask you to review this when I am done.

fix dataset download for its tokenization #105