fixing BOS/EOS token ids when loading decapoda-research LLaMA models: these were wrong in the shipped tokenizer config and caused errors with transformers >= 4.29.0
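A minimal sketch of the idea (the helper name `fix_llama_special_tokens` and the stand-in object are hypothetical; in the repo the ids would be patched on the real `LlamaTokenizer` after `from_pretrained`):

```python
from types import SimpleNamespace

def fix_llama_special_tokens(tokenizer):
    """Patch a tokenizer whose config stored BOS/EOS ids incorrectly (e.g. as 0/0)."""
    tokenizer.bos_token_id = 1  # <s> in the original LLaMA vocabulary
    tokenizer.eos_token_id = 2  # </s>
    return tokenizer

# usage with a stand-in object; a real LlamaTokenizer is patched the same way
tok = fix_llama_special_tokens(SimpleNamespace(bos_token_id=0, eos_token_id=0))
```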
combining old and new eval functions
separating loading of train and eval data via the eval_mode parameter (eval-only runs skip loading the train split, which saves time)
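A sketch of how such an `eval_mode` switch could look (the function names `get_loaders` and `load_split` are assumptions, and `load_split` is a placeholder standing in for the real dataset loading/tokenization):

```python
def load_split(dataset: str, split: str) -> list:
    # placeholder loader; the real code would read and tokenize the dataset
    return [f"{dataset}-{split}-sample"]

def get_loaders(dataset: str, eval_mode: bool = False):
    test_data = load_split(dataset, split="test")  # always needed
    if eval_mode:
        # eval-only runs skip the (slow) train-split loading entirely
        return None, test_data
    train_data = load_split(dataset, split="train")
    return train_data, test_data
```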
combining args.dataset and args.custom_data_path into one option
loading pajama and refinedweb via the dataset option (since both are shipped with this repo at a fixed location)
tests available in the private chat