CERC-AAI / Robin

Apache License 2.0

Train mem overhaul #23

Open daniel-z-kaplan opened 10 months ago

daniel-z-kaplan commented 10 months ago

Setup code for individual clusters more cleanly

Alexis-BX commented 10 months ago

Rework the scripts folder completely:
- Have folders for llava_v1, llava_v1.5, robin_v1, robin_v2, and evals.
- In robin_v2, have a folder for each cluster with install, pretrain, and finetune scripts (include cedar and frontier folders).
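Under that plan, the layout might look like the sketch below. The `.sh` script names and the exact file set per cluster are assumptions for illustration; only the folder names come from the comment above.

```
scripts/
├── llava_v1/
├── llava_v1.5/
├── robin_v1/
├── robin_v2/
│   ├── cedar/
│   │   ├── install.sh
│   │   ├── pretrain.sh
│   │   └── finetune.sh
│   └── frontier/
│       ├── install.sh
│       ├── pretrain.sh
│       └── finetune.sh
└── evals/
```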

Use of train_mem.py: when doing multinode training, environment variables are not properly set by the launch script (they are set on the main node but not on the others). Since train_mem.py is run on every node, it sets the variables properly on each one.
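A minimal sketch of that per-node fix: a helper that runs at the top of each node's entry point and fills in any distributed-training variables the launcher failed to propagate. The variable names follow the usual `torch.distributed` conventions; the default address and port values here are hypothetical placeholders, not values from the repo.

```python
import os

def ensure_dist_env(master_addr="10.0.0.1", master_port="29500"):
    """Set distributed-training env vars if the launcher did not propagate them.

    setdefault leaves values alone on nodes where the launcher DID set them,
    so this is safe to run unconditionally on every node.
    The defaults are illustrative placeholders.
    """
    os.environ.setdefault("MASTER_ADDR", master_addr)
    os.environ.setdefault("MASTER_PORT", master_port)
    # Commonly needed on every node regardless of launcher behavior.
    os.environ.setdefault("OMP_NUM_THREADS", "1")
    return {k: os.environ[k]
            for k in ("MASTER_ADDR", "MASTER_PORT", "OMP_NUM_THREADS")}
```

Because `setdefault` is idempotent, calling this from a script that runs on all nodes gives every rank a consistent environment without clobbering the main node's values.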

Once the above reorganization is done: split train_mem.py into a separate file for each cluster and put it in that cluster's folder.
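One way the split could be wired up, as a hedged sketch: a small dispatcher that picks the cluster-specific train_mem module from the node's hostname. The module paths and cluster substrings below are illustrative assumptions, not names from the repo.

```python
import socket

# Hypothetical per-cluster entry points; module paths are illustrative.
CLUSTER_TRAINERS = {
    "cedar": "robin_v2.cedar.train_mem",
    "frontier": "robin_v2.frontier.train_mem",
}

def pick_trainer(hostname=None):
    """Choose the cluster-specific train_mem module for this node.

    Matches a known cluster name as a substring of the hostname,
    so the same launch command works on every cluster.
    """
    hostname = hostname or socket.gethostname()
    for cluster, module in CLUSTER_TRAINERS.items():
        if cluster in hostname:
            return module
    raise RuntimeError(f"unknown cluster for host {hostname!r}")
```

The selected module path could then be imported with `importlib.import_module` and its entry point called, keeping each cluster's quirks isolated in its own file.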