Unfortunately this PR is pretty big. I had to change a few things to get FSDP working.
We now call an InferenceServer which serves the model predictions. It works both for FSDP-style and one-model-per-GPU-style serving.
We no longer depend on huggingface's dataset builder. Previously we used it for multiprocessing, with one model per process; now our InferenceServer manages that, and you can't use multiprocessing to call the InferenceServer from separate processes.
Instead we just collect the results manually and create our dataset ourselves.
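Roughly, that looks like the sketch below. The names here are illustrative, not the actual code: `run_inference` stands in for a call to our InferenceServer.

```python
from datasets import Dataset

def run_inference(text: str) -> list[float]:
    # Stand-in for the InferenceServer call; the real API may differ.
    return [0.0, 1.0]

examples = ["first prompt", "second prompt"]
rows = {"text": [], "prediction": []}
for text in examples:
    rows["text"].append(text)
    rows["prediction"].append(run_inference(text))

# Build the dataset ourselves in one shot, no builder needed.
ds = Dataset.from_dict(rows)
print(ds)
```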
Because of that, we need to roll our own cache.
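A minimal sketch of what a hand-rolled cache like that can look like (the cache location and keying here are assumptions, not the repo's actual layout):

```python
import hashlib
from pathlib import Path
from typing import Callable

from datasets import Dataset, load_from_disk

CACHE_DIR = Path.home() / ".cache" / "extraction"  # assumed location

def load_or_build(config: str, build: Callable[[], Dataset]) -> Dataset:
    # Key the cache on a hash of everything that affects the outputs
    # (model name, dataset, prompt template, ...).
    key = hashlib.md5(config.encode()).hexdigest()
    path = CACHE_DIR / key
    if path.exists():
        return load_from_disk(str(path))  # cache hit: skip extraction
    ds = build()                          # cache miss: actually run the model
    ds.save_to_disk(str(path))
    return ds
```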
And to keep the InferenceServer's workers fully utilized, we call it from multiple threads. The InferenceServer is designed to be thread-safe, so hopefully it works.
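The fan-out looks roughly like this (a sketch; `predict` again stands in for the thread-safe InferenceServer call):

```python
from concurrent.futures import ThreadPoolExecutor

def predict(text: str) -> list[float]:
    # Stand-in for the thread-safe InferenceServer call.
    return [0.0, 1.0]

examples = [f"prompt {i}" for i in range(100)]

# Fan requests out over several threads so the server's GPU workers
# never sit idle waiting for the next input.
with ThreadPoolExecutor(max_workers=8) as pool:
    predictions = list(pool.map(predict, examples))
```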
Issues

Figuring out the memory required

The --min_gpu_mem flag can be passed; it indicates the memory required for the whole model.
--min_gpu_mem {memory_required_for_whole_model}
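As a rough back-of-the-envelope (my numbers, not from the code): huggyllama 7b in fp16 is about 2 bytes per parameter, so:

```python
# Rough fp16 estimate: 2 bytes per parameter, activations not included.
n_params = 7_000_000_000   # huggyllama 7b
bytes_per_param = 2        # fp16
print(n_params * bytes_per_param)  # 14_000_000_000 -> about 14 GB
```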
mkl
You may encounter an error like this (Github issue):
Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp.so.1 library.
Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.
To fix it, run this before launching. I'm still figuring out why this happens; it's supposed to be fixed in the latest mkl package, but it isn't for me.
export MKL_THREADING_LAYER=GNU
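Alternatively (an untested sketch, per the error message's own suggestion), you can set the variable from Python, as long as it happens before numpy is imported:

```python
import os

# Must happen before numpy (or anything else that loads MKL) is imported.
os.environ["MKL_THREADING_LAYER"] = "GNU"

import numpy as np  # noqa: E402
```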
too many open files
Sometimes it'll complain about too many open files. Increase the ulimit:
ulimit -n 4048
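If you'd rather raise the limit from inside the process (a sketch, not something the repo does itself), the stdlib resource module can bump the soft limit up to the hard limit:

```python
import resource

# Raise the soft limit on open file descriptors up to the hard limit.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (min(4096, hard), hard))
```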
❤️ QA instructions
Check out this branch: refactor-datasets-usage
Run elicit with huggyllama 7b with these variations:
- With FSDP. This shards the model across the devices.
- Without FSDP, but multi-GPU. This duplicates the model on each device.
For each of the runs, check that the eval.csv files are roughly the same (see the comparison sketch below), and let me know if it crashes.
Note that we are disabling the extraction cache here. Otherwise subsequent elicit runs won't actually run extraction with llama; they'll just reuse the cached results.
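A quick way to sanity-check "roughly the same" (a sketch; the file paths are hypothetical, and it assumes the rows of the two files line up):

```python
import numpy as np
import pandas as pd

# Compare two eval.csv files on their shared numeric columns.
a = pd.read_csv("run_fsdp/eval.csv")
b = pd.read_csv("run_multigpu/eval.csv")

numeric = a.select_dtypes("number").columns.intersection(
    b.select_dtypes("number").columns
)
for col in numeric:
    close = np.allclose(a[col], b[col], atol=1e-2)
    print(f"{col}: {'ok' if close else 'DIFFERS'}")
```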
Now compare this to the main branch. Is llama-7b significantly slower?
If the above works without crashing, and you're feeling ambitious, you can merge the latest changes into this branch and fix the conflicts. It may be confusing, though.