AlignmentResearch / tuned-lens

Tools for understanding how transformer predictions are built layer-by-layer
https://tuned-lens.readthedocs.io/en/latest/
MIT License

CPU offload option #46

Status: Closed (norabelrose closed this 1 year ago)

norabelrose commented 1 year ago

Previously, passing --fsdp automatically enabled CPU offload as well. This PR makes CPU offload opt-in: you now have to pass --cpu-offload explicitly to get that behavior. While training a probe for LLaMA 13B, I noticed that FSDP was needed, but CPU offload was unnecessary and was probably slowing things down.
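For illustration, here is a minimal sketch of how the two flags could be decoupled. It is not the actual tuned-lens code: build_parser, offload_enabled, and wrap_model are hypothetical names, and ignoring --cpu-offload when --fsdp is absent is an assumption. The only real library API used is PyTorch's FullyShardedDataParallel with its cpu_offload argument and the CPUOffload dataclass.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI exposing FSDP and CPU offload as separate flags."""
    parser = argparse.ArgumentParser(description="Probe training (sketch)")
    parser.add_argument(
        "--fsdp", action="store_true", help="Shard the model with FSDP."
    )
    parser.add_argument(
        "--cpu-offload",
        action="store_true",
        help="Offload sharded parameters to CPU. Only used with --fsdp.",
    )
    return parser


def offload_enabled(args: argparse.Namespace) -> bool:
    """CPU offload is opt-in and (by assumption) ignored without --fsdp."""
    return bool(args.fsdp and args.cpu_offload)


def wrap_model(model, args: argparse.Namespace):
    """Wrap `model` in FSDP if requested.

    Needs torch and an initialized process group, so torch is imported
    lazily and the flag parsing above stays usable without it.
    """
    if not args.fsdp:
        return model

    from torch.distributed.fsdp import CPUOffload, FullyShardedDataParallel

    return FullyShardedDataParallel(
        model,
        # Before this change, the equivalent of offload_params=True was
        # implied by --fsdp alone; now it follows --cpu-offload.
        cpu_offload=CPUOffload(offload_params=offload_enabled(args)),
    )
```

With this layout, passing --fsdp alone shards the model but keeps parameters on the GPU, and adding --cpu-offload restores the old behavior.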