Open elephantpanda opened 1 year ago
As described by FlexGen and Big Models:
This is a way to run very large models by splitting the model into small pieces and loading only one piece onto the GPU at a time.
This would be a very useful feature for Barracuda to implement, especially if we want it to work on lower-end hardware.
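To make the idea concrete, here is a rough sketch in plain Python. It is not FlexGen's actual algorithm and it does not use any Barracuda API. The function names (`save_layers`, `apply_layer`, `run_streamed`) are hypothetical, and loading from disk and deleting the variables only stand in for uploading a piece to the GPU and freeing it afterwards.

```python
import os
import pickle


def save_layers(layers, directory):
    """Write each layer's (weights, bias) pair to its own file; return the paths."""
    paths = []
    for i, layer in enumerate(layers):
        path = os.path.join(directory, f"layer_{i}.pkl")
        with open(path, "wb") as f:
            pickle.dump(layer, f)
        paths.append(path)
    return paths


def apply_layer(weights, bias, x):
    """Dense layer followed by ReLU: y = max(0, W x + b)."""
    return [
        max(0.0, sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def run_streamed(paths, x):
    """Run the model while holding only one layer's parameters in memory at a time."""
    for path in paths:
        with open(path, "rb") as f:
            weights, bias = pickle.load(f)  # stand-in for "upload this piece to the GPU"
        x = apply_layer(weights, bias, x)
        del weights, bias  # stand-in for "free this piece before loading the next"
    return x
```

Calling `save_layers` once on a list of `(weights, bias)` pairs and then `run_streamed` on the returned paths gives the same result as running every layer from memory, but peak parameter memory is bounded by the largest single layer rather than the whole model. The cost is the extra load time per layer.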