Neo9061 opened this issue 9 months ago
This is a compilation error, so technically not optimum-neuron related.
However, I would suggest you try a lower batch size: even at batch size 6 with Llama 2 13B I get OOM errors.
Maybe start with batch_size 1 first, then 2, and finally 4.
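To make the retry loop concrete, here is a rough sketch of stepping through batch sizes with `optimum-cli export neuron`. The exact flags (`--batch_size`, `--sequence_length`, `--num_cores`) and the model ID are assumptions based on the optimum-neuron export interface and may differ by version; the script only prints the commands it would run.

```shell
# Sketch: try progressively larger batch sizes until compilation fails.
# Flag names and the model ID below are assumptions -- check
# `optimum-cli export neuron --help` for your installed version.
for bs in 1 2 4; do
  cmd="optimum-cli export neuron \
--model meta-llama/Llama-2-13b-hf \
--batch_size ${bs} \
--sequence_length 2048 \
--num_cores 24 \
llama2_13b_bs${bs}/"
  echo "Would run: ${cmd}"
done
```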
@dacorvo Thanks for suggesting smaller batch sizes!
@Neo9061 We will look at the specific sizing issue here to resolve the compilation error. This will likely only be available in a future release, so for now using a smaller batch size may be the best option.
One alternative you can try is the -O1 flag (synonym of --optlevel 1), which can often allow you to compile larger models with some penalty to latency (see the Compiler Option Reference). There was a known issue with prior versions of llama, but it appears to also affect this configuration.
@Neo9061 does the -O1 compilation flag help with your issue?
I started an inf2.48xlarge EC2 instance, pulled and entered the TGI-Neuron DLC with optimum-neuron 0.0.17 installed, and ran the following code. It gives me an error after more than an hour of compilation. Can anyone give instructions? Many thanks!