Ack, asking the compiler team to take a look.
Hi dacorvo, we have root-caused this issue with the llama2 batch size 4 configuration. The fix will be in the next Neuron SDK release.
This is excellent news! Thanks for the update.
This issue was fixed in Neuron SDK release 2.15.
With AWS Neuron SDK 2.14.1, I am experiencing very long compilation times for `batch_size = 4` with the llama2 7B model. I am using the following configurations:
With `batch_size` set to 1 or 2, compiling the model with the `-O1` option takes minutes, but with `batch_size = 4` it takes more than three hours.