aws-neuron / neuronx-nemo-megatron

Add support for grouped-query attention #9

Open yasuhisa-nakashima opened 1 year ago

Add support for grouped-query attention (GQA), which is required for compatibility with Llama 2 70B and Code Llama 34B.
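For context, here is a minimal sketch of what grouped-query attention computes: several query heads share one key/value head instead of each having its own. This is plain Python for illustration only and is not this repository's implementation; the function names `kv_head_for_query_head` and `grouped_query_attention` are hypothetical, and the sketch assumes causal masking and that the number of query heads is divisible by the number of KV heads.

```python
import math


def kv_head_for_query_head(query_head, n_query_heads, n_kv_heads):
    """Return the index of the KV head shared by the given query head."""
    if n_query_heads % n_kv_heads != 0:
        raise ValueError("n_query_heads must be divisible by n_kv_heads")
    group_size = n_query_heads // n_kv_heads
    # Consecutive query heads form one group and read the same KV head.
    return query_head // group_size


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def grouped_query_attention(queries, keys, values):
    """Causal attention with shared KV heads.

    queries: [n_query_heads][seq_len][head_dim]
    keys, values: [n_kv_heads][seq_len][head_dim]
    Returns a list shaped like `queries`.
    """
    n_query_heads, n_kv_heads = len(queries), len(keys)
    head_dim = len(queries[0][0])
    scale = 1.0 / math.sqrt(head_dim)
    outputs = []
    for head, q_head in enumerate(queries):
        kv = kv_head_for_query_head(head, n_query_heads, n_kv_heads)
        k_head, v_head = keys[kv], values[kv]
        head_out = []
        for t, q_vec in enumerate(q_head):
            # Causal mask: position t attends only to positions 0..t.
            scores = [
                scale * sum(a * b for a, b in zip(q_vec, k_head[s]))
                for s in range(t + 1)
            ]
            weights = softmax(scores)
            head_out.append([
                sum(w * v_head[s][d] for s, w in enumerate(weights))
                for d in range(head_dim)
            ])
        outputs.append(head_out)
    return outputs
```

With 64 query heads and 8 KV heads (the published Llama 2 70B configuration), `kv_head_for_query_head(h, 64, 8)` maps query heads 0 through 7 to KV head 0, heads 8 through 15 to KV head 1, and so on. Setting `n_kv_heads` equal to `n_query_heads` recovers standard multi-head attention, and setting it to 1 gives multi-query attention.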

References: