Closed 9tong closed 1 year ago
decode_activation_memory_per_gpu should be equal to decode_activation_memory_per_layer * num_layers_per_gpu
@9tong thanks for the PR. I was out for a few weeks. For inference, we would like to reuse the activation tensor memory across layers, so decode_activation_memory_per_gpu is not multiplied by num_layers_per_gpu. Let me know if that does not make sense.
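To make the two accounting choices concrete, here is a minimal sketch. The function name `estimate_decode_activation_memory_per_gpu` and the `reuse_across_layers` flag are hypothetical and only for illustration; they are not taken from the repository. It assumes that with buffer reuse, one layer's worth of activation memory is live at a time.

```python
def estimate_decode_activation_memory_per_gpu(
    decode_activation_memory_per_layer: int,
    num_layers_per_gpu: int,
    reuse_across_layers: bool = True,  # hypothetical flag, not in the original code
) -> int:
    """Return an estimate of decode activation memory per GPU, in bytes."""
    if reuse_across_layers:
        # Inference: the same activation buffers are reused by every layer,
        # so the estimate is one layer's worth of memory.
        return decode_activation_memory_per_layer
    # No reuse: every layer keeps its own activations, as the PR proposes.
    return decode_activation_memory_per_layer * num_layers_per_gpu


# Example values chosen arbitrarily for illustration.
per_layer_bytes = 64 * 1024 * 1024
layers_on_gpu = 8

with_reuse = estimate_decode_activation_memory_per_gpu(per_layer_bytes, layers_on_gpu)
without_reuse = estimate_decode_activation_memory_per_gpu(
    per_layer_bytes, layers_on_gpu, reuse_across_layers=False
)
print(with_reuse, without_reuse)
```

The multiplied form would be the right estimate only if activations for all layers had to stay resident at once, which is not the assumption described above for inference.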