In #215, the function `get_theoretical_flops_per_token` was added with a temporary workaround that runs the computation only when GPUs are present. The workaround exists because the underlying function `get_total_number_of_trainable_parameters` requires GPUs. In principle, however, `get_theoretical_flops_per_token` depends only on the model architecture.
The `if` statements could be removed if the CPU tests mocked `get_total_number_of_trainable_parameters`.
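
A minimal sketch of what such a CPU test could look like, using `unittest.mock` from the standard library. The `FlopsUtils` container, the function signatures, and the `6 * num_params` formula are assumptions made for illustration only; the real functions from #215 may live elsewhere and compute something different.

```python
from unittest import mock


class FlopsUtils:
    """Hypothetical stand-in for the module that holds both functions."""

    @staticmethod
    def get_total_number_of_trainable_parameters(model) -> int:
        # Stand-in for the real implementation, which requires GPUs.
        raise RuntimeError("requires GPUs")

    @staticmethod
    def get_theoretical_flops_per_token(model) -> int:
        # Placeholder formula (assumption): 6 FLOPs per parameter per token.
        # No GPU check is needed here once the dependency is mocked in CPU tests.
        num_params = FlopsUtils.get_total_number_of_trainable_parameters(model)
        return 6 * num_params


def test_flops_per_token_on_cpu():
    # Replace the GPU-dependent function with a mock returning a fixed count.
    with mock.patch.object(
        FlopsUtils,
        "get_total_number_of_trainable_parameters",
        return_value=1_000,
    ) as mocked:
        flops = FlopsUtils.get_theoretical_flops_per_token(model=object())

    mocked.assert_called_once()
    assert flops == 6_000
```

With the dependency patched, the test exercises only the FLOPs arithmetic, so it can run on a CPU-only machine.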