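Hi, thank you for sharing your impressive work.

I think there might be a need to modify the below line [link].
print(f"model parameter {sum(p.numel() for p in model.parameters()) / 1024 ** 3:.2f}B")
Instead of dividing the number of parameters by `1024 ** 3` to calculate the parameters in billions, it might be more accurate to use `1000 ** 3`, since one billion is 1,000,000,000 whereas `1024 ** 3` is 1,073,741,824. Specifically, when I checked the number of parameters of the following models, the results are:

I would appreciate it if you could share your opinion. Thank you for your time and consideration.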
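To make the size of the discrepancy concrete, here is a minimal sketch. It assumes a hypothetical model with exactly 7,000,000,000 parameters (not one of the models discussed in this thread) and a hypothetical helper `format_param_count`. Because `1024 ** 3 / 1000 ** 3` is about 1.0737, dividing by `1024 ** 3` reports a value roughly 7% lower than the true count in billions.

```python
def format_param_count(num_params: int) -> str:
    """Hypothetical helper: format a raw parameter count in decimal billions (1B = 10**9)."""
    return f"{num_params / 1000 ** 3:.2f}B"


# Hypothetical parameter count, for illustration only.
# With a PyTorch model it would be: sum(p.numel() for p in model.parameters())
n = 7_000_000_000

# Binary divisor (1024 ** 3 = 1,073,741,824): understates the count.
print(f"model parameter {n / 1024 ** 3:.2f}B")  # prints "model parameter 6.52B"

# Decimal divisor (1000 ** 3 = 1,000,000,000): matches the usual meaning of "billion".
print(f"model parameter {format_param_count(n)}")  # prints "model parameter 7.00B"
```

The decimal divisor is the conventional choice for parameter counts; powers of 1024 are mainly used for byte sizes (GiB), which is a different quantity.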
@bokyeong1015 Thanks for your attention. You are correct. Due to an oversight in my coding, there was an error in our calculation method. I have now updated the code to address this issue.