sen-ye opened this issue 3 weeks ago
Hello, thanks for your nice work. I'm interested in the performance of different subcode numbers under the same model size. Could you release the training loss curves of the different tokenizers (2-12, 3-8, 4-6) under the same model size? Thank you.
Okay, of course. I will release it in a few days.
I tried GPT-L with a 1-16 tokenizer, and the loss curve is roughly as follows (trained for about 38 epochs). Is this loss reasonable? @Pepper-lll
This is reasonable. We have discussed the training loss and the possible reasons behind it in our paper; you may have a look :>
Thank you for your response. I tested the FID (without CFG) of the model after training for 50 epochs, and it was approximately 83.5. Is this reasonable? (I'm not sure whether my testing is correct, because I noticed that the paper does not report results without CFG.)
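For context, a minimal sketch of one common way to compute an FID like this, using the torch-fidelity package; the directory paths are placeholders and not part of this repo, and the exact reference set (e.g. 50k ImageNet validation images at the model's resolution) should match whatever the paper uses:

```python
# Sketch: FID between 50k generated samples (CFG disabled) and a reference
# image folder, computed with torch-fidelity. Paths are hypothetical.
import torch_fidelity

metrics = torch_fidelity.calculate_metrics(
    input1='samples_no_cfg/',    # folder of generated images, sampled without CFG
    input2='imagenet_ref_256/',  # folder of reference images at the same resolution
    cuda=True,
    fid=True,                    # request the Frechet Inception Distance
    batch_size=128,
)
print(metrics['frechet_inception_distance'])
```

A mismatch in the reference statistics or in the number of generated samples is a common reason for inflated FID values, so it is worth checking those first.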
We tested the 50k-FID without CFG at 400 epochs; it is around 13 for the L model with tokenizer 2-10.
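As a side note on "without CFG": under the common classifier-free guidance formulation, a guidance scale of 1.0 reduces to the plain conditional prediction, i.e. no guidance. A minimal sketch (the vocabulary size here is purely illustrative):

```python
# Sketch: classifier-free guidance on next-token logits, assuming the usual
# combination logits = uncond + scale * (cond - uncond). scale == 1.0 is "no CFG".
import torch

def apply_cfg(cond_logits: torch.Tensor, uncond_logits: torch.Tensor, scale: float) -> torch.Tensor:
    return uncond_logits + scale * (cond_logits - uncond_logits)

cond = torch.randn(1, 16384)    # conditional next-token logits (illustrative vocab size)
uncond = torch.randn(1, 16384)  # unconditional (null-class) logits
assert torch.allclose(apply_cfg(cond, uncond, 1.0), cond)  # scale 1.0 leaves cond unchanged
```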
Thank you for your response. It seems there are some issues on my side. If there are any new results, I will update them in this issue.
I retested the FID without CFG, and the value is 15.58 (50 epochs). It seems to be a reasonable value.
Thank you for your work.
Thanks for your information! I have released the FID results for the L, XL, and XXL AR models, each with its most suitable tokenizer. I think your result is reasonable, although slightly different from mine.