sen-ye opened this issue 3 weeks ago
Hello, thanks for your nice work. I'm interested in the performance of different subcode numbers under the same model size. Could you release the training loss curves of the different tokenizers (2-12, 3-8, 4-6) under the same model size? Thank you.
Okay, of course. I will release it in a few days.
I tried GPT-L with a 1-16 tokenizer, and the loss curve is roughly as follows (trained for about 38 epochs). Is this loss reasonable? @Pepper-lll
This is reasonable. We have discussed the training loss and the possible reasons behind it in our paper; you may have a look :>
Thank you for your response. I tested the FID (without CFG) of the model after training for 50 epochs, and it was approximately 83.5. Is this reasonable? (I'm not sure whether my testing is correct, because I noticed that the paper does not report results without CFG.)
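For context, a minimal sketch of one common way to compute an FID like this, using the torch-fidelity package; the directory paths are placeholders and not part of this repo, and the exact reference set (e.g. 50k ImageNet validation images at the model's resolution) should match whatever the paper uses:

```python
# Sketch: FID between 50k generated samples (CFG disabled) and a reference
# image folder, computed with torch-fidelity. Paths are hypothetical.
import torch_fidelity

metrics = torch_fidelity.calculate_metrics(
    input1='samples_no_cfg/',    # folder of generated images, sampled without CFG
    input2='imagenet_ref_256/',  # folder of reference images at the same resolution
    cuda=True,
    fid=True,                    # request the Frechet Inception Distance
    batch_size=128,
)
print(metrics['frechet_inception_distance'])
```

A mismatch in the reference statistics or in the number of generated samples is a common reason for inflated FID values, so it is worth checking those first.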
We tested the 50k-FID without CFG at 400 epochs; it is around 13 for the L model with tokenizer 2-10.
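As a side note on "without CFG": under the common classifier-free guidance formulation, a guidance scale of 1.0 reduces to the plain conditional prediction, i.e. no guidance. A minimal sketch (the vocabulary size here is purely illustrative):

```python
# Sketch: classifier-free guidance on next-token logits, assuming the usual
# combination logits = uncond + scale * (cond - uncond). scale == 1.0 is "no CFG".
import torch

def apply_cfg(cond_logits: torch.Tensor, uncond_logits: torch.Tensor, scale: float) -> torch.Tensor:
    return uncond_logits + scale * (cond_logits - uncond_logits)

cond = torch.randn(1, 16384)    # conditional next-token logits (illustrative vocab size)
uncond = torch.randn(1, 16384)  # unconditional (null-class) logits
assert torch.allclose(apply_cfg(cond, uncond, 1.0), cond)  # scale 1.0 leaves cond unchanged
```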
Thank you for your response. It seems there are some issues on my side. If there are any new results, I will update them in this issue.
I retested the FID without CFG, and the value is 15.58 (50 epochs). It seems to be a reasonable value.
Thank you for your work.
Thanks for your information! I have released the FID results for the L, XL, and XXL AR models, each with its most suitable tokenizer. I think your result is reasonable, although slightly different from mine.