Tsinghua-MARS-Lab / StateTransformer


Question about diffusion decoder training #140

Closed weiaiF closed 7 months ago

weiaiF commented 9 months ago

I have a question about the two-stage training process mentioned in the paper. What is the final model performance if you train the diffusion decoder and the transformer backbone together?

JingzheShi commented 9 months ago

Hi, thanks for the question! In our work we struck a balance between performance and simplicity: for better convergence of the backbone, we trained the diffusion decoder and the backbone separately, and for simplicity we did not extend this to a three-stage process (that is, we did not add a final fine-tuning stage that trains the diffusion decoder and the transformer backbone together). However, we have been running experiments covering two cases that should answer your question in detail: 1) training the diffusion decoder and the transformer backbone together from scratch, and 2) adding a joint fine-tuning stage after the two-stage training process. The results will be available in a week, and we will post a further comment then.
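For reference, the two-stage schedule described above (train the backbone first, then freeze it and train the diffusion decoder on its features) can be sketched as follows. This is a minimal illustration assuming a PyTorch setup; the module definitions and optimizer settings here are hypothetical stand-ins, not the actual STR architecture from the repo.

```python
import torch
from torch import nn

# Hypothetical stand-ins for the transformer backbone and the
# diffusion decoder; the real modules live in the StateTransformer repo.
backbone = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 16))
diffusion_decoder = nn.Linear(16, 4)

# Stage 1: train the backbone on its own objective.
stage1_opt = torch.optim.Adam(backbone.parameters(), lr=1e-4)
# ... backbone training loop ...

# Stage 2: freeze the backbone so no gradients flow into it,
# then train only the diffusion decoder on the backbone's features.
for p in backbone.parameters():
    p.requires_grad_(False)
stage2_opt = torch.optim.Adam(diffusion_decoder.parameters(), lr=1e-4)

features = backbone(torch.randn(2, 8))   # frozen features, no grad
out = diffusion_decoder(features)        # only the decoder is trainable
```

A joint fine-tuning stage (the third stage discussed above) would simply re-enable `requires_grad` on the backbone parameters and build one optimizer over both modules.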

larksq commented 7 months ago

We have uploaded the checkpoint for STR(CKS)-16M and added its results to the new performance section. Closing this issue; reopen it if you have more questions.