Closed · lequanlin closed this issue 3 months ago
Thanks for your feedback, and sorry for my late reply. Given the cost of training neural networks, we save checkpoints that share the same initialization. In the paper, "densely save checkpoints in the last epoch" means that we continue training the target layer in the last epoch to collect more data; we apologize if that statement caused a misunderstanding. However, in the entire-model parameter generation experiment we do train 200 models from scratch with randomly initialized weights and report the results in Table 3 of our paper. As shown in the table, p-diff can generate high-performance weights in both cases.
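To make the data-collection step above concrete, here is a minimal, self-contained sketch of what "continue training the target layer and densely save checkpoints" could look like. The function name, the toy model, and the random stand-in for an optimizer step are all hypothetical illustrations, not the repository's actual code.

```python
import copy
import random

def collect_checkpoints(params, target_layer, num_steps, lr=0.01):
    """Densely save snapshots of the target layer's parameters.

    All layers except `target_layer` stay frozen; only the target
    layer keeps updating, and we snapshot it after every step.
    (Toy sketch: random noise stands in for a real gradient step.)
    """
    snapshots = []
    for _ in range(num_steps):
        # Update only the target layer; the other layers are frozen.
        params[target_layer] = [
            w - lr * random.gauss(0.0, 1.0) for w in params[target_layer]
        ]
        # Densely save: one checkpoint per training step.
        snapshots.append(copy.deepcopy(params[target_layer]))
    return snapshots

# Toy "model": a frozen backbone plus a small target layer.
model = {
    "backbone": [0.5, -0.2, 1.1],   # frozen after the chosen epoch
    "head":     [0.1, 0.3],         # target layer, keeps training
}
ckpts = collect_checkpoints(model, "head", num_steps=200)
print(len(ckpts))  # 200 saved parameter vectors, all from one initialization
```

Because every snapshot comes from one continued training run, the saved parameter vectors share the same initialization, which is the cheaper of the two settings compared in the reply.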
Thanks for the clarification.
Hi,
Thank you for sharing the code for your work. I really like the idea and find it very inspiring.
Regarding data preparation, the paper says "we train a model from scratch and densely save checkpoints in the last epoch". In the code, however, I notice that after some parameters are fixed at an epoch in the middle of training, the remaining parameters are collected over several later epochs (not only the last epoch).
May I ask what causes this inconsistency? Thank you very much.