NUS-HPC-AI-Lab / Neural-Network-Parameter-Diffusion

We introduce a novel approach for parameter generation, named neural network parameter diffusion (p-diff), which employs a standard latent diffusion model to synthesize a new set of parameters.

Data Preparation #11

Closed lequanlin closed 3 months ago

lequanlin commented 3 months ago

Hi,

Thank you for sharing the code of your work. I do like your idea and find it very inspiring.

For data preparation, the paper states that "we train a model from scratch and densely save checkpoints in the last epoch." But in the code, I notice that after freezing some parameters at an epoch in the middle of training, the remaining parameters are collected from later epochs (not just the last epoch).

May I ask what causes the inconsistency? Thank you very much.

1zeryu commented 3 months ago

Thanks for your feedback, and sorry for the late reply. Given the cost of training neural networks, we save checkpoints that share the same initialization. In the paper, "densely save checkpoints in the last epoch" means that we keep training the target layer after the main training finishes in order to collect more data; sorry if that statement caused confusion. However, in the entire-model parameter generation experiment we do train 200 models from scratch with randomly initialized weights, and we report the results in Table 3 of the paper. As the table shows, p-diff can generate high-performing weights in both cases.
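For readers landing here, a minimal sketch of the collection procedure described above is given below. It is not the repo's actual code: `model`, `train_one_epoch`, `target_layer`, and `num_saved` are illustrative placeholders; the idea is simply to freeze everything except the target layer and densely save that layer's parameters while it keeps training.

```python
import copy
import torch

def collect_checkpoints(model, train_one_epoch, target_layer="fc", num_saved=200):
    """Illustrative sketch: gather flattened target-layer parameters as p-diff training data."""
    # Freeze every parameter except the target layer, so all saved
    # checkpoints share the same initialization for the frozen part.
    for name, p in model.named_parameters():
        p.requires_grad = name.startswith(target_layer)

    checkpoints = []
    # Keep training the target layer and densely save its parameters;
    # each saved vector becomes one training sample for the diffusion model.
    while len(checkpoints) < num_saved:
        train_one_epoch(model)
        params = torch.cat([
            p.detach().flatten().cpu()
            for name, p in model.named_parameters()
            if name.startswith(target_layer)
        ])
        checkpoints.append(copy.deepcopy(params))
    return torch.stack(checkpoints)  # shape: (num_saved, num_target_params)
```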

lequanlin commented 3 months ago

Thanks for the clarification.