zeydabadi opened this issue 3 weeks ago
That is the number of codebook entries (codes) that are never used on the training data.
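As a rough illustration, this count can be obtained by collecting the code index that the quantizer assigns to every token over the training set and checking which codebook entries never appear. This is a minimal sketch that assumes the indices are available as a flat list of integers. The function name `count_unused_codes` is hypothetical and is not part of the repository.

```python
def count_unused_codes(code_indices, codebook_size):
    """Return how many codebook entries are never selected.

    code_indices: iterable of ints in [0, codebook_size), one per quantized token
    codebook_size: total number of entries in the codebook
    """
    used = set(code_indices)
    # Reject indices that cannot belong to this codebook
    if any(i < 0 or i >= codebook_size for i in used):
        raise ValueError("code index out of range")
    return codebook_size - len(used)


# Hypothetical example: every token is mapped to code 5 in an 8192-entry codebook
print(count_unused_codes([5, 5, 5, 5], 8192))  # 8191
```

If the codebook has 8192 entries (an assumption here, not stated in this thread), a count of 8191 means that every token is mapped to the same single code.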
Thanks for your reply, but it's still unclear to me what the implication of 8191 codes not being used would be. Could you please elaborate? Is it good or bad? Is there anything we can do about it?
This is really bad because most of the codes are never used. It might be due to the limited data size. There are two suggested remedies: 1) increase the data size; 2) reduce the codebook size.
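To judge whether a smaller codebook would be a better fit, one common diagnostic (not taken from this repository) is the fraction of codes that are used, together with the perplexity of the empirical code distribution. Here is a minimal sketch that assumes a flat list of code indices. The function name `codebook_usage_stats` is hypothetical.

```python
import math
from collections import Counter


def codebook_usage_stats(code_indices, codebook_size):
    """Summarize how evenly a codebook is used.

    Returns (fraction of codes used, perplexity of the code distribution).
    Perplexity ranges from 1 (all tokens share one code) up to
    codebook_size (all codes used equally often).
    """
    counts = Counter(code_indices)
    total = sum(counts.values())
    if total == 0:
        # No tokens at all: report zero usage
        return 0.0, 0.0
    # Shannon entropy (natural log) of the empirical code distribution
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    utilization = len(counts) / codebook_size
    return utilization, math.exp(entropy)


# Hypothetical example: 4 distinct codes used equally often in an 8-entry codebook
util, ppl = codebook_usage_stats([0, 1, 2, 3], 8)
print(util, ppl)  # utilization 0.5, perplexity close to 4
```

Comparing these numbers across a few candidate codebook sizes on the same data is one way to act on suggestion 2). If the perplexity stays far below the codebook size, a smaller codebook may be enough.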
Thank you very much for your insights. Could you please clarify whether there is any linear or other relationship between the amount of pre-training data (measured in hours) and the size of the codebook? In your paper, you noted that approximately 2500 hours of data were used for pre-training. If I were to use around 250 hours of data for pre-training, what codebook size would you recommend?
Hi,
Thank you for sharing your code. During vqnsp training I noticed this message: "Unused code in codebook: 8191". Could you comment on what this indicates? Thank you.