Closed peggyxpxu closed 1 week ago
Yes, the query length for the Q-former follows the configuration used in BLIP, which sets it to 64. You can also customize the query length by modifying this line.
With this setup, you can refer to this code and set the parameter fix_length_audio=query_length (e.g., 64) in the config, ensuring that the placeholder length matches the number of queries.
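A minimal sketch of the relationship described above, assuming a BLIP-2-style Q-former with learnable query tokens (the dimension values here are illustrative, not the repo's actual configuration):

```python
import torch
import torch.nn as nn

query_length = 64   # BLIP's default; customizable as noted above
hidden_dim = 768    # assumed Q-former hidden size

# Learnable query tokens, as in BLIP-2-style Q-formers: the Q-former
# output always has exactly `query_length` tokens regardless of the
# audio encoder's sequence length.
query_tokens = nn.Parameter(torch.zeros(1, query_length, hidden_dim))

# The config's placeholder length must therefore match the query count:
fix_length_audio = query_length
assert fix_length_audio == query_tokens.shape[1]
```

Because the Q-former compresses the encoder output to a fixed number of tokens, any mismatch between fix_length_audio and the query length would misalign the audio placeholders in the LLM input.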
Thanks. I trained using the Q-former with the Whisper encoder, but the model's loss is much larger than when I use a linear projector with the Whisper encoder. Have you done similar experiments?
I carefully analyzed the test results and found that the model hallucinates severely when using the Q-former.
Yes, I have experimented with the Q-former, and I found that the train/validation loss is higher compared to using linear layers. This might be why the model exhibits hallucinations. However, I haven't done extensive hyperparameter tuning yet; you could try adjusting the query length and other parameters for further experimentation.
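A hypothetical sketch contrasting the two projectors compared above. The dimensions and module structure are assumptions for illustration, not the repo's actual implementation: the key difference is that the linear projector keeps one output token per encoder frame, while the Q-former compresses everything to a fixed `query_length` tokens.

```python
import torch
import torch.nn as nn

encoder_dim, llm_dim, query_length = 1280, 4096, 64  # assumed sizes

# Option 1: linear projector; output length follows the encoder's T.
linear_proj = nn.Linear(encoder_dim, llm_dim)

# Option 2: minimal Q-former-style projector; learned queries
# cross-attend to the encoder output, compressing it to query_length.
class TinyQFormer(nn.Module):
    def __init__(self):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(1, query_length, encoder_dim))
        self.cross_attn = nn.MultiheadAttention(
            encoder_dim, num_heads=8, batch_first=True
        )
        self.out = nn.Linear(encoder_dim, llm_dim)

    def forward(self, enc):  # enc: (B, T, encoder_dim)
        q = self.queries.expand(enc.size(0), -1, -1)
        attended, _ = self.cross_attn(q, enc, enc)
        return self.out(attended)  # (B, query_length, llm_dim)

enc = torch.randn(2, 1500, encoder_dim)  # e.g. Whisper-style frame sequence
y_linear = linear_proj(enc)     # shape (2, 1500, 4096): length follows T
y_qformer = TinyQFormer()(enc)  # shape (2, 64, 4096): fixed by queries
```

The heavy compression (e.g. 1500 frames down to 64 tokens) plus the extra learnable parameters may explain why the Q-former trains to a higher loss than the linear projector without careful tuning.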
Hi: if I want to use a Q-former as the projector on acc audiocaps, should the audio encoder placeholder length be set to 64?