Audio-AGI / AudioSep

Official implementation of "Separate Anything You Describe"
https://audio-agi.github.io/Separate-Anything-You-Describe/
MIT License

Unable to load music_speech_audioset model #6

Closed (fabiogra closed this issue 1 year ago)

fabiogra commented 1 year ago

I tried the Colab notebook. The first model checkpoint loads without any issue, but the second model checkpoint raises an error during model initialization. Below is the part of the code that initializes the model from the second downloaded checkpoint:

model = build_audiosep(
    config_yaml='config/audiosep_base.yaml',
    checkpoint_path=str(models[1][1]),
)

Running the model initialization raises a KeyError for the key 'pytorch-lightning_version':

KeyError: 'pytorch-lightning_version'

Additionally, a warning is emitted saying that some weights were not used when initializing RobertaModel. It is unclear to me whether this warning is related to the KeyError.

Some weights of the model checkpoint at roberta-base were not used when initializing RobertaModel: ['lm_head.dense.bias', 'lm_head.decoder.weight', 'lm_head.dense.weight', 'lm_head.layer_norm.weight', 'lm_head.bias', 'lm_head.layer_norm.bias']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

The issue seems to arise specifically with the second model checkpoint music_speech_audioset_epoch_15_esc_89.98.pt. I would appreciate any guidance or suggestions on how to resolve this KeyError and successfully load the second model checkpoint for further use.

Thank you.

liuxubo717 commented 1 year ago

Hi. I just ran the Colab test and it works for me. Here are the expected logs:

Some weights of the model checkpoint at roberta-base were not used when initializing RobertaModel: ['lm_head.decoder.weight', 'lm_head.dense.bias', 'lm_head.dense.weight', 'lm_head.bias', 'lm_head.layer_norm.bias', 'lm_head.layer_norm.weight']

Load AudioSep model from [checkpoint/audiosep_base_4M_steps.ckpt]

Separate audio from [exp31_water drops_mixture.wav] with textual query [water drops]

Write separated audio to [separated_audio.wav]

The warning is NOT related to the KeyError. I could not reproduce the "KeyError: 'pytorch-lightning_version'" error. Could you try it again?
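A minimal sketch of one way to sanity-check that the RobertaModel warning is benign: parse the list of unused weights from the log line and confirm that every entry belongs to the language-modeling head (lm_head), which the bare RobertaModel encoder does not include. The helper names unused_weight_names and only_lm_head are hypothetical and are not part of AudioSep or transformers.

```python
import ast
import re


def unused_weight_names(warning_line: str) -> list:
    """Extract the bracketed list of weight names from the warning text."""
    match = re.search(r"\[.*\]", warning_line)
    if match is None:
        return []
    # The bracketed part is a Python-style list of quoted strings.
    return list(ast.literal_eval(match.group(0)))


def only_lm_head(names) -> bool:
    """True if there is at least one name and all of them are lm_head weights."""
    return bool(names) and all(name.startswith("lm_head.") for name in names)


# Warning line copied from the logs in this thread.
warning = (
    "Some weights of the model checkpoint at roberta-base were not used when "
    "initializing RobertaModel: ['lm_head.decoder.weight', 'lm_head.dense.bias', "
    "'lm_head.dense.weight', 'lm_head.bias', 'lm_head.layer_norm.bias', "
    "'lm_head.layer_norm.weight']"
)

names = unused_weight_names(warning)
print(len(names), only_lm_head(names))
```

If only_lm_head returns True, the skipped weights are limited to the language-modeling head, which matches the "This IS expected" case described in the warning itself.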

fabiogra commented 1 year ago

I see from your log that you are using audiosep_base_4M_steps.ckpt. That checkpoint also works for me; the issue is with the second model, music_speech_audioset_epoch_15_esc_89.98.pt.
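A possible way to narrow this down, sketched under assumptions: load each checkpoint file directly with torch.load and compare the top-level keys. Checkpoints written by PyTorch Lightning normally carry a 'pytorch-lightning_version' entry, so a file without it may have been saved in a different way. Whether that is actually the cause here is not confirmed in this thread. The helpers load_checkpoint and summarize_checkpoint are hypothetical names, not part of AudioSep.

```python
from typing import Any, Mapping

LIGHTNING_VERSION_KEY = "pytorch-lightning_version"


def load_checkpoint(path: str) -> Mapping[str, Any]:
    """Load a checkpoint onto the CPU (requires PyTorch)."""
    import torch  # imported lazily so summarize_checkpoint works without PyTorch

    # Depending on the PyTorch version, checkpoints that contain non-tensor
    # objects may need weights_only=False; only do that for files you trust.
    return torch.load(path, map_location="cpu")


def summarize_checkpoint(ckpt: Mapping[str, Any]) -> dict:
    """Report the top-level keys and whether the Lightning version key exists."""
    return {
        "top_level_keys": sorted(str(key) for key in ckpt.keys()),
        "has_lightning_version": LIGHTNING_VERSION_KEY in ckpt,
        "has_state_dict": "state_dict" in ckpt,
    }


# Example usage (paths are assumptions; adjust to where the files were downloaded):
# for path in [
#     "checkpoint/audiosep_base_4M_steps.ckpt",
#     "checkpoint/music_speech_audioset_epoch_15_esc_89.98.pt",
# ]:
#     print(path, summarize_checkpoint(load_checkpoint(path)))
```

If the second file reports has_lightning_version as False while the first reports True, that difference would be worth including in the issue report.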