ZFTurbo / Music-Source-Separation-Training

Repository for training models for music source separation.
MIT License

inference with swin_upernet fails #51

Closed hunterhogan closed 3 months ago

hunterhogan commented 3 months ago
py -m inference.py --model swin_upernet \
  --config_path config_vocals_swin_upernet.yaml \
  --start_check_point /models/MSST/model_swin_upernet_ep_56_sdr_10.6703.ckpt ...

The config was downloaded from the releases page.

Failed to import transformers.models.upernet.modeling_upernet because of the following error (look up to see its traceback):
Failed to import transformers.generation.utils because of the following error (look up to see its traceback):
cannot import name 'split_torch_state_dict_into_shards' from 'huggingface_hub' 

split_torch_state_dict_into_shards is only available starting from huggingface-hub 0.23.0, but tokenizers 0.14.0 and 0.14.1 require huggingface-hub<0.18, and transformers==4.35.0 requires tokenizers 0.14.*, so the pinned versions can't all be satisfied together.
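
A quick way to confirm which side is stale (a minimal sketch, nothing repo-specific assumed):

import huggingface_hub

# Sketch: split_torch_state_dict_into_shards only exists from
# huggingface-hub 0.23.0 onward, and it is what transformers fails to import here.
print(huggingface_hub.__version__)
print(hasattr(huggingface_hub, "split_torch_state_dict_into_shards"))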

Installed by changing requirements.txt:

huggingface-hub>=0.23
transformers~=4.35.0

But:

ERROR: Could not find a version that satisfies the requirement pedalboard==0.8.1 (from versions: 0.8.2, 0.8.3, 0.8.4, 0.8.5, 0.8.6, 0.8.7, 0.8.8, 0.8.9, 0.9.0, 0.9.1, 0.9.2, 0.9.3, 0.9.4, 0.9.5, 0.9.6, 0.9.7, 0.9.8, 0.9.9, 0.9.10, 0.9.11, 0.9.12)
ERROR: No matching distribution found for pedalboard==0.8.1

So: pedalboard~=0.8.1

Then:

--> 110 with torch.amp.autocast(enabled=config.training.use_amp):
    111     with torch.inference_mode():
    112         if config.training.target_instrument is not None:

File c:\apps\MSST\Lib\site-packages\ml_collections\config_dict\config_dict.py:829, in ConfigDict.__getattr__(self, attribute)
    827   return self[attribute]
    828 except KeyError as e:
--> 829   raise AttributeError(e)

AttributeError: "'use_amp'"

So, a few errors later, in utils.py:

- with torch.cuda.amp.autocast(enabled=config.training.use_amp):
+ with torch.amp.autocast(device):
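
A slightly more defensive variant that keeps AMP configurable (a sketch only; config and device here stand in for the objects utils.py already has in scope):

import torch
from ml_collections import ConfigDict

# Hypothetical config without use_amp, mimicking the shipped yaml.
# ConfigDict raises AttributeError for missing keys (see the traceback above),
# so getattr with a default avoids the crash.
config = ConfigDict({'training': {'target_instrument': 'vocals'}})
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

use_amp = getattr(config.training, 'use_amp', False)
with torch.amp.autocast(device_type=device.type, enabled=use_amp):
    pass  # model forward pass goes here in utils.py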

Then: ValueError: Make sure that the channel dimension of the pixel values match with the one set in the configuration.

So, I looked at:

if len(batch_data) >= batch_size or (i >= mix.shape[1]):
    arr = torch.stack(batch_data, dim=0)

Here arr.shape = torch.Size([1, 2, 261632]).

I was confused; for reference, mix.shape = torch.Size([2, 3038448]). I didn't think it would work, but I tried concatenating instead:

arr = torch.cat(batch_data, dim=0)

which gives arr.shape = torch.Size([2, 261632]).

The value error went away, but:

File C:\apps\MSST\models\upernet_swin_transformers.py:179, in cac2cws

    177 def cac2cws(self, x):
    178     k = self.num_subbands
--> 179     b, c, f, t = x.shape
    180     x = x.reshape(b, c, k, f // k, t)
    181     x = x.reshape(b, c * k, f // k, t)

ValueError: not enough values to unpack (expected 4, got 3)

I'm out of ideas now.

ZFTurbo commented 3 months ago

I've just checked and inference.py works normally with swin_upernet. I added use_amp to the config to avoid the error.
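
If you re-download the updated config, a quick sanity check that the key is there (a sketch, assuming the yaml parses with yaml.safe_load and the ConfigDict wrapping seen in the traceback above):

import yaml
from ml_collections import ConfigDict

# Load the swin_upernet config roughly the way the repo does and check that
# training.use_amp now exists (it was the missing key behind the AttributeError).
with open('configs/config_vocals_swin_upernet.yaml') as f:
    config = ConfigDict(yaml.safe_load(f))

print(getattr(config.training, 'use_amp', 'use_amp still missing'))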

args = [
    '--model_type', 'swin_upernet',
    "--config_path", code_path + "configs/config_vocals_swin_upernet.yaml",
    "--start_check_point", code_path + "results/model_swin_upernet_ep_56_sdr_10.6703.ckpt",
    "--store_dir", code_path + "results_tracks/",
    "--input_folder", 'H:/',
    "--device_ids", "0",
    "--extract_instrumental",
]
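
For completeness, one way to launch the script with that list (a sketch; it reuses the args and code_path from the snippet above):

import subprocess
import sys

# Run the repo's inference.py in a subprocess with the flags listed above.
subprocess.run([sys.executable, code_path + 'inference.py'] + args, check=True)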
ZFTurbo commented 3 months ago

Shape (1, 2, 261632) seems OK, because the first dimension is the batch size, the second is the channels, and the third is the waveform.
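
Side by side, with the shapes from this thread (a minimal sketch):

import torch

# One stereo chunk of the mix, as collected into batch_data during inference.
chunk = torch.zeros(2, 261632)
batch_data = [chunk]

stacked = torch.stack(batch_data, dim=0)     # torch.Size([1, 2, 261632]): (batch, channels, samples)
concatenated = torch.cat(batch_data, dim=0)  # torch.Size([2, 261632]): the batch dimension is gone
print(stacked.shape, concatenated.shape)

So the original torch.stack line looks correct, and the cat workaround is likely why cac2cws later sees one dimension fewer than it expects.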

jarredou commented 3 months ago

@hunterhogan Check also this small edit for swin_upernet to fix the "ValueError: Make sure that the channel dimension of the pixel values match with the one set in the configuration." error message: https://github.com/ZFTurbo/Music-Source-Separation-Training/issues/6#issuecomment-1837159280

ZFTurbo commented 3 months ago

> @hunterhogan Check also this small edit for swin_upernet to fix the "ValueError: Make sure that the channel dimension of the pixel values match with the one set in the configuration." error message: #6 (comment)

Thanks, I absolutely forgot about this 😭

hunterhogan commented 3 months ago

> @hunterhogan Check also this small edit for swin_upernet to fix the "ValueError: Make sure that the channel dimension of the pixel values match with the one set in the configuration." error message: #6 (comment)
>
> Thanks, I absolutely forgot about this 😭

Editing the transformers package fixed it. I would never have thought of that!

I made a pull request for the requirements.txt.