Closed wesleyr36 closed 7 months ago
I can't reproduce your error. Please check which versions you have:
torch>=2.0.1
transformers==4.35.0
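If it helps, here's a small stdlib-only sketch (helper names are my own) to check the installed versions against those pins without importing the heavy packages themselves:

```python
from importlib import metadata

def version_tuple(v):
    # "4.35.0" -> (4, 35, 0); ignore local suffixes like "2.0.1+cu118"
    parts = []
    for p in v.split("+")[0].split("."):
        if not p.isdigit():
            break
        parts.append(int(p))
    return tuple(parts)

def satisfies(pkg, op, wanted):
    # op is ">=" or "==", mirroring the pins above
    try:
        installed = version_tuple(metadata.version(pkg))
    except metadata.PackageNotFoundError:
        return False
    wanted = version_tuple(wanted)
    return installed >= wanted if op == ">=" else installed == wanted

# Mirrors the requirement pins:
# satisfies("torch", ">=", "2.0.1")
# satisfies("transformers", "==", "4.35.0")
```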
I was facing the same problem yesterday when trying to run it on Colab. I haven't checked carefully, but the requirements were installed without errors, so I guess it was ok.
Is it normal that in the class it tries to load:
UperNetForSemanticSegmentation.from_pretrained("openmmlab/upernet-swin-large")
but in main it tries to load:
UperNetForSemanticSegmentation.from_pretrained("./results/")
If I make it point to "./results", it asks for a config.json file that is not present.
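For reference, `from_pretrained` on a local path expects at least the config.json (plus a weights file) that `save_pretrained` writes into that directory; a quick sanity check before loading (helper name is my own):

```python
from pathlib import Path

# Minimal file from_pretrained needs in a local checkpoint dir; the weights
# file name varies (pytorch_model.bin or model.safetensors), so only the
# config is checked here.
REQUIRED = ["config.json"]

def missing_checkpoint_files(model_dir):
    d = Path(model_dir)
    return [name for name in REQUIRED if not (d / name).exists()]

# e.g. missing_checkpoint_files("./results/") -> ["config.json"]
# if nothing was ever saved there with save_pretrained
```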
If you mean this line: https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/models/upernet_swin_transformers.py#L220
I used it only for debugging purposes. It isn't used during inference runs.
Ok, I was just wondering why one was pointing to Hugging Face and the other to local files.
@jarredou were you able to run this model?
No, I had the exact same error reported in that issue and I gave up. I'm currently training an mdx23c model with my Colab account, so I can't do further testing.
Ah yes! I've just remembered I made changes to the transformers code. You need to change the function at
site-packages\transformers\models\swin\modeling_swin.py
at line 312:
def forward(self, pixel_values: Optional[torch.FloatTensor]) -> Tuple[torch.Tensor, Tuple[int]]:
    _, num_channels, height, width = pixel_values.shape
    if num_channels != self.num_channels:
        # Hardcoded! Accept the new channel count instead of raising.
        print('Old num_channels: {} New num_channels: {}'.format(self.num_channels, num_channels))
        self.num_channels = num_channels
        if 0:
            raise ValueError(
                "Make sure that the channel dimension of the pixel values match with the one set in the configuration."
            )
    # pad the input to be divisible by self.patch_size, if needed
    pixel_values = self.maybe_pad(pixel_values, height, width)
    embeddings = self.projection(pixel_values)
    _, _, height, width = embeddings.shape
    output_dimensions = (height, width)
    embeddings = embeddings.flatten(2).transpose(1, 2)
    return embeddings, output_dimensions
I didn't find a workaround without changing the transformers code.
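For what it's worth, the same edit can usually be applied at runtime by monkey-patching the method, so site-packages stays untouched. A generic sketch of the technique with a stand-in class (the real target would be transformers.models.swin.modeling_swin.SwinPatchEmbeddings.forward):

```python
# Stand-in for the library class, so the patching pattern is visible
# without needing transformers installed.
class PatchEmbeddings:
    def __init__(self, num_channels):
        self.num_channels = num_channels

    def forward(self, num_channels):
        if num_channels != self.num_channels:
            raise ValueError("channel mismatch")
        return num_channels

def patched_forward(self, num_channels):
    # Accept the incoming channel count instead of raising,
    # mirroring the edit shown above.
    if num_channels != self.num_channels:
        self.num_channels = num_channels
    return num_channels

# Apply the patch once, before the model is built:
PatchEmbeddings.forward = patched_forward
```

With the real class, the same assignment would be done right after importing transformers and before constructing the model.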
I was trying to test out the pre-trained swin_upernet model you provided but encountered the following:
I've made no changes to the configs and have tried updating my packages, but no luck.