RetroCirce / HTS-Audio-Transformer

The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"
https://arxiv.org/abs/2202.00874
MIT License

RuntimeError: Input and output sizes should be greater than 0, but got input (H: 0, W: 64) output (H: 1024, W: 64) #39

Closed. Mizuho32 closed this issue 1 year ago.

Mizuho32 commented 1 year ago

Hello, RetroCirce.

Thank you for your great work. I want to try running inference on a single audio file.

Starting from the Jupyter notebook in test_esc.zip (https://github.com/RetroCirce/HTS-Audio-Transformer/issues/11#issuecomment-1189954944), I made a few small changes and created my own notebook, which is in my gist (https://gist.github.com/Mizuho32/fba4105ab95fad1e64b9cf1421c21597). However, I got the error below. (For details, see the output of the third cell in the gist.)

RuntimeError: Input and output sizes should be greater than 0, but got input (H: 0, W: 64) output (H: 1024, W: 64)

Is something wrong? Please help me :pray: I used this audio: https://www.youtube.com/watch?v=zzNdwF40ID8.

My torch-related package versions are:

torch==2.0.1
torchaudio==2.0.2
torchcontrib==0.0.2
torchlibrosa==0.0.9
torchmetrics==0.11.4
Mizuho32 commented 1 year ago

It was a simple problem: the audio duration should be less than 10 seconds.
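For longer recordings, one workaround is to split the waveform into consecutive clips of at most 10 seconds and run inference on each clip separately. The sketch below is a hypothetical helper, not part of the repo. It assumes a 32 kHz sample rate and a 10-second clip length, which match the HTS-AT config as far as I know; check `config.py` in your checkout before relying on these numbers. The helper works on any sliceable sequence, such as the NumPy array returned by `librosa.load(path, sr=32000)`.

```python
import math
from typing import List, Sequence

# ASSUMED values: believed to match HTS-AT's config (32 kHz audio, 10 s clips).
# Verify against config.py in your own checkout.
SAMPLE_RATE = 32000
CLIP_SECONDS = 10


def split_into_clips(waveform: Sequence[float],
                     sample_rate: int = SAMPLE_RATE,
                     clip_seconds: float = CLIP_SECONDS) -> List[Sequence[float]]:
    """Split a mono waveform into consecutive clips of at most clip_seconds.

    Hypothetical helper (not from the HTS-AT repo). Each returned clip can be
    fed to the model on its own. The last clip may be shorter than the rest.
    """
    clip_len = int(sample_rate * clip_seconds)  # samples per clip
    if clip_len <= 0:
        raise ValueError("clip length must be positive")
    n_clips = math.ceil(len(waveform) / clip_len)  # 0 clips for empty input
    return [waveform[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]
```

For example, 25 seconds of 32 kHz audio yields three clips of 10, 10, and 5 seconds. Combining the per-clip predictions (for example, by averaging class probabilities) is left as a design choice.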