The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"
RuntimeError: Input and output sizes should be greater than 0, but got input (H: 0, W: 64) output (H: 1024, W: 64) #39
Closed
Mizuho32 closed 1 year ago
Hello, RetroCirce.
Thank you for your great work. I want to try running inference on a single audio file.
Starting from the Jupyter notebook in test_esc.zip (https://github.com/RetroCirce/HTS-Audio-Transformer/issues/11#issuecomment-1189954944), I made a few small changes and created my own notebook, which is in my gist (https://gist.github.com/Mizuho32/fba4105ab95fad1e64b9cf1421c21597). However, I got the error listed below. (For details, please refer to the output of the 3rd cell of https://gist.github.com/Mizuho32/fba4105ab95fad1e64b9cf1421c21597)
Is anything wrong? Please help me :pray: I used the audio from https://www.youtube.com/watch?v=zzNdwF40ID8.
My torch versions are