deepsound-project / samplernn-pytorch

PyTorch implementation of SampleRNN: An Unconditional End-to-End Neural Audio Generation Model
MIT License
288 stars 75 forks

Sizes of tensors mismatch in dimension 0 #19

Closed: niuqun closed this issue 6 years ago

niuqun commented 6 years ago

Dear all, I generated a set of small audio chunks with ffmpeg myself, following the given configuration (for example, a chunk length of 8 s). However, I get the following error as soon as training starts:

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 353344 and 352320 in dimension 1 at /opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/TH/generic/THTensorMath.c:3586

Does anyone have any ideas?

Thanks.

niuqun commented 6 years ago

This was mainly caused by an error in how I split the long audio file: the resulting chunks did not all have the same length. The following is the adjusted script, which generates audio chunks of equal length (8 s). I removed the youtube-dl download step and process a local audio file instead, because I do not have access to YouTube.

#!/bin/sh

if [ "$#" -ne 3 ]; then
    echo "Usage: $0 <url> <chunk_size> <dataset_path>"
    exit 1
fi

url=$1
chunk_size=$2
dataset_path=$3

downloaded="joint.wav"

# Download step disabled; $downloaded must already exist as a local file.
# rm -f $downloaded
# format=$(youtube-dl -F $url | grep audio | sed -r 's|([0-9]+).*|\1|g' | tail -n 1)
# youtube-dl $url -f $format -o $downloaded

# Convert to mono, 16 kHz.
converted="joint2.wav"
ffmpeg -i $downloaded -ac 1 -ab 16k -ar 16000 $converted

mkdir $dataset_path
length=$(ffprobe -i $converted -show_entries format=duration -v quiet -of csv="p=0")
# bc uses integer division here, so only full-length chunks are produced.
end=$(echo "$length / $chunk_size - 1" | bc)
echo "splitting..."
for i in $(seq 0 $end); do
    ffmpeg -hide_banner -loglevel error -ss $(($i * $chunk_size)) -t $chunk_size -i $converted "$dataset_path/$i.wav"
done
echo "done"
rm -f $converted
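Before starting training again, it may be worth checking that the chunks really are the same length. Below is a minimal sketch of such a check. The function name check_chunk_sizes is hypothetical and not part of the repository. It compares byte sizes only, on the assumption that all chunks were written by the same ffmpeg command and therefore share the same WAV header layout, so that equal file size implies equal sample count.

```shell
#!/bin/sh
# Hypothetical helper, not part of samplernn-pytorch.
# Reports every .wav file in a directory whose byte size differs from the
# first .wav file found there. Assumption: identical header layout across
# chunks, so a size difference means a difference in sample count.
check_chunk_sizes() {
    dir=$1
    expected=""
    status=0
    for f in "$dir"/*.wav; do
        # Skip the unexpanded pattern when the directory has no .wav files.
        [ -e "$f" ] || continue
        size=$(wc -c < "$f" | tr -d ' ')
        if [ -z "$expected" ]; then
            # The first file sets the reference size.
            expected=$size
        elif [ "$size" -ne "$expected" ]; then
            echo "mismatch: $f has $size bytes, expected $expected"
            status=1
        fi
    done
    return $status
}
```

It could be called as check_chunk_sizes "$dataset_path" once splitting has finished. A non-zero return status means at least one chunk differs from the first one and should be regenerated or removed before training.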