andrewowens / multisensory

Code for the paper: Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
http://andrewowens.com/multisensory/
Apache License 2.0
220 stars 60 forks

RuntimeError: Command failed! ffmpeg -i "/tmp/ao_M0QAze.wav" -r 29.970000 -loglevel warning -safe 0 -f concat -i "/tmp/ao_cnpblR.txt" -pix_fmt yuv420p -vcodec h264 -strict -2 -y -acodec aac "../results/fg_cam_translator.mp4" #5

Closed: xsingit closed this issue 6 years ago

xsingit commented 6 years ago

Hello, thanks for the script. When I run the following command to visualize the locations of sound sources:

python sep_video.py ../data/translator.mp4 --model full --cam --out ../results/

I get an error:

Start time: 0.0
GPU = 0
Spectrogram samples: 128
2.145 2.135
100.0% complete, total time: 0:00:00. 0:00:00 per iteration. (01:57 PM Fri)

Struct(alg=sourcesep, augment_audio=False, augment_ims=True, augment_rms=False, base_lr=0.0001, batch_size=6, bn_last=True, bn_scale=True, both_videos_in_batch=True, cam=False, check_iters=1000, crop_im_dim=224, dilate=False, do_shift=False, dset_seed=None, fix_frame=False, fps=29.97, frame_length_ms=64, frame_sample_delta=74, frame_step_ms=16, freq_len=1024, full_im_dim=256, full_model=False, full_samples_len=105000, gamma=0.1, gan_weight=0.0, grad_clip=10.0, im_split=False, im_type=jpeg, init_path=../results/nets/shift/net.tf-650000, init_type=shift, input_rms=0.141421356237, l1_weight=1.0, log_spec=True, loss_types=['fg-bg'], model_path=../results/nets/sep/full/net.tf-160000, mono=False, multi_shift=False, net_style=full, normalize_rms=True, num_dbs=None, num_samples=44144, opt_method=adam, pad_stft=False, phase_type=pred, phase_weight=0.01, pit_weight=0.0, predict_bg=True, print_iters=10, profile_iters=None, resdir=/multisensory-master/results/nets/sep/full, samp_sr=21000.0, sample_len=None, sampled_frames=63, samples_per_frame=700.700700701, show_iters=None, show_videos=False, slow_check_iters=10000, spec_len=128, spec_max=80.0, spec_min=-100.0, step_size=120000, subsample_frames=None, summary_iters=10, test_batch=10, test_list=../data/celeb-tf-v6-full/test/tf, total_frames=149, train_iters=160000, train_list=../data/celeb-tf-v6-full/train/tf, use_3d=True, use_sound=True, use_wav_gan=False, val_list=../data/celeb-tf-v6-full/val/tf, variable_frame_count=False, vid_dur=2.135, weightdecay=1e-05)

ffmpeg -loglevel error -ss 0.0 -i "../data/translator.mp4" -safe 0 -t 2.185 -r 29.97 -vf scale=256:256 "/tmp/tmpVEitNC/small%04d.png"
ffmpeg -loglevel error -ss 0.0 -i "../data/translator.mp4" -safe 0 -t 2.185 -r 29.97 -vf "scale=-2:'min(600,ih)'" "/tmp/tmpVEitNC/full_%04d.png"
ffmpeg -loglevel error -ss 0.0 -i "../data/translator.mp4" -safe 0 -t 2.185 -ar 21000.0 -ac 2 "/tmp/tmpVEitNC/sound.wav"

Running on: /gpu:0
2018-06-15 13:57:11.657961: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-06-15 13:57:12.523259: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: name: Tesla K40m major: 3 minor: 5 memoryClockRate(GHz): 0.745 pciBusID: 0000:02:00.0 totalMemory: 11.92GiB freeMemory: 11.84GiB
2018-06-15 13:57:12.523316: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla K40m, pci bus id: 0000:02:00.0, compute capability: 3.5)

Raw spec length: [1, 128, 1025]
Truncated spec length: [1, 128, 1025]
bn scale: True
arg_scope train = False

sf/conv1_1 -> [1, 11036, 1, 64] sf/conv2_1_short -> [1, 690, 1, 128] sf/conv2_1_1 -> [1, 690, 1, 128] sf/conv2_1_2 -> [1, 690, 1, 128] sf/conv3_1_1 -> [1, 173, 1, 128] sf/conv3_1_2 -> [1, 173, 1, 128] sf/conv4_1_short -> [1, 44, 1, 256] sf/conv4_1_1 -> [1, 44, 1, 256] sf/conv4_1_2 -> [1, 44, 1, 256] im/conv1 -> [1, 32, 112, 112, 64] before: [1, 63, 224, 224, 3] pool -> [1, 32, 56, 56, 64] im/conv2_1_1 -> [1, 32, 56, 56, 64] before: [1, 32, 56, 56, 64] im/conv2_1_2 -> [1, 32, 56, 56, 64] before: [1, 32, 56, 56, 64] pool -> [1, 16, 28, 28, 64] im/conv2_2_1 -> [1, 16, 28, 28, 64] before: [1, 32, 56, 56, 64] im/conv2_2_2 -> [1, 16, 28, 28, 64] before: [1, 16, 28, 28, 64] frac: 2.6875 sf/conv5_1 -> [1, 16, 1, 128] sf_net shape before merge: [1, 44, 1, 256], and after merge: [1, 16, 1, 256] im/merge1 -> [1, 16, 28, 28, 512] before: [1, 16, 28, 28, 192] im/merge2 -> [1, 16, 28, 28, 128] before: [1, 16, 28, 28, 512] im/conv3_1_1 -> [1, 16, 28, 28, 128] before: [1, 16, 28, 28, 128] im/conv3_1_2 -> [1, 16, 28, 28, 128] before: [1, 16, 28, 28, 128] im/conv3_2_1 -> [1, 16, 28, 28, 128] before: [1, 16, 28, 28, 128] im/conv3_2_2 -> [1, 16, 28, 28, 128] before: [1, 16, 28, 28, 128] im/conv4_1_short -> [1, 8, 14, 14, 256] before: [1, 16, 28, 28, 128] im/conv4_1_1 -> [1, 8, 14, 14, 256] before: [1, 16, 28, 28, 128] im/conv4_1_2 -> [1, 8, 14, 14, 256] before: [1, 8, 14, 14, 256] im/conv4_2_1 -> [1, 8, 14, 14, 256] before: [1, 8, 14, 14, 256] im/conv4_2_2 -> [1, 8, 14, 14, 256] before: [1, 8, 14, 14, 256] time_stride = 1 im/conv5_1_short -> [1, 8, 7, 7, 512] before: [1, 8, 14, 14, 256] im/conv5_1_1 -> [1, 8, 7, 7, 512] before: [1, 8, 14, 14, 256] im/conv5_1_2 -> [1, 8, 7, 7, 512] before: [1, 8, 7, 7, 512] im/conv5_2_1 -> [1, 8, 7, 7, 512] before: [1, 8, 7, 7, 512] im/conv5_2_2 -> [1, 8, 7, 7, 512] before: [1, 8, 7, 7, 512] joint/logits -> [1, 1, 1, 1, 1] before: [1, 1, 1, 1, 512] joint/logits -> [1, 8, 7, 7, 1] before: [1, 8, 7, 7, 512]

gen/conv1 [1, 128, 1024, 2] -> [1, 128, 512, 64] gen/conv2 [1, 128, 512, 64] -> [1, 128, 256, 128] gen/conv3 [1, 128, 256, 128] -> [1, 64, 128, 256] Video net before merge: [1, 16, 1, 64] After: [1, 64, 1, 64] gen/conv4 [1, 64, 128, 320] -> [1, 32, 64, 512] Video net before merge: [1, 16, 1, 128] After: [1, 32, 1, 128] gen/conv5 [1, 32, 64, 640] -> [1, 16, 32, 512] Video net before merge: [1, 8, 1, 512] After: [1, 16, 1, 512] gen/conv6 [1, 16, 32, 1024] -> [1, 8, 16, 512] gen/conv7 [1, 8, 16, 512] -> [1, 4, 8, 512] gen/conv8 [1, 4, 8, 512] -> [1, 2, 4, 512] gen/conv9 [1, 2, 4, 512] -> [1, 1, 2, 512] gen/deconv1 [1, 1, 2, 512] -> [1, 2, 4, 512] gen/deconv2 [1, 2, 4, 1024] -> [1, 4, 8, 512] gen/deconv3 [1, 4, 8, 1024] -> [1, 8, 16, 512] gen/deconv4 [1, 8, 16, 1024] -> [1, 16, 32, 512] gen/deconv5 [1, 16, 32, 1536] -> [1, 32, 64, 512] gen/deconv6 [1, 32, 64, 1152] -> [1, 64, 128, 256] gen/deconv7 [1, 64, 128, 576] -> [1, 128, 256, 128] gen/deconv8 [1, 128, 256, 256] -> [1, 128, 512, 64] gen/fg [1, 128, 512, 128] -> [1, 128, 1024, 2] gen/bg [1, 128, 512, 128] -> [1, 128, 1024, 2]

Restoring from: ../results/nets/sep/full/net.tf-160000
predict samples shape: (1, 44144, 2)
samples pred shape: (1, 44144, 2)
(128, 1025)
Running on: 0
2018-06-15 13:57:18.753499: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla K40m, pci bus id: 0000:02:00.0, compute capability: 3.5)
bn scale: False
arg_scope train = True

sf/conv1_1 -> [1, 11036, 1, 64] sf/conv2_1_short -> [1, 690, 1, 128] sf/conv2_1_1 -> [1, 690, 1, 128] sf/conv2_1_2 -> [1, 690, 1, 128] sf/conv3_1_1 -> [1, 173, 1, 128] sf/conv3_1_2 -> [1, 173, 1, 128] sf/conv4_1_short -> [1, 44, 1, 256] sf/conv4_1_1 -> [1, 44, 1, 256] sf/conv4_1_2 -> [1, 44, 1, 256] im/conv1 -> [1, 32, 112, 112, 64] before: [1, 63, 224, 224, 3] pool -> [1, 32, 56, 56, 64] im/conv2_1_1 -> [1, 32, 56, 56, 64] before: [1, 32, 56, 56, 64] im/conv2_1_2 -> [1, 32, 56, 56, 64] before: [1, 32, 56, 56, 64] pool -> [1, 16, 28, 28, 64] im/conv2_2_1 -> [1, 16, 28, 28, 64] before: [1, 32, 56, 56, 64] im/conv2_2_2 -> [1, 16, 28, 28, 64] before: [1, 16, 28, 28, 64] frac: 2.6875 sf/conv5_1 -> [1, 16, 1, 128] sf_net shape before merge: [1, 44, 1, 256], and after merge: [1, 16, 1, 256] im/merge1 -> [1, 16, 28, 28, 512] before: [1, 16, 28, 28, 192] im/merge2 -> [1, 16, 28, 28, 128] before: [1, 16, 28, 28, 512] im/conv3_1_1 -> [1, 16, 28, 28, 128] before: [1, 16, 28, 28, 128] im/conv3_1_2 -> [1, 16, 28, 28, 128] before: [1, 16, 28, 28, 128] im/conv3_2_1 -> [1, 16, 28, 28, 128] before: [1, 16, 28, 28, 128] im/conv3_2_2 -> [1, 16, 28, 28, 128] before: [1, 16, 28, 28, 128] im/conv4_1_short -> [1, 8, 14, 14, 256] before: [1, 16, 28, 28, 128] im/conv4_1_1 -> [1, 8, 14, 14, 256] before: [1, 16, 28, 28, 128] im/conv4_1_2 -> [1, 8, 14, 14, 256] before: [1, 8, 14, 14, 256] im/conv4_2_1 -> [1, 8, 14, 14, 256] before: [1, 8, 14, 14, 256] im/conv4_2_2 -> [1, 8, 14, 14, 256] before: [1, 8, 14, 14, 256] time_stride = 1 im/conv5_1_short -> [1, 8, 14, 14, 512] before: [1, 8, 14, 14, 256] im/conv5_1_1 -> [1, 8, 14, 14, 512] before: [1, 8, 14, 14, 256] im/conv5_1_2 -> [1, 8, 14, 14, 512] before: [1, 8, 14, 14, 512] im/conv5_2_1 -> [1, 8, 14, 14, 512] before: [1, 8, 14, 14, 512] im/conv5_2_2 -> [1, 8, 14, 14, 512] before: [1, 8, 14, 14, 512] joint/logits -> [1, 1, 1, 1, 1] before: [1, 1, 1, 1, 512] joint/logits -> [1, 8, 14, 14, 1] before: [1, 8, 14, 14, 512]

bn scale: False
arg_scope train = True

sf/conv1_1 -> [1, 11036, 1, 64] sf/conv2_1_short -> [1, 690, 1, 128] sf/conv2_1_1 -> [1, 690, 1, 128] sf/conv2_1_2 -> [1, 690, 1, 128] sf/conv3_1_1 -> [1, 173, 1, 128] sf/conv3_1_2 -> [1, 173, 1, 128] sf/conv4_1_short -> [1, 44, 1, 256] sf/conv4_1_1 -> [1, 44, 1, 256] sf/conv4_1_2 -> [1, 44, 1, 256] im/conv1 -> [1, 32, 112, 112, 64] before: [1, 63, 224, 224, 3] pool -> [1, 32, 56, 56, 64] im/conv2_1_1 -> [1, 32, 56, 56, 64] before: [1, 32, 56, 56, 64] im/conv2_1_2 -> [1, 32, 56, 56, 64] before: [1, 32, 56, 56, 64] pool -> [1, 16, 28, 28, 64] im/conv2_2_1 -> [1, 16, 28, 28, 64] before: [1, 32, 56, 56, 64] im/conv2_2_2 -> [1, 16, 28, 28, 64] before: [1, 16, 28, 28, 64] frac: 2.6875 sf/conv5_1 -> [1, 16, 1, 128] sf_net shape before merge: [1, 44, 1, 256], and after merge: [1, 16, 1, 256] im/merge1 -> [1, 16, 28, 28, 512] before: [1, 16, 28, 28, 192] im/merge2 -> [1, 16, 28, 28, 128] before: [1, 16, 28, 28, 512] im/conv3_1_1 -> [1, 16, 28, 28, 128] before: [1, 16, 28, 28, 128] im/conv3_1_2 -> [1, 16, 28, 28, 128] before: [1, 16, 28, 28, 128] im/conv3_2_1 -> [1, 16, 28, 28, 128] before: [1, 16, 28, 28, 128] im/conv3_2_2 -> [1, 16, 28, 28, 128] before: [1, 16, 28, 28, 128] im/conv4_1_short -> [1, 8, 14, 14, 256] before: [1, 16, 28, 28, 128] im/conv4_1_1 -> [1, 8, 14, 14, 256] before: [1, 16, 28, 28, 128] im/conv4_1_2 -> [1, 8, 14, 14, 256] before: [1, 8, 14, 14, 256] im/conv4_2_1 -> [1, 8, 14, 14, 256] before: [1, 8, 14, 14, 256] im/conv4_2_2 -> [1, 8, 14, 14, 256] before: [1, 8, 14, 14, 256] time_stride = 1 im/conv5_1_short -> [1, 8, 14, 14, 512] before: [1, 8, 14, 14, 256] im/conv5_1_1 -> [1, 8, 14, 14, 512] before: [1, 8, 14, 14, 256] im/conv5_1_2 -> [1, 8, 14, 14, 512] before: [1, 8, 14, 14, 512] im/conv5_2_1 -> [1, 8, 14, 14, 512] before: [1, 8, 14, 14, 512] im/conv5_2_2 -> [1, 8, 14, 14, 512] before: [1, 8, 14, 14, 512] joint/logits -> [1, 1, 1, 1, 1] before: [1, 1, 1, 1, 512] joint/logits -> [1, 8, 14, 14, 1] before: [1, 8, 14, 14, 512]

Writing to: ../results/
ffmpeg -i "/tmp/ao_M0QAze.wav" -r 29.970000 -loglevel warning -safe 0 -f concat -i "/tmp/ao_cnpblR.txt" -pix_fmt yuv420p -vcodec h264 -strict -2 -y -acodec aac "../results/fg_cam_translator.mp4"
Guessed Channel Layout for Input Stream #0.0 : mono
[concat @ 0x382d700] DTS -230584300921369 < 0 out of order
[h264_v4l2m2m @ 0x385f500] Could not find a valid device
[h264_v4l2m2m @ 0x385f500] can't configure encoder
Error initializing output stream 0:0 -- Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width or height
Traceback (most recent call last):
  File "sep_video.py", line 442, in <module>
    ut.make_video(full_ims, pr.fps, pj(arg.out, 'fg%s.mp4' % name), snd(full_samples_fg))
  File "/multisensory-master/src/aolib/util.py", line 3169, in make_video
    % (sound_flags_in, fps, input_file, sound_flags_out, flags, out_fname))
  File "/multisensory-master/src/aolib/util.py", line 915, in sys_check
    fail('Command failed! %s' % cmd)
  File "/multisensory-master/src/aolib/util.py", line 12, in fail
    def fail(s = ''): raise RuntimeError(s)
RuntimeError: Command failed! ffmpeg -i "/tmp/ao_M0QAze.wav" -r 29.970000 -loglevel warning -safe 0 -f concat -i "/tmp/ao_cnpblR.txt" -pix_fmt yuv420p -vcodec h264 -strict -2 -y -acodec aac "../results/fg_cam_translator.mp4"

I want to know what went wrong and what I should do. Any suggestions would be appreciated! Thanks.
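The [h264_v4l2m2m] "Could not find a valid device" lines in the log above usually mean that this ffmpeg build has no software H.264 encoder (libx264), so -vcodec h264 gets mapped to the V4L2 hardware-encoder wrapper, which then fails. As a generic diagnostic (not part of the repo's scripts), the available H.264 encoders can be checked with something like:

    ffmpeg -hide_banner -encoders | grep -i 264    # list the H.264 encoders this build provides
    ffmpeg -version | grep libx264                 # the configuration line mentions --enable-libx264 only if it was built in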

andrewowens commented 6 years ago

Hi,

Sorry for the slow reply. I'm having trouble reproducing this error. Could it maybe be an issue with ffmpeg (e.g. maybe it's missing a codec that we're using)? Maybe you could try temporarily switching to a pre-built ffmpeg, such as this one: https://johnvansickle.com/ffmpeg/. Or you could modify the ffmpeg command in make_video (util.py, line 3169) to remove all of the command line flags related to codecs.
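As a rough illustration of the second suggestion (not the exact string in util.py), stripping the codec-related flags from the failing command would leave something like this, with the temp-file names standing in for whatever sep_video.py generates:

    ffmpeg -i "/tmp/ao_XXXX.wav" -r 29.970000 -loglevel warning -safe 0 -f concat \
        -i "/tmp/ao_XXXX.txt" -pix_fmt yuv420p -y "../results/fg_cam_translator.mp4"

ffmpeg then falls back to the mp4 muxer's default codecs (typically mpeg4 video and aac audio when libx264 is not compiled in), which sidesteps the missing H.264 encoder.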

xsingit commented 6 years ago

Thank you very much for your reply. There were some issues with the 'h264' encoder. I reinstalled and reconfigured ffmpeg, and it is working great now. Thanks.
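In practice, "reinstall and configure ffmpeg" here typically means rebuilding it with a software H.264 encoder enabled. A minimal sketch, assuming the ffmpeg sources and the libx264 development headers are already installed (package names and prefixes vary by system):

    ./configure --enable-gpl --enable-libx264
    make -j"$(nproc)"
    sudo make install
    ffmpeg -hide_banner -encoders | grep libx264   # should now list the encoder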

ghost commented 5 years ago

> Thank you very much for your reply. There were some issues with the 'h264' encoder. I reinstalled and reconfigured ffmpeg, and it is working great now. Thanks.

Hello, which version of ffmpeg did you install to solve this problem?

ghost commented 5 years ago

Hello, can you see my question? Could you give me a little help? I don't know how to go on after downloading "ffmpeg-release-amd64-static.tar.xz - md5". What should I do next?
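For what it's worth, the static build is used by unpacking the archive and putting the bundled ffmpeg/ffprobe binaries on the PATH. A rough sketch (the extracted directory name depends on the release):

    tar -xf ffmpeg-release-amd64-static.tar.xz
    cd ffmpeg-*-amd64-static
    ./ffmpeg -version                  # sanity check; the configuration line should mention --enable-libx264
    export PATH="$PWD:$PATH"           # or copy ffmpeg and ffprobe into a directory already on PATH

With that in place, sep_video.py should pick up this ffmpeg instead of the system one.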
