andrewowens / multisensory

Code for the paper: Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
http://andrewowens.com/multisensory/
Apache License 2.0

What are feats['im_0'] and feats['im_1'] of example for shift model? #30

Closed ruizewang closed 4 years ago

ruizewang commented 4 years ago

Hello, in `read_example()` of `shift_dset.py`, I saw:

```python
feats['im_0'] = tf.FixedLenFeature([], dtype=tf.string)
feats['im_1'] = tf.FixedLenFeature([], dtype=tf.string)
```

What are `im_0` and `im_1`?

Thank you.

andrewowens commented 4 years ago

In the TFRecords, I concatenated the video frames together, so that N frames of the video are represented as one giant, tall (N*256 x 256 x 3) image. Due to image size limitations, I stored them as two separate images, im_0 and im_1. I suggest rewriting the I/O code for your application – there are definitely cleaner ways of doing this.
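The packing scheme described above can be sketched with NumPy (this is an illustration of the idea, not the repo's actual I/O code; the exact split point between `im_0` and `im_1` is an assumption, here the tall image is simply halved):

```python
import numpy as np

# Hypothetical sketch: pack N video frames into one tall image,
# then split it into two parts, mimicking im_0 / im_1 as described above.
N = 4
frames = np.random.randint(0, 256, size=(N, 256, 256, 3), dtype=np.uint8)

# Stack frames vertically into one giant (N*256, 256, 3) image.
tall = frames.reshape(N * 256, 256, 3)

# Split into two images to respect the image-size limit (split point assumed).
half = N * 256 // 2
im_0, im_1 = tall[:half], tall[half:]

# Reading side: concatenate the two parts, then recover the individual frames.
restored = np.concatenate([im_0, im_1], axis=0).reshape(N, 256, 256, 3)
assert (restored == frames).all()
```

In the actual pipeline the two parts are stored as encoded image strings in the TFRecord and decoded at read time; the reshape back to per-frame tensors works the same way.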

ruizewang commented 4 years ago

> In the TFRecords, I concatenated the video frames together, so that N frames of the video are represented as one giant, tall (N*256 x 256 x 3) image. Due to image size limitations, I stored them as two separate images, im_0 and im_1. I suggest rewriting the I/O code for your application – there are definitely cleaner ways of doing this.

Wow, thanks for your reply; that's just what I guessed :)