andrewowens / multisensory

Code for the paper: Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
http://andrewowens.com/multisensory/
Apache License 2.0

How are the video frames combined? #25

Closed: tuffr5 closed this issue 5 years ago

tuffr5 commented 5 years ago

Thanks for sharing your great work with us. I have a question: from your code it is not clear to me how you handle multiple frames. Do you simply tile all the frames together and then feed them into `img_net`? Looking forward to your reply, thanks so much.

YiyuLuo commented 5 years ago

From `sep_example.tf`, which the author provided, you can see that the video frames are concatenated vertically and then stored in the .tf files.
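A minimal NumPy sketch of the layout described above, assuming each frame is an H×W×C array and the frames are stacked along the height axis in temporal order. The function names are hypothetical, and the repository's actual reader may decode and split the frames differently.

```python
import numpy as np

def stack_frames_vertically(frames):
    """Concatenate T frames of shape (H, W, C) into one (T*H, W, C) image."""
    return np.concatenate(frames, axis=0)

def split_stacked_frames(stacked, num_frames):
    """Undo the vertical stacking: (T*H, W, C) -> (T, H, W, C).

    Frame t occupies rows t*H .. (t+1)*H - 1, so a row-major reshape
    recovers the original frames in order.
    """
    total_h, w, c = stacked.shape
    assert total_h % num_frames == 0, "height must be divisible by num_frames"
    return stacked.reshape(num_frames, total_h // num_frames, w, c)
```

Storing the clip as a single tall image keeps one image per example in the record file; the reader then only needs to know the frame count to recover the (T, H, W, C) tensor.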

tuffr5 commented 5 years ago

Thanks for the kind answer. But where can I find `sep_example.tf`? Could you please tell me?

YiyuLuo commented 5 years ago

You can get it here: https://github.com/andrewowens/multisensory/issues/11#issuecomment-450376317

tuffr5 commented 5 years ago

Thanks so much.

tuffr5 commented 5 years ago

Hi, do you know what the labels look like? The paper says there is no human labeling, so I wonder what the labels are.

YiyuLuo commented 5 years ago

You can look it up in the code, in `shift_net.py`.

tuffr5 commented 5 years ago

OK, thank you. Actually, I was confused by the code, which is why I asked. The code is as follows:

```python
# Draw a random 0/1 label for each example in the batch.
labels = tf.random_uniform([shape(ims, 0)], 0, 2, dtype=tf.int64, name='labels_sample')
# Where label == 1, take samples_ex[:, 1]; otherwise take samples_ex[:, 0].
samples0 = tf.where(tf.equal(labels, 1), samples_ex[:, 1], samples_ex[:, 0])
# The second branch gets the opposite choice, with the flipped label.
samples1 = tf.where(tf.equal(labels, 0), samples_ex[:, 1], samples_ex[:, 0])
labels1 = 1 - labels

net0 = make_net(ims, samples0, pr, reuse=reuse, train=self.is_training)
# net1 receives net0's image network (im_net) and reuses its variables.
net1 = make_net(None, samples1, pr, im_net=net0.im_net, reuse=True, train=self.is_training)
labels = tf.concat([labels, labels1], 0)
```

  1. The labels are generated randomly. Why is that?
  2. Why are there two nets, `net0` and `net1`? How are they related? I can't find this in the paper.
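The pairing logic in the snippet can be mirrored in NumPy to see what the random labels do. This is a minimal sketch, assuming `samples_ex` holds two candidate audio clips per example along axis 1 (for instance an aligned and a time-shifted clip; that interpretation is an assumption, not something confirmed in this thread). The function name `make_balanced_pairs` is hypothetical. The label records which of the two clips was picked, and the second branch always receives the other clip with the flipped label, so every example contributes one pair with each label and the concatenated batch is exactly balanced. In the TF code, `net1` is built with `reuse=True` and `im_net=net0.im_net`, which suggests it shares weights and image features with `net0` rather than being a separate model; the sketch covers only the sample/label pairing.

```python
import numpy as np

def make_balanced_pairs(samples_ex, rng):
    """NumPy mirror of the tf.where pairing in the snippet (illustrative only).

    samples_ex: array of shape (batch, 2, ...) with two candidate clips per example.
    Returns (samples, labels) for both branches, concatenated along the batch axis.
    """
    batch = samples_ex.shape[0]
    labels = rng.integers(0, 2, size=batch)  # random 0/1 label per example
    # tf.where with a 1-D condition selects whole rows; in NumPy we broadcast
    # the mask over the trailing sample dimensions instead.
    mask = labels.reshape((batch,) + (1,) * (samples_ex.ndim - 2)) == 1
    samples0 = np.where(mask, samples_ex[:, 1], samples_ex[:, 0])
    samples1 = np.where(~mask, samples_ex[:, 1], samples_ex[:, 0])  # the other clip
    labels1 = 1 - labels  # flipped label for the second branch
    samples = np.concatenate([samples0, samples1], axis=0)
    all_labels = np.concatenate([labels, labels1], axis=0)
    return samples, all_labels
```

In both halves, label 1 means "clip index 1 was chosen", so the randomness only decides the order in which each example's two clips appear, not the label balance.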

Thanks so much.

YiyuLuo commented 5 years ago

Take a look at #16.

tuffr5 commented 5 years ago

Thank you. But it is still not clear, is it?