tuffr5 closed this issue 5 years ago
From the 'sep_example.tf' file provided by the author, you can see that video frames are concatenated vertically and then stored in the .tf files.
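For anyone trying to build such files themselves, here is a minimal sketch of that storage layout: T frames of shape [H, W, 3] stacked vertically into one [T*H, W, 3] image and serialized into a TFRecord. This is an illustration under assumptions only; the feature keys ('im_0', 'num_frames') and the JPEG encoding are hypothetical, so check the repo's own writer for the real keys and format. The sketch also assumes TF2 eager mode, while the repo itself targets TF1.

```python
import numpy as np
import tensorflow as tf  # assumes TF2 eager mode

def write_clip(frames, path):
    """Store a clip as one vertically concatenated image in a TFRecord.

    frames: list of T uint8 arrays, each of shape [H, W, 3].
    The feature keys below are hypothetical, not the repo's actual keys.
    """
    tall = np.concatenate(frames, axis=0)       # vertical concat: [T*H, W, 3]
    jpeg = tf.io.encode_jpeg(tall).numpy()      # serialize as JPEG bytes
    example = tf.train.Example(features=tf.train.Features(feature={
        'im_0': tf.train.Feature(bytes_list=tf.train.BytesList(value=[jpeg])),
        'num_frames': tf.train.Feature(
            int64_list=tf.train.Int64List(value=[len(frames)])),
    }))
    with tf.io.TFRecordWriter(path) as writer:
        writer.write(example.SerializeToString())

def split_clip(tall_image, num_frames):
    """Undo the vertical concatenation: [T*H, W, 3] -> [T, H, W, 3]."""
    height = tf.shape(tall_image)[0] // num_frames
    width = tf.shape(tall_image)[1]
    return tf.reshape(tall_image, [num_frames, height, width, 3])
```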
Thanks for your kind answer. But I have no idea where sep_example.tf is. Could you please tell me?
You can get it here: https://github.com/andrewowens/multisensory/issues/11#issuecomment-450376317
Thanks so much.
Hi, do you know what the labels look like? Since the paper says there is no human labeling, I wonder what the labels actually are.
You can look it up in the code, in 'shift_net.py'.
Ok, thank you. Actually, I was confused by the code, which is why I was asking. The code is as follows:

```python
labels = tf.random_uniform([shape(ims, 0)], 0, 2, dtype=tf.int64, name='labels_sample')
samples0 = tf.where(tf.equal(labels, 1), samples_ex[:, 1], samples_ex[:, 0])
samples1 = tf.where(tf.equal(labels, 0), samples_ex[:, 1], samples_ex[:, 0])
labels1 = 1 - labels

net0 = make_net(ims, samples0, pr, reuse=reuse, train=self.is_training)
net1 = make_net(None, samples1, pr, im_net=net0.im_net, reuse=True, train=self.is_training)
labels = tf.concat([labels, labels1], 0)
```
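A note for other readers on how I now understand this snippet (my reading, not the author's own explanation): the labels are drawn uniformly at random per example, so no human annotation is involved. `samples_ex[:, 0]` and `samples_ex[:, 1]` appear to hold two audio versions of each clip (presumably aligned vs. temporally shifted), and the two `tf.where` calls select complementary versions so that every clip is seen once with each label per step. A minimal NumPy sketch of the same selection logic, where the shapes and the aligned/shifted interpretation are assumptions:

```python
import numpy as np

batch = 4
# Two audio versions per clip: column 0 (e.g. aligned) and column 1
# (e.g. shifted) -- the aligned/shifted meaning is an assumption.
samples_ex = np.stack([np.zeros(batch), np.ones(batch)], axis=1)  # [batch, 2]

# Randomly generated binary labels: self-supervised, no human labeling.
labels = np.random.randint(0, 2, size=batch)

# Where label == 1 pick version 1, otherwise version 0.
samples0 = np.where(labels == 1, samples_ex[:, 1], samples_ex[:, 0])
# The complementary pick, paired with the flipped labels.
samples1 = np.where(labels == 0, samples_ex[:, 1], samples_ex[:, 0])
labels1 = 1 - labels

# Each clip contributes two training examples, one per audio version,
# so the final label vector is the concatenation of both halves.
all_labels = np.concatenate([labels, labels1], axis=0)
print(all_labels)
```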
Thanks so much.
Thank you. But it is still not clear, right?
Thanks for sharing your great work with us. But I have a question: it is somewhat opaque in your code, and I cannot find how you deal with the multiple frames. Do you simply tile all frames together and then feed them into the "img_net"? Waiting for your reply, thanks so much.
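In case it helps others with the same question, here is one plausible way to handle the multiple frames (an assumption on my part, not the author's confirmed approach): the vertical concatenation mentioned above is only the storage layout, so at load time the frames would be split back out, stacked along a time axis into a [T, H, W, C] tensor, and processed with spatio-temporal (3D) convolutions rather than tiled spatially. A minimal sketch with hypothetical shapes:

```python
import numpy as np
import tensorflow as tf  # assumes TF2 eager mode

# Hypothetical clip: T frames of size H x W with 3 channels.
T, H, W, C = 8, 224, 224, 3
frames = [np.random.rand(H, W, C).astype(np.float32) for _ in range(T)]

# Stack along a new time axis -> [T, H, W, C], then add a batch dimension.
clip = tf.stack(frames, axis=0)[tf.newaxis]          # [1, T, H, W, C]

# One spatio-temporal (3D) convolution over the whole clip; the kernel
# and stride sizes here are illustrative, not the network's real config.
conv = tf.keras.layers.Conv3D(filters=64, kernel_size=(5, 7, 7),
                              strides=(2, 2, 2), padding='same',
                              activation='relu')
features = conv(clip)                                # [1, 4, 112, 112, 64]
print(features.shape)
```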