buaaliyi opened this issue 7 years ago
Hi LI Ya,
First of all, thank you for writing me and being interested in the project.
I do still have some of the pretrained models, but each of them takes up quite a lot of space (from 500 MB for the sparse-based models up to 3500 MB for the wide-and-deep ones). So, if you want, say, one specific model, I will try to figure out how to share it with you.
In any case, you should be able to train the models with the GitHub code. The only missing parts for doing so are the visual embeddings (and the word embeddings if you want to train the Word2VisualVec model). It might be easier if I share those with you instead of the pretrained models.
You are right, a lot of information is missing. This is because the code in the repository refers to a still-unpublished article (already accepted, though); sorry for that. The visual embeddings correspond to the fc6 and fc7 layers extracted from an AlexNet [1] trained on the ILSVRC12 [2] and Places [3] datasets. The method for pretraining the word embeddings in Word2VisualVec (which is not part of any of our models) is described in [4], and consists of training a skip-gram model on the user tags associated with the 100M images in the YFCC100M dataset.
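For illustration, here is a minimal sketch of that kind of fc6/fc7 extraction. It uses torchvision's ImageNet-pretrained AlexNet as a stand-in (not the exact ILSVRC12- and Places-trained networks behind the shared embeddings, so the values will not match), and it assumes a recent torchvision:

```python
import torch
from PIL import Image
from torchvision import models, transforms

# Standard AlexNet preprocessing (resize, center-crop, ImageNet normalization).
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Stand-in network: ImageNet-pretrained AlexNet from torchvision (>= 0.13).
alexnet = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1).eval()

def extract_fc6_fc7(image_path):
    """Return the 4096-d fc6 and fc7 activations (after ReLU) for one image."""
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        h = alexnet.avgpool(alexnet.features(x)).flatten(1)
        # alexnet.classifier = [Dropout, fc6, ReLU, Dropout, fc7, ReLU, fc8]
        fc6 = alexnet.classifier[2](alexnet.classifier[1](h))
        fc7 = alexnet.classifier[5](alexnet.classifier[4](fc6))
    return fc6.squeeze(0).numpy(), fc7.squeeze(0).numpy()
```

Likewise, a skip-gram model over per-image tag lists can be trained with gensim; the toy data and hyperparameters below (vector size, window, min_count) are placeholders, not the settings used in [4]:

```python
from gensim.models import Word2Vec

# One list of user tags per image (toy examples only).
tag_lists = [["beach", "sunset", "california"], ["dog", "park", "fetch"]]

# sg=1 selects the skip-gram architecture (gensim >= 4.0 API).
w2v = Word2Vec(tag_lists, vector_size=300, sg=1, window=5, min_count=1, workers=4)
vector = w2v.wv["beach"]  # learned word embedding for the tag "beach"
```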
I hope this has helped you, but please do not hesitate to contact me again if you need anything else. In the meantime, I will figure out how to share the visual embeddings with you.
[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097-1105, 2012.
[2] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211-252, 2015.
[3] B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva. Learning deep features for scene recognition using Places database. In Advances in Neural Information Processing Systems, pages 487-495, 2014.
[4] J. Dong, X. Li, and C. G. M. Snoek. Word2VisualVec: Cross-media retrieval by visual feature prediction. arXiv e-prints, Apr. 2016.
On 2017-02-08 11:13, LI Yi wrote:
Hello Alex,
I'm really interested in your great project and hope to learn something from it. However, I found it hard to run experiments with the code, so may I ask you for the pretrained models? Thank you. Also, are there any examples or guidelines for the feature extraction methods? I just want to know more details about both the visual features and the word embedding features.
Thanks.
Thank you very much. I have learned a lot from your replies. There's still one question: what is the detailed data format of the visual embeddings file (.npy) and the word embeddings file? For example, could you show us a few data lines (and the meaning of each field) as a tiny demo?
Thank you!
Here it is, as promised! You can access the data through
sftp://cloudone.isti.cnr.it/FTP user: NeMISFTP pwd: FRmirGA06
Once inside, you can download the file Text2vis-experiments-NeuIR-SIGIR.rar. After uncompressing it, go to the visualembeddings/ directory, where you will find the fc6 and fc7 feature vectors. They are in .txt format, so loading them the first time might be quite slow, but a .npy version will be generated automatically for faster subsequent runs.
The meaning of each dimension, though, is not easily interpretable, as each one corresponds to a signal that is found, through optimization, to be informative for classification on the respective image classification dataset. At most, we know them to be sparse (around 80% of the dimensions are zero or negative before the ReLU, and therefore exactly 0 after the ReLU activation).
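As a concrete illustration of the .txt-to-.npy caching mentioned above, the pattern boils down to something like the sketch below. The per-line layout assumed here (an image id followed by whitespace-separated feature values) is only a guess for illustration and may not match the shared files or the repository code exactly:

```python
import os
import numpy as np

def load_visual_embeddings(txt_path):
    """Load fc6/fc7 vectors from .txt, caching a .npy copy for later runs."""
    npy_path = txt_path + ".npy"
    if os.path.exists(npy_path):       # fast path on subsequent runs
        return np.load(npy_path)
    rows = []
    with open(txt_path) as f:
        for line in f:
            fields = line.split()
            rows.append([float(v) for v in fields[1:]])  # fields[0]: image id (assumed)
    matrix = np.asarray(rows, dtype=np.float32)          # e.g. (n_images, 4096)
    np.save(npy_path, matrix)                            # write the cache once
    return matrix

fc6 = load_visual_embeddings("visualembeddings/fc6.txt")
print("fraction of zero activations:", (fc6 == 0).mean())  # roughly 0.8, as noted above
```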
Hope it helps! Regards, Alex
Great! This helps me a lot in understanding the article and the project; I will make good use of them.
Thanks.
Hello Alex,
I'm really interested in your great project and hope to learn something from it. However, I found it hard to run experiments with the code, so may I ask you for the pretrained models, to understand it better? Thank you. Also, are there any examples or guidelines for the feature extraction methods? I just want to know more details about both the visual features and the word embedding features.
Thanks.