enoche / BM3

PyTorch implementation of "Bootstrap Latent Representations for Multi-modal Recommendation" (WWW '23)
GNU General Public License v3.0

Details about the visual features of the Baby dataset #8

Closed · MrShouxingMa closed this issue 11 months ago

MrShouxingMa commented 11 months ago

Thank you very much for sharing the multi-modal dataset!

When I looked at the Baby dataset in detail, I found that its visual features looked a bit strange compared to its textual features. Following the description in the article, I located the EMNLP-IJCNLP 2019 paper and wanted to download and inspect its visual data, but unfortunately that data is no longer available for download.

Therefore, I would like to confirm the accuracy of the data with you. The visual features of the Baby dataset are as follows:

tensor([[0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.5662, 1.2457],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 1.8834, 0.2391],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 4.7136],
        ...,
        [0.0000, 0.0000, 0.0000,  ..., 1.8662, 0.0000, 0.0000],
        [1.6190, 0.9108, 0.0000,  ..., 1.5103, 0.0000, 2.8353],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.4207, 4.5992]],
       device='cuda:0')

The text features of the Baby dataset, by contrast, are as follows:

tensor([[-0.0765, -0.0047, -0.0418,  ..., -0.0222,  0.0120,  0.0375],
        [-0.0740, -0.0208, -0.0869,  ...,  0.0194,  0.0333,  0.0094],
        [-0.0228,  0.0234,  0.0099,  ...,  0.0420,  0.0256,  0.1033],
        ...,
        [ 0.0160, -0.0829,  0.0130,  ..., -0.0124,  0.0695,  0.0211],
        [-0.0774, -0.0378, -0.0818,  ..., -0.0073,  0.0119,  0.0790],
        [-0.0146, -0.0205,  0.0307,  ..., -0.0576,  0.0467,  0.0481]],
       device='cuda:0')

Best wishes

enoche commented 11 months ago

Hi @MrShouxingMa, the visual features are used without modification in our model. You can download the Baby image features at: http://snap.stanford.edu/data/amazon/productGraph/image_features/categoryFiles/image_features_Baby.b
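For reference, the image-feature `.b` files hosted there are commonly described as a flat sequence of records, each being a 10-character ASIN followed by 4096 float32 values. A minimal reader sketch under that assumption (the record layout and the 4096-dimension count are assumptions based on common usage of these files, not confirmed in this thread):

```python
import struct

def read_image_features(path, dim=4096):
    """Yield (asin, feature_vector) pairs from an Amazon image-feature .b file.

    Assumed record layout: 10 ASCII bytes (the ASIN), then `dim` little-endian
    float32 values. Iteration stops at end of file.
    """
    with open(path, "rb") as f:
        while True:
            asin = f.read(10)
            if len(asin) < 10:  # end of file (or truncated record)
                break
            raw = f.read(4 * dim)
            feats = struct.unpack("<" + "f" * dim, raw)
            yield asin.decode("ascii"), list(feats)
```

Usage would then be something like `dict(read_image_features("image_features_Baby.b"))` to map ASINs to their feature vectors before aligning them with the item IDs used in the repo.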

Thanks for your feedback.

enoche commented 11 months ago

@MrShouxingMa By the way, all raw datasets are located at: http://jmcauley.ucsd.edu/data/amazon/links.html

However, the page may automatically redirect to the 2018 version.

To access the 2014 version of the dataset from the paper, please follow these steps:

  1. Load the page: http://jmcauley.ucsd.edu/data/amazon/links.html
  2. As soon as the page starts loading, press the Esc key repeatedly until the page stops trying to redirect to the 2018 version.
  3. Once the redirection is cancelled, you can navigate through the page to find and download the datasets for the 2014 version.

Please note that timing is crucial in this process to prevent the automatic redirection. If you're redirected to the 2018 version, simply go back and try again.

MrShouxingMa commented 11 months ago

Thank you very much for your prompt and patient reply!

Following your detailed instructions, I downloaded the raw data and confirmed that it indeed contains many zero vectors, consistent with the features used in your paper.
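For anyone verifying this themselves: many exact zeros alongside only non-negative values are characteristic of CNN features taken after a ReLU activation, so the pattern in the visual tensor above is plausible rather than a sign of corruption. A quick sparsity check (a sketch using NumPy on an illustrative array built from the values quoted above; in practice `feats` would be the loaded Baby visual-feature matrix):

```python
import numpy as np

# Illustrative feature matrix (rows = items, cols = feature dims);
# in practice, load the real Baby visual features here instead.
feats = np.array([
    [0.0,    0.0,    0.0, 0.5662, 1.2457],
    [0.0,    0.0,    0.0, 1.8834, 0.2391],
    [1.6190, 0.9108, 0.0, 0.0,    2.8353],
])

sparsity = float((feats == 0).mean())        # fraction of exactly-zero entries
all_nonneg = bool((feats >= 0).all())        # ReLU-style features are non-negative
zero_rows = int((~feats.any(axis=1)).sum())  # items with an all-zero feature vector

print(f"sparsity={sparsity:.2f}, non-negative={all_nonneg}, all-zero rows={zero_rows}")
```

A high `sparsity` with `non-negative=True` matches post-ReLU features; a large number of all-zero rows, on the other hand, would indicate items whose images were missing from the feature file.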