google-research-datasets / MAVE

The dataset contains 3 million attribute-value annotations across 1257 unique categories on 2.2 million cleaned Amazon product profiles. It is a large, multi-sourced, diverse dataset for product attribute extraction study.

Product Images #5

Closed: joshmyersdean closed this issue 2 years ago

joshmyersdean commented 2 years ago

Hello!

Thank you for this work. I was wondering if you have a script to download the images for the products as well?

liyang2019 commented 2 years ago

We currently don't have a script to download the product images, since this work focuses only on text features.

Are you referring to the images in the Amazon Review Dataset? I think the simplest way to download them is to modify clean_amazon_product_metadata_main.py to also download images from the "imageURL" or "imageURLHighRes" fields of the product metadata. We have checked that the "asin" number is a unique identifier for the text features of the product metadata, but we haven't checked whether it remains a unique identifier once image URLs are included.
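
A minimal sketch of that suggestion follows. It is a standalone script, not code from the MAVE repository and not a patch to clean_amazon_product_metadata_main.py. It assumes the metadata has been decompressed to a JSON-lines file, that each record has an "asin" key, and that "imageURL" and "imageURLHighRes" each hold a list of URL strings. The helper names (`collect_image_urls`, `image_path`, `download_images`) and the `<asin>_<index>` file naming are hypothetical choices made for illustration.

```python
import json
import os
import urllib.request
from urllib.parse import urlparse


def collect_image_urls(record, prefer_high_res=True):
    """Return one record's image URLs, deduplicated, in their original order.

    Assumption: "imageURLHighRes" and "imageURL" are lists of URL strings.
    The preferred field is used if it is non-empty, otherwise the other one.
    """
    keys = ["imageURLHighRes", "imageURL"]
    if not prefer_high_res:
        keys.reverse()
    for key in keys:
        urls = record.get(key) or []
        if isinstance(urls, str):  # tolerate a single URL stored as a string
            urls = [urls]
        urls = list(dict.fromkeys(u for u in urls if u))
        if urls:
            return urls
    return []


def image_path(out_dir, asin, index, url):
    """Build a local file path of the form <out_dir>/<asin>_<index><ext>."""
    ext = os.path.splitext(urlparse(url).path)[1] or ".jpg"
    return os.path.join(out_dir, f"{asin}_{index}{ext}")


def download_images(metadata_path, out_dir):
    """Download every image referenced in a JSON-lines metadata file."""
    os.makedirs(out_dir, exist_ok=True)
    with open(metadata_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            record = json.loads(line)
            for i, url in enumerate(collect_image_urls(record)):
                target = image_path(out_dir, record["asin"], i, url)
                if os.path.exists(target):
                    continue  # already downloaded
                try:
                    urllib.request.urlretrieve(url, target)
                except OSError as err:  # URLError and HTTPError are OSError subclasses
                    print(f"skipped {url}: {err}")
```

Calling `download_images("meta.json", "images")` with a hypothetical metadata path would write one file per URL. Because the files are keyed by "asin", two records that share an "asin" but list different images would collide at the same index and the second would be skipped, which is exactly the uniqueness question left unchecked above.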

joshmyersdean commented 2 years ago

Thank you so much, I will give that a shot!