Open demonking2 opened 6 years ago
Hi demonking2,
The product IDs on the Polyvore site are used as ASINs.
If the Polyvore website were still alive, the crawler could be used to collect the data. Specifically:
- `polyvore_outfit_from_user` and `polyvore_outfit` can be used to grab the URLs of outfits as `OUTFIT.jl`.
- With `OUTFIT.jl` as input, `polyvore_outfit_set` grabs the information for each outfit, as well as the URLs of the items in the outfits, as `OUTFIT_SET.jl`.
- With the item URLs from `OUTFIT_SET.jl` as input, `polyvore_item` grabs all item images into a directory, as well as the item information as `ITEM.jl`.
- Then run `python -m cfl.scripts.preprocess_polyvore --items-store ITEMS_S3_STORE_PATH --image-dir IMAGES_DIR --output-dir data/polyvore`, where `IMAGES_DIR` is the image output of `polyvore_item` and `ITEMS_S3_STORE_PATH` stores the crawled items on S3.

`ITEMS_S3_STORE_PATH` has the following structure:
- `ITEMS_S3_STORE_PATH/polyvore_item/`: stores `ITEM.jl`
- `ITEMS_S3_STORE_PATH/polyvore_outfit_set/`: stores `OUTFIT_SET.jl`

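For inspecting the crawl output, a minimal sketch follows. It assumes the `.jl` files are JSON Lines (one JSON object per line, which is what Scrapy's `jl` feed export writes); the thread does not state the file format or the field names, so `read_jsonl` is a hypothetical helper and each record is treated as an opaque dict.

```python
import json
from pathlib import Path
from typing import Iterator


def read_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed object per non-empty line of a JSON Lines file.

    Assumption: ITEM.jl and OUTFIT_SET.jl are JSON Lines files. This is
    not confirmed anywhere in the thread.
    """
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)
```

If the assumption holds, something like `next(read_jsonl("ITEM.jl"))` would show the fields stored for the first crawled item.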
`preprocess_polyvore` would produce the following data structure:
- `data/polyvore/images`: stores processed images
- `data/polyvore/meta.txt`: stores the product IDs as well as the categories of items
- `data/polyvore/cate[n].txt`: stores the product IDs for each category `[n]`
- `data/polyvore/outfits.txt`: stores the `fav_count` and items of each outfit

`python -m cfl.keras.extract_v3 --input-dir data/polyvore/images --output-dir data/polyvore/latents` would produce latent vectors for each item image.

Then run `./experiments/polyvore/convert_polyvore.sh`, which produces the following data structure:
- `parsed_data/polyvore_random/top_to_other/train/meta.txt`: item IDs as well as categories
- `parsed_data/polyvore_random/top_to_other/train/pairs_pos.txt`: positive pairs
- `parsed_data/polyvore_random/top_to_other/train/pairs_neg.txt`: negative pairs
- `parsed_data/polyvore_random/top_to_other/train/pairs_all.txt`: all pairs
- `parsed_data/polyvore_random/top_to_other/train/features.b`: features of items
- `parsed_data/polyvore_random/top_to_other/train/source.txt`: item IDs from source categories
- `parsed_data/polyvore_random/top_to_other/train/target.txt`: item IDs from target categories
- `parsed_data/polyvore_random/top_to_other/val`: the same structure as `top_to_other/train`
- `parsed_data/polyvore_random/top_to_other/test`: the same structure as `top_to_other/train`
- `parsed_data/polyvore_random/bottom_to_other`: the same structure as `top_to_other`
- `parsed_data/polyvore_random/shoe_to_other`: the same structure as `top_to_other`

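To check a local copy against this layout, a small sketch like the following may help. The directory and file names are taken from the listing above; the helper names `expected_paths` and `missing_paths` are made up for illustration and are not part of the repository.

```python
from pathlib import Path
from typing import List

# Names copied from the listing above.
TASKS = ["top_to_other", "bottom_to_other", "shoe_to_other"]
SPLITS = ["train", "val", "test"]
FILES = [
    "meta.txt", "pairs_pos.txt", "pairs_neg.txt", "pairs_all.txt",
    "features.b", "source.txt", "target.txt",
]


def expected_paths(root: str = "parsed_data/polyvore_random") -> List[Path]:
    """Return every file the listed layout implies (3 tasks x 3 splits x 7 files)."""
    return [
        Path(root) / task / split / name
        for task in TASKS
        for split in SPLITS
        for name in FILES
    ]


def missing_paths(root: str = "parsed_data/polyvore_random") -> List[Path]:
    """Return the expected files that are not present on disk."""
    return [path for path in expected_paths(root) if not path.is_file()]
```

An empty result from `missing_paths()` would suggest that `convert_polyvore.sh` wrote the full tree.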
Negatives are generated by randomly sampling items from different categories, as described in the paper.
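As a rough illustration of that sentence, and not the repository's actual implementation: the sketch below assumes that each positive (source, target) pair gets one negative, formed by keeping the source item and drawing a random item from the target categories that is not a known positive for that source. The function name, its arguments, and the one-negative-per-positive ratio are all assumptions; the paper remains the authority on the exact rule.

```python
import random
from typing import List, Sequence, Set, Tuple

Pair = Tuple[str, str]


def sample_negatives(
    positives: Sequence[Pair],
    target_ids: Sequence[str],
    seed: int = 0,
    max_tries: int = 100,
) -> List[Pair]:
    """Draw at most one random negative (source, target) pair per positive.

    Assumption: a negative keeps the source item and swaps in a uniformly
    sampled target-category item that is not a positive for that source.
    """
    rng = random.Random(seed)  # seeded so the sampling is reproducible
    positive_set: Set[Pair] = set(positives)
    negatives: List[Pair] = []
    for source, _ in positives:
        # Retry on collisions with known positives; if every try collides,
        # this source simply gets no negative.
        for _ in range(max_tries):
            candidate = (source, rng.choice(target_ids))
            if candidate not in positive_set:
                negatives.append(candidate)
                break
    return negatives
```

Under these assumptions, `positives` would correspond to the rows of `pairs_pos.txt` and `target_ids` to the IDs in `target.txt`.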
Can you please share the structure of the data directories used in this script? The Polyvore website is down, and I am having a hard time understanding how you generated negatives from the data and what an ASIN is in the Polyvore data.