appier / compatibility-family-learning

Compatibility Family Learning for Item Recommendation and Generation
https://arxiv.org/abs/1712.01262

Polyvore to Monomer-like data in convert_polyvore.py #1

Open: demonking2 opened this issue 6 years ago

demonking2 commented 6 years ago

Can you please share the structure of the data directories used in this script? The Polyvore website is down, and I am having a hard time understanding how you generated negatives from the data and what `asin` refers to in the Polyvore data.

shaform commented 6 years ago

Hi demonking2,

The product IDs on the Polyvore site are used as ASINs.

If the Polyvore website were still alive, the crawler could be used to crawl the data. Specifically:

  1. `polyvore_outfit_from_user` and `polyvore_outfit` can be used to grab the URLs of outfits, saved as `OUTFIT.jl`.
  2. Using `OUTFIT.jl` as input, `polyvore_outfit_set` grabs information for each outfit, as well as the URLs of the items in the outfits, and saves it as `OUTFIT_SET.jl`.
  3. Manually extract the item URLs from `OUTFIT_SET.jl` and use them as input to `polyvore_item`, which grabs all item images into a directory, along with the item information as `ITEM.jl`.
  4. Run `python -m cfl.scripts.preprocess_polyvore --items-store ITEMS_S3_STORE_PATH --image-dir IMAGES_DIR --output-dir data/polyvore`, where `IMAGES_DIR` is the image output of `polyvore_item` and `ITEMS_S3_STORE_PATH` stores the crawled items on S3. `ITEMS_S3_STORE_PATH` has the following structure:
     - `ITEMS_S3_STORE_PATH/polyvore_item/`: stores `ITEM.jl`
     - `ITEMS_S3_STORE_PATH/polyvore_outfit_set/`: stores `OUTFIT_SET.jl`
  5. `preprocess_polyvore` would produce the following data structure:
     - `data/polyvore/images`: stores processed images
     - `data/polyvore/meta.txt`: stores the product IDs as well as the categories of items
     - `data/polyvore/cate[n].txt`: stores the product IDs for each category `[n]`
     - `data/polyvore/outfits.txt`: stores the `fav_count` and items of each outfit
  6. Running `python -m cfl.keras.extract_v3 --input-dir data/polyvore/images --output-dir data/polyvore/latents` would produce latent vectors for each item image.
  7. Run `./experiments/polyvore/convert_polyvore.sh`, which produces the following data structure:
     - `parsed_data/polyvore_random/top_to_other/train/meta.txt`: item IDs as well as categories
     - `parsed_data/polyvore_random/top_to_other/train/pairs_pos.txt`: positive pairs
     - `parsed_data/polyvore_random/top_to_other/train/pairs_neg.txt`: negative pairs
     - `parsed_data/polyvore_random/top_to_other/train/pairs_all.txt`: all pairs
     - `parsed_data/polyvore_random/top_to_other/train/features.b`: features of items
     - `parsed_data/polyvore_random/top_to_other/train/source.txt`: item IDs from source categories
     - `parsed_data/polyvore_random/top_to_other/train/target.txt`: item IDs from target categories
     - `parsed_data/polyvore_random/top_to_other/val`: the same structure as `top_to_other/train`
     - `parsed_data/polyvore_random/top_to_other/test`: the same structure as `top_to_other/train`
     - `parsed_data/polyvore_random/bottom_to_other`: the same structure as `top_to_other`
     - `parsed_data/polyvore_random/shoe_to_other`: the same structure as `top_to_other`
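To make the output layout of `convert_polyvore.sh` concrete, here is a minimal Python sketch that enumerates every file implied by the directory listing in step 7 and reports which ones are missing. Only the paths come from the answer; the helper names `expected_files` and `missing_files` are hypothetical and not part of the repository.

```python
from pathlib import Path

# Task directories, splits, and per-split files as listed in step 7.
SOURCE_TASKS = ("top_to_other", "bottom_to_other", "shoe_to_other")
SPLITS = ("train", "val", "test")
SPLIT_FILES = (
    "meta.txt",       # item IDs as well as categories
    "pairs_pos.txt",  # positive pairs
    "pairs_neg.txt",  # negative pairs
    "pairs_all.txt",  # all pairs
    "features.b",     # features of items
    "source.txt",     # item IDs from source categories
    "target.txt",     # item IDs from target categories
)


def expected_files(root="parsed_data/polyvore_random"):
    """Return every file path the step 7 layout implies, in a fixed order."""
    root = Path(root)
    return [
        root / task / split / name
        for task in SOURCE_TASKS
        for split in SPLITS
        for name in SPLIT_FILES
    ]


def missing_files(root="parsed_data/polyvore_random"):
    """Return the expected paths that are not regular files on disk."""
    return [p for p in expected_files(root) if not p.is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print(f"{len(missing)} expected files are missing, e.g. {missing[0]}")
    else:
        print("All expected files are present.")
```

Run from the repository root after `convert_polyvore.sh` finishes; with 3 tasks, 3 splits, and 7 files per split, the layout implies 63 files in total.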

Negatives are generated by randomly sampling items from different categories, as described in the paper.
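The exact sampling details live in the paper and `convert_polyvore.py`, so the following is only a sketch under stated assumptions: each positive `(source, target)` pair yields `num_per_pos` negatives that keep the source item and draw a target uniformly at random from the target-category items (e.g. bottoms and shoes for `top_to_other`), redrawing any draw that would reproduce a known positive pair. The function `sample_negatives` and its parameters are hypothetical and are not taken from the repository.

```python
import random
from collections import defaultdict


def sample_negatives(pos_pairs, target_ids, num_per_pos=1, seed=0):
    """Hypothetical random negative sampling over target-category items.

    pos_pairs: list of (source_id, target_id) positive pairs.
    target_ids: IDs of items from the target categories.
    Returns a list of (source_id, target_id) negative pairs.
    """
    rng = random.Random(seed)
    # Sort so that a fixed seed gives reproducible draws.
    targets = sorted(set(target_ids))

    # Known positive targets per source item, used for rejection.
    pos_targets = defaultdict(set)
    for source_id, target_id in pos_pairs:
        pos_targets[source_id].add(target_id)

    negatives = []
    for source_id, _ in pos_pairs:
        known = pos_targets[source_id]
        if known.issuperset(targets):
            raise ValueError(f"no negative candidates left for {source_id!r}")
        for _ in range(num_per_pos):
            # Rejection sampling: redraw until the pair is not a positive.
            candidate = rng.choice(targets)
            while candidate in known:
                candidate = rng.choice(targets)
            negatives.append((source_id, candidate))
    return negatives


if __name__ == "__main__":
    pos = [("top1", "shoe1"), ("top2", "bottom1")]
    print(sample_negatives(pos, ["shoe1", "shoe2", "bottom1", "bottom2"], num_per_pos=2))
```

This sketch allows duplicate negatives and does not balance target categories; a real pipeline may deduplicate or stratify, which is a design choice the answer does not specify.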