Open chenzhao-paypal opened 1 year ago
Hi, I'm wondering whether you are using the Amazon Review Data (2018) version of All_Amazon_Meta.json. That version has 16 paragraphs for this example.
Hi @liyang2019, All_Amazon_Meta.json is no longer available at this link, so I manually combined the per-category metadata.
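For anyone attempting the same workaround, here is a minimal sketch of how per-category metadata could be merged into one file. It is not necessarily the procedure used above. It assumes the per-category files are gzip-compressed JSON Lines named `meta_*.json.gz` in a single directory; the function name `combine_metadata`, the file pattern, and the de-duplication by `asin` are all assumptions for illustration.

```python
import glob
import gzip
import json
import os


def combine_metadata(input_dir, output_path):
    """Merge per-category metadata files into one JSON Lines file.

    Assumes (hypothetically) that each input file is gzip-compressed
    JSON Lines and matches meta_*.json.gz. Records that share an "asin"
    are written only once, keeping the first occurrence.
    Returns the number of records written.
    """
    seen = set()
    count = 0
    pattern = os.path.join(input_dir, "meta_*.json.gz")
    with open(output_path, "w", encoding="utf-8") as out:
        # Sort for a deterministic merge order across runs.
        for path in sorted(glob.glob(pattern)):
            with gzip.open(path, "rt", encoding="utf-8") as f:
                for line in f:
                    line = line.strip()
                    if not line:
                        continue
                    record = json.loads(line)
                    asin = record.get("asin")
                    if asin is not None:
                        if asin in seen:
                            continue
                        seen.add(asin)
                    out.write(json.dumps(record) + "\n")
                    count += 1
    return count
```

Note that a file merged this way may differ from the original All_Amazon_Meta.json in record order and content, so paragraph indices in the labels are not guaranteed to line up.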
Hi, is it possible to contact the author of Amazon Review Data (2018) to download the original All_Amazon_Meta.json? There is no guarantee that the paragraphs or attribute evidence spans can be matched against other data sources.
Is it possible to share the dataset that you used? The Amazon review dataset 2018 has been updated and no longer provides All_Amazon_Meta.json.
Since the previous version of the Amazon review dataset 2018 doesn't have a license, we cannot share the data for legal reasons. Is it possible to contact the author to download the original All_Amazon_Meta.json for research purposes?
Hi, I have found some labels that are incorrect. For instance, in the following example only 10 paragraphs are extracted, but the label contains "pid": 10, which causes an index-out-of-bounds error when generating the tfrecords.
Extracted complete example: { "id": "B007T534MU", "category": "Messenger Bags", "paragraphs": [ { "text": "Will Leather Goods Wax Coated Canvas Shoulder Messenger Bag", "source": "title" }, { "text": "Wax coated canvas messenger the history of wax coated canvas walks in the footsteps of industry itself. Paraffin coated cotton arrived in the early 20th century, this waterresistant, breathable fabric has since been tested by sailors, farmers, firemen, soldiers, motorists and factory workers. We've paired our wax coated canvas with natural, vegetable tanned leather a marriage of classics that will patina beautifully over the life of your bag or tech case. Our wax coated canvas collection reflects the evolution of america's hustle and bustle a beautiful, functional reminder of our industrious past.", "source": "description" }, { "text": "Will Leather Goods is an American leather brand which believes that intuition is everything, hard work is honestly good and the best parts of the past deserve special attention. The Adler Family House, designer and producer of Will Leather Goods, began as Billy Belts in 1981 in Venice Beach, California, when founder Bill Adler chased a wild intuition. An actor, Bill needed a means of providing for his family during the Screen Actors Guild strike of '81. He set up a simple leather goods stand on the boardwalk at the beach. There, he worked hard for many years selling belts to Californians. 
Today the Adler Family House produces Will Leather Goods from Eugene, Oregon and sends it around the world with love.", "source": "description" }, { "text": "100% Polyester", "source": "feature" }, { "text": "Imported", "source": "feature" }, { "text": "Strap is adjustable from 11\" to 27\"", "source": "feature" }, { "text": "Travel pillow included", "source": "feature" }, { "text": "Product Dimensions: 17 x 16 x 6 inches", "source": "feature" }, { "text": "Shipping Weight: 2.4 pounds", "source": "feature" }, { "text": "Will Leather Goods", "source": "brand" } ], "attributes": [ { "key": "Material", "evidences": [ { "value": "Coated Canvas", "pid": 0, "begin": 23, "end": 36 }, { "value": "Canvas", "pid": 0, "begin": 30, "end": 36 }, { "value": "coated canvas", "pid": 1, "begin": 4, "end": 17 }, { "value": "canvas", "pid": 1, "begin": 11, "end": 17 }, { "value": "coated canvas", "pid": 1, "begin": 47, "end": 60 }, { "value": "canvas", "pid": 1, "begin": 54, "end": 60 }, { "value": "coated canvas", "pid": 1, "begin": 315, "end": 328 }, { "value": "canvas", "pid": 1, "begin": 322, "end": 328 }, { "value": "coated canvas", "pid": 1, "begin": 468, "end": 481 }, { "value": "canvas", "pid": 1, "begin": 475, "end": 481 }, { "value": "Coated Canvas", "pid": 10, "begin": 23, "end": 36 }, { "value": "Canvas", "pid": 10, "begin": 30, "end": 36 } ] } ] }
Original metadata: { "category": [ "Clothing, Shoes & Jewelry", "Luggage & Travel Gear", "Messenger Bags", "100% Polyester", "Imported", "Strap is adjustable from 11\" to 27\"", "Travel pillow included" ], "description": [ "Wax coated canvas messenger the history of wax coated canvas walks in the footsteps of industry itself. Paraffin coated cotton arrived in the early 20th century, this waterresistant, breathable fabric has since been tested by sailors, farmers, firemen, soldiers, motorists and factory workers. We've paired our wax coated canvas with natural, vegetable tanned leather a marriage of classics that will patina beautifully over the life of your bag or tech case. Our wax coated canvas collection reflects the evolution of america's hustle and bustle a beautiful, functional reminder of our industrious past.", "Will Leather Goods is an American leather brand which believes that intuition is everything, hard work is honestly good and the best parts of the past deserve special attention. The Adler Family House, designer and producer of Will Leather Goods, began as Billy Belts in 1981 in Venice Beach, California, when founder Bill Adler chased a wild intuition. An actor, Bill needed a means of providing for his family during the Screen Actors Guild strike of '81. He set up a simple leather goods stand on the boardwalk at the beach. There, he worked hard for many years selling belts to Californians. Today the Adler Family House produces Will Leather Goods from Eugene, Oregon and sends it around the world with love." 
], "title": "Will Leather Goods Wax Coated Canvas Shoulder Messenger Bag", "brand": "Will Leather Goods", "feature": [ "100% Polyester", "Imported", "Strap is adjustable from 11\" to 27\"", "Travel pillow included", "Product Dimensions:\n \n17 x 16 x 6 inches", "Shipping Weight:\n \n2.4 pounds" ], "rank": "4,587,556inClothing,ShoesJewelry(", "date": "5 star", "asin": "B007T534MU", "imageURL": [ "https://images-na.ssl-images-amazon.com/images/I/31uUzm%2BHxFL._US40_.jpg", "https://images-na.ssl-images-amazon.com/images/I/41zmImo7s4L._US40_.jpg", "https://images-na.ssl-images-amazon.com/images/I/51kxIzgWU0L._US40_.jpg", "https://images-na.ssl-images-amazon.com/images/I/51hLyWv2O2L._US40_.jpg" ], "imageURLHighRes": [ "https://images-na.ssl-images-amazon.com/images/I/31uUzm%2BHxFL.jpg", "https://images-na.ssl-images-amazon.com/images/I/41zmImo7s4L.jpg", "https://images-na.ssl-images-amazon.com/images/I/51kxIzgWU0L.jpg", "https://images-na.ssl-images-amazon.com/images/I/51hLyWv2O2L.jpg" ] }
Label: { "id": "B007T534MU", "category": "Messenger Bags", "attributes": [ { "key": "Material", "evidences": [ { "value": "Coated Canvas", "pid": 0, "begin": 23, "end": 36 }, { "value": "Canvas", "pid": 0, "begin": 30, "end": 36 }, { "value": "coated canvas", "pid": 1, "begin": 4, "end": 17 }, { "value": "canvas", "pid": 1, "begin": 11, "end": 17 }, { "value": "coated canvas", "pid": 1, "begin": 47, "end": 60 }, { "value": "canvas", "pid": 1, "begin": 54, "end": 60 }, { "value": "coated canvas", "pid": 1, "begin": 315, "end": 328 }, { "value": "canvas", "pid": 1, "begin": 322, "end": 328 }, { "value": "coated canvas", "pid": 1, "begin": 468, "end": 481 }, { "value": "canvas", "pid": 1, "begin": 475, "end": 481 }, { "value": "Coated Canvas", "pid": 10, "begin": 23, "end": 36 }, { "value": "Canvas", "pid": 10, "begin": 30, "end": 36 }
}
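To catch cases like this before generating tfrecords, a check along the following lines might help. This is only a sketch under stated assumptions: the function name `find_invalid_evidences` is hypothetical, and it assumes `begin`/`end` are character offsets with an exclusive `end`, which is consistent with the `pid` 0 spans in the example above ("Coated Canvas" at 23 to 36 in the title). The demo data is a trimmed copy of that example with a single paragraph.

```python
def find_invalid_evidences(example):
    """Return (attribute key, evidence, reason) for evidences that do not resolve.

    An evidence is flagged when its "pid" does not index an extracted
    paragraph, or when the paragraph text at [begin:end] differs from "value".
    """
    paragraphs = example["paragraphs"]
    problems = []
    for attribute in example.get("attributes", []):
        for evidence in attribute.get("evidences", []):
            pid = evidence["pid"]
            if not 0 <= pid < len(paragraphs):
                problems.append((attribute["key"], evidence, "pid out of range"))
                continue
            text = paragraphs[pid]["text"]
            # Assumes character offsets with an exclusive end index.
            if text[evidence["begin"]:evidence["end"]] != evidence["value"]:
                problems.append((attribute["key"], evidence, "span mismatch"))
    return problems


# Trimmed version of the example above: one paragraph, two evidences.
example = {
    "id": "B007T534MU",
    "paragraphs": [
        {
            "text": "Will Leather Goods Wax Coated Canvas Shoulder Messenger Bag",
            "source": "title",
        },
    ],
    "attributes": [
        {
            "key": "Material",
            "evidences": [
                {"value": "Coated Canvas", "pid": 0, "begin": 23, "end": 36},
                {"value": "Coated Canvas", "pid": 10, "begin": 23, "end": 36},
            ],
        }
    ],
}

problems = find_invalid_evidences(example)
for key, evidence, reason in problems:
    print(key, evidence, reason)
```

Running a check like this over all extracted examples would give a list of affected ids, which could then be skipped or reported instead of crashing the tfrecord generation.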