MinuraSilva / RunningScraperAPI

REST API for getting data from Elasticsearch database

Fix gender filtering for shoes #2

Open MinuraSilva opened 4 years ago

MinuraSilva commented 4 years ago

Filtering by gender currently does not work correctly because adidas.ca is inconsistent about specifying gender on the item page for shoes. A scraped item can end up with no gender specified, which makes it impossible to filter that item by gender.

Because I added both the original-gender sizes and the opposite-gender sizes to the same field (availability_page.available_sizes), it is also not possible to filter by gender by searching for the gender tag.

The best way to fix this problem is to determine a shoe's gender by comparing the shoe size to the SKU. This is more complicated than expected because a shoe that is available for adults sometimes also lists children's sizes (starting with K, e.g. K12). Then add each size with a gender prefix (e.g. W6, M9) to a field called "availability_page.available_sizes", convert these sizes to the opposite gender, and add the converted sizes to a field called "availability_page.alternate_available_sizes".

Be sure to make the gender determiner robust to things like children's shoes.
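
A rough sketch of the two-field split described above, assuming sizes are already labelled with an M, W, or K prefix. The helper names and the 1.5-size offset between men's and women's sizing are assumptions for illustration only; the actual offset used by adidas.ca would need to be checked, and children's sizes are simply left unconverted here.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: women's size = men's size + SIZE_OFFSET. Verify against adidas.ca.
SIZE_OFFSET = 1.5


def parse_size(label: str) -> Tuple[str, float]:
    """Split a prefixed size such as 'W6', 'M9.5' or 'K12' into (prefix, number)."""
    prefix, number = label[0].upper(), float(label[1:])
    if prefix not in {"M", "W", "K"}:
        raise ValueError(f"unknown size prefix in {label!r}")
    return prefix, number


def format_size(prefix: str, number: float) -> str:
    """Format back to a label; ':g' drops a trailing '.0' (9.0 -> '9')."""
    return f"{prefix}{number:g}"


def alternate_size(label: str, offset: float = SIZE_OFFSET) -> Optional[str]:
    """Convert a size to the opposite gender, or None for children's sizes."""
    prefix, number = parse_size(label)
    if prefix == "M":
        return format_size("W", number + offset)
    if prefix == "W":
        return format_size("M", number - offset)
    return None  # children's (K) sizes have no opposite-gender equivalent here


def split_sizes(labels: List[str]) -> Dict[str, List[str]]:
    """Build the two hypothetical fields from a list of prefixed sizes."""
    alternates = [alternate_size(label) for label in labels]
    return {
        "available_sizes": list(labels),
        "alternate_available_sizes": [a for a in alternates if a is not None],
    }
```

Keeping the converted sizes in their own field means a search on the gender prefix in "available_sizes" only matches the gender the shoe was actually listed under.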

MinuraSilva commented 4 years ago

A potential solution is to determine the gender using three checks:

  1. Read the gender from the start URL passed to Scrapy.
  2. Deduce the gender from the relationship between the SKU code and the shoe size.
  3. Look out for children's sizes and unisex shoes (from the SKU codes).

If all three checks are consistent, set the gender. Otherwise, decide on a best guess.
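
One possible shape for the "consistent, otherwise best guess" step, as a minimal sketch. The function name, the label values, and the tie-breaking rule (majority vote, with ties going to the earliest signal, i.e. the URL) are all assumptions rather than anything decided in this issue. Signals that could not be determined are passed as None and ignored.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def resolve_gender(signals: Sequence[Optional[str]]) -> Tuple[Optional[str], bool]:
    """Combine gender signals, e.g. [url_gender, sku_gender, sku_special_case].

    Returns (gender, confident). 'confident' is True only when every
    available signal agrees; otherwise the gender is a best guess.
    """
    votes = [s for s in signals if s is not None]
    if not votes:
        return None, False  # nothing to go on

    counts = Counter(votes)
    if len(counts) == 1:
        return votes[0], True  # all available signals agree

    # Best guess: the most common value; ties go to the earliest signal.
    best = max(counts, key=lambda g: (counts[g], -votes.index(g)))
    return best, False
```

Returning the confidence flag alongside the gender would let the scraper store uncertain items separately or flag them for review instead of silently guessing.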