jurchiks opened this issue 1 year ago
The original ticket was unclear to me, so after discussing it on Discord, I'm writing this down in case we want to revisit it in the future.
Every item's JSON has the following contents under the "description_f" key:
"description_f": [
{
"title": "Kategorija:",
"value": " Portat\u012bvais dators"
},
{
"title": "Procesors(CPU):",
"value": " CPU"
},
{
"title": "Operat\u012bv\u0101 atmi\u0146a(RAM):",
"value": " 8GB"
},
{
"title": "Cietais disks:",
"value": " 256GB SSD"
},
{
"title": "Videokarte(GPU):",
"value": " Vega 3"
},
{
"title": "Oper\u0113t\u0101jsist\u0113ma(OS):",
"value": " Windows 10 Home"
}
]
The problem is that both the order of the entries and the values of "title" are inconsistent across the inventory, so I'm using a regex to look up the RAM, SSD and GPU values fairly reliably.
import re
match = re.search(r'(atmi|disk|hdd|ssd)', entry['title'], re.IGNORECASE)  # entry = one element of description_f
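A slightly fuller sketch of that lookup, assuming each item is already loaded as a dict shaped like the JSON above. The per-column split and the GPU pattern (`video|gpu`) are my own illustrative assumptions; only the `atmi|disk|hdd|ssd` keywords come from the snippet above. Because it scans every entry, the order of entries in `description_f` doesn't matter.

```python
import re

# Hypothetical column -> title pattern mapping. Only the atmi/disk/hdd/ssd
# keywords come from the original regex; the GPU pattern is an assumption.
PATTERNS = {
    "RAM": re.compile(r"atmi", re.IGNORECASE),
    "Disk": re.compile(r"disk|hdd|ssd", re.IGNORECASE),
    "GPU": re.compile(r"video|gpu", re.IGNORECASE),
}


def extract_specs(item):
    """Return {column: value} for the first description_f entry whose title
    matches each pattern, regardless of the order of the entries."""
    specs = {}
    for entry in item.get("description_f", []):
        title = entry.get("title", "")
        for column, pattern in PATTERNS.items():
            if column not in specs and pattern.search(title):
                # Values carry a leading space in the scraped JSON.
                specs[column] = entry.get("value", "").strip()
    return specs


item = {
    "description_f": [
        {"title": "Videokarte(GPU):", "value": " Vega 3"},
        {"title": "Operat\u012bv\u0101 atmi\u0146a(RAM):", "value": " 8GB"},
        {"title": "Cietais disks:", "value": " 256GB SSD"},
    ]
}
print(extract_specs(item))  # {'GPU': 'Vega 3', 'RAM': '8GB', 'Disk': '256GB SSD'}
```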
Jurchiks suggests scraping the entire Banknote store (why tho?) and allowing columns to be merged client-side, e.g. "Cietais disks" (Latvian for "hard disk") and "HDD".
I'm not saying it's a bad idea, but it's beyond my goal for this project, which is to buy a cheap laptop.
All I'm saying is that this project can be expanded beyond the current scope.
> Jurchiks suggests scraping the entire Banknote store (why tho?)
Uhhh, I was under the impression that you already do this? You don't need to scrape anything extra. What you can do is scrape the category list, then let the user choose which category they want scraped; after that, it's the same code.
I obviously did not mean scraping ALL categories at once; this is per-category. You can't seriously be thinking of putting every column from every category into one table.
> this project can be expanded beyond the current scope
You have my blessing.
> I was under the impression that you already do this?
No, I only index and scrape the laptop section, and that's already 1500+ items, which means 1500+ requests to build the inventory data folder from scratch. I did it once on my workstation and copied the folder over to the server when deploying. The scraper only refreshes an item's data if its price differs between the index and the cached file.
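The refresh rule described above (re-fetch an item only when its price in the index differs from the cached copy) could look roughly like this. The one-JSON-file-per-item layout, the `id`/`price` keys and the `fetch_item` callback are hypothetical names for illustration, not the project's actual code.

```python
import json
from pathlib import Path
from typing import Callable


def needs_refresh(index_entry: dict, cache_dir: Path) -> bool:
    """True if the item has no cached file or its cached price differs
    from the price listed in the freshly downloaded index."""
    # Hypothetical layout: one <id>.json file per item in the data folder.
    cached_path = cache_dir / f"{index_entry['id']}.json"
    if not cached_path.exists():
        return True
    cached = json.loads(cached_path.read_text(encoding="utf-8"))
    return cached.get("price") != index_entry["price"]


def refresh(index: list, cache_dir: Path, fetch_item: Callable[[str], dict]) -> int:
    """Fetch and cache only the items that need it; return how many were fetched."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    fetched = 0
    for entry in index:
        if needs_refresh(entry, cache_dir):
            data = fetch_item(entry["id"])  # one request per new or repriced item
            (cache_dir / f"{entry['id']}.json").write_text(
                json.dumps(data, ensure_ascii=False), encoding="utf-8"
            )
            fetched += 1
    return fetched


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp)
        index = [{"id": "123", "price": 199.0}]

        def fake_fetch(item_id):
            # Stand-in for the real HTTP request.
            return {"id": item_id, "price": 199.0}

        print(refresh(index, cache, fake_fetch))  # 1: nothing cached yet
        print(refresh(index, cache, fake_fetch))  # 0: price unchanged
```

On a fresh deploy every item is fetched once; after that, only new or repriced items cost a request.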
> No, I only index and scrape the laptop section
Ehh, miscommunication: I meant that you already scrape ALL the data from the laptop section, not just selected fields.
Add an intermediate step after the index download where you collect all titles (trimmed first) into a list and let the user categorize them however they want via drag'n'drop plus a custom name, resulting in a structure like this: { "title1": "CPU", "title2": "GPU", ... }
Then, when you render the final table, you remap the values under the original titles to the custom columns.
This would make the thing work for any category in the shop, not just laptops.
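A rough sketch of that flow: collect the distinct trimmed titles from every item in the chosen category, then use the user's title-to-column mapping to build table rows. The drag'n'drop UI is out of scope here, so the mapping is just a hard-coded dict; `collect_titles`, `build_rows`, the column names and the "HDD:" title are hypothetical. Titles the user didn't map are dropped, and titles mapped to the same column (the "Cietais disks" / "HDD" case) end up merged into one column.

```python
def collect_titles(items):
    """Return the sorted list of distinct, trimmed description_f titles."""
    titles = set()
    for item in items:
        for entry in item.get("description_f", []):
            titles.add(entry.get("title", "").strip())
    return sorted(titles)


def build_rows(items, mapping):
    """Remap each item's original titles to the user's custom columns.

    mapping: {original_title: custom_column}, as produced by the
    (hypothetical) categorization step. Unmapped titles are skipped; if
    several titles map to one column, the first non-empty value wins.
    """
    rows = []
    for item in items:
        row = {}
        for entry in item.get("description_f", []):
            column = mapping.get(entry.get("title", "").strip())
            value = entry.get("value", "").strip()
            if column and value and column not in row:
                row[column] = value
        rows.append(row)
    return rows


items = [
    {"description_f": [{"title": "Cietais disks:", "value": " 256GB SSD"},
                       {"title": "Videokarte(GPU):", "value": " Vega 3"}]},
    {"description_f": [{"title": "HDD:", "value": " 500GB"}]},
]
print(collect_titles(items))
# ['Cietais disks:', 'HDD:', 'Videokarte(GPU):']
mapping = {"Cietais disks:": "Disk", "HDD:": "Disk", "Videokarte(GPU):": "GPU"}
print(build_rows(items, mapping))
# [{'Disk': '256GB SSD', 'GPU': 'Vega 3'}, {'Disk': '500GB'}]
```

Since nothing here knows about laptops, the same two functions would work for any category, given a mapping for that category's titles.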