AlanSwenson / amz-review-scraper

This project scrapes Amazon listings for recent reviews of specified products (by ASIN) and stores them in a Postgres DB. An example is running at the link below.
https://peachtools.com

Is this optimal? #73

Closed AlanSwenson closed 5 years ago

AlanSwenson commented 5 years ago

https://github.com/AlanSwenson/amz-review-scraper/blob/03bd237c07726655ba7b4e68c5d20c95d9ef8f69/amz_review_scraper/amzscraper.py#L73-L88

@davidbegin

This is how I am currently storing reviews in the database: a for loop that checks whether each review already exists, adds it if it does not, and updates it if it does. Is this the best way? Or should I be adding each one to the JSON object and then later reading each item from the JSON object into the database? Or does it matter?

My thoughts are that one way is making more calls to the database than the other.

The way it's set up now is also what is keeping me from putting this method into the find_attributes method, which is the main reason I am even thinking about it at all.

davidbegin commented 5 years ago

Ohh yeah, this could definitely go a couple ways.

Right now, if it is performant enough, then hell yeah, it works and is easy to understand.

But I could see wanting to optimize more, and a lot of the options depend on the business needs.

I'm thinking that for the simple case of either adding a row or not, you could use INSERT with the ON CONFLICT option (https://www.postgresql.org/docs/devel/sql-insert.html).
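As a rough sketch of what that could look like, here is an upsert that inserts new reviews and updates existing ones in a single statement. Everything project-specific is an assumption: the `reviews` table, its columns, and the `save_reviews` helper are hypothetical and not taken from the repo. It uses Python's built-in `sqlite3` as a stand-in so it runs anywhere. SQLite 3.24+ accepts the same `ON CONFLICT ... DO UPDATE` clause used here; with Postgres you would go through a driver such as psycopg2 and write `%s` placeholders instead of `?`.

```python
import sqlite3

# Hypothetical schema: the real project's table and column names may differ.
SCHEMA = """
CREATE TABLE reviews (
    review_id TEXT PRIMARY KEY,
    asin      TEXT NOT NULL,
    rating    INTEGER NOT NULL,
    body      TEXT NOT NULL
)
"""

# Insert the row; if review_id already exists, update it instead.
# "excluded" refers to the row that was proposed for insertion.
UPSERT = """
INSERT INTO reviews (review_id, asin, rating, body)
VALUES (?, ?, ?, ?)
ON CONFLICT (review_id) DO UPDATE SET
    rating = excluded.rating,
    body   = excluded.body
"""


def save_reviews(conn, reviews):
    """Insert new reviews and update existing ones in one batched call."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT, reviews)


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# First scrape: two new reviews.
save_reviews(conn, [
    ("R1", "B000TEST01", 5, "Great"),
    ("R2", "B000TEST01", 3, "Okay"),
])
# Second scrape: R1 was edited by its author, so it gets updated in place.
save_reviews(conn, [("R1", "B000TEST01", 4, "Great, but pricey")])

rows = conn.execute(
    "SELECT review_id, rating, body FROM reviews ORDER BY review_id"
).fetchall()
```

The "does it exist?" check moves out of the Python loop and into the database, so there is no separate SELECT per review.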

Also, you might want to look into how ORMs handle this. Often they will have a find_or_create function, which you could use if you adopt that ORM. Or, better for learning, see how they implement it.
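To give a feel for the pattern, here is a minimal hand-rolled sketch of a find_or_create helper. It is not any particular ORM's API or implementation (Django's `get_or_create`, for example, is a real method with the same idea but more safeguards); the `reviews` table and the function signature are made up for illustration, and `sqlite3` is used only so the snippet runs on its own.

```python
import sqlite3

SELECT_ONE = "SELECT review_id, asin, rating, body FROM reviews WHERE review_id = ?"
INSERT_ONE = "INSERT INTO reviews (review_id, asin, rating, body) VALUES (?, ?, ?, ?)"


def find_or_create(conn, review_id, asin, rating, body):
    """Return (row, created). Hypothetical helper, not a real ORM method."""
    row = conn.execute(SELECT_ONE, (review_id,)).fetchone()
    if row is not None:
        return row, False  # found: leave the stored row untouched
    with conn:  # commit the insert, or roll back on error
        conn.execute(INSERT_ONE, (review_id, asin, rating, body))
    return conn.execute(SELECT_ONE, (review_id,)).fetchone(), True


conn = sqlite3.connect(":memory:")
# Hypothetical schema, assumed for this example only.
conn.execute(
    "CREATE TABLE reviews ("
    "review_id TEXT PRIMARY KEY, asin TEXT, rating INTEGER, body TEXT)"
)

first, created_first = find_or_create(conn, "R1", "B000TEST01", 5, "Great")
second, created_second = find_or_create(conn, "R1", "B000TEST01", 1, "Ignored")
```

Note that this version makes at least two round trips per review and never updates an existing row, so it only covers the "add it, or don't" case. It can also race if two processes insert the same review at once, which is one reason to lean on a library's version or on ON CONFLICT.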

In general, a lot of refactors at this level are pushing the complexity into the SQL commands.

A common mistake junior programmers make is being afraid to drop down to SQL, and then when they do, they realize it's super easy, because of course the whole language is built around adding and finding things in a database! However, once you have learned how it works, it's often good to use someone else's abstraction from a library, because it will be optimized for you.

Tangent over!