bisohns / search-engine-parser

Lightweight package to query popular search engines and scrape for result titles, links and descriptions
https://search-engine-parser.readthedocs.io
441 stars, 85 forks

Add Amazon #44

Open: MeNsaaH opened this issue 4 years ago

kaustavbhattacharya07 commented 4 years ago

Hello! I'm interested in adding this enhancement. Is the requirement to extract the top ten results for a given search? Could you clarify the requirements?

MeNsaaH commented 4 years ago

Yeah. Given a search query on Amazon, it should return the titles, descriptions, links, prices, and ratings of the results. Check out the contribution guide for more details on contributing.
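As a rough illustration of the requested output, one result record could carry exactly the five fields listed above. This is a minimal sketch, not the project's actual result type: `AmazonResult` and `build_results` are hypothetical names, and it assumes the scraper first collects each field into its own list (one entry per result) before pairing them up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AmazonResult:
    """One search hit with the fields requested in this issue (hypothetical shape)."""
    title: str
    description: str
    link: str
    price: Optional[str]   # kept as scraped text, e.g. "$19.99"; may be missing
    rating: Optional[str]  # e.g. "4.5 out of 5 stars"; may be missing


def build_results(titles: List[str], descriptions: List[str], links: List[str],
                  prices: List[Optional[str]],
                  ratings: List[Optional[str]]) -> List[AmazonResult]:
    """Combine parallel per-field lists into result records.

    Raises ValueError if the lists differ in length, since a mismatch usually
    means a selector missed some items and fields would be paired incorrectly.
    """
    lengths = {len(titles), len(descriptions), len(links), len(prices), len(ratings)}
    if len(lengths) != 1:
        raise ValueError(f"field lists have mismatched lengths: {sorted(lengths)}")
    return [AmazonResult(*fields)
            for fields in zip(titles, descriptions, links, prices, ratings)]
```

Price and rating are optional because not every listing shows them; failing loudly on mismatched lengths is a design choice that avoids silently attaching one product's price to another product's title.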

devajithvs commented 4 years ago

I tried adding Amazon, but they have a stringent policy against web scraping. Every request returns the HTML page below:

To discuss automated access to Amazon data, please contact api-services-support@amazon.com. For information about migrating to our APIs, refer to our Marketplace APIs at https://developer.amazonservices.com/ref=rm_c_sv, or our Product Advertising API at https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html/ref=rm_c_ac for advertising use cases.

Enter the characters you see below

Sorry, we just need to make sure you're not a robot. For best results, please make sure your browser is accepting cookies.
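Because this block page comes back instead of search results, a parser that doesn't check for it would likely report zero results rather than an error. A minimal sketch of a detector, assuming the marker phrases below (taken from the page quoted above) stay stable; `is_robot_check` is a hypothetical helper, not part of search-engine-parser, and Amazon may change the wording at any time.

```python
# Phrases taken from the block page quoted in this thread (assumed stable).
# "not a robot" avoids depending on straight vs. curly apostrophes in "you're".
CAPTCHA_MARKERS = (
    "Enter the characters you see below",
    "not a robot",
    "api-services-support@amazon.com",
)


def is_robot_check(html: str) -> bool:
    """Return True if the response body looks like Amazon's anti-bot page."""
    lowered = html.lower()
    return any(marker.lower() in lowered for marker in CAPTCHA_MARKERS)
```

A caller could raise a dedicated error when this returns True, so users see "blocked by Amazon" instead of an empty result list.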

MeNsaaH commented 4 years ago

Wow, that's some serious stuff. @devajithvs, maybe look into the request headers that can be passed.
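For reference, attaching browser-like headers to a request with Python's standard library looks roughly like this. It is a sketch only: the header values are illustrative assumptions, `build_search_request` is a hypothetical helper, and the search URL pattern `https://www.amazon.com/s?k=<query>` is assumed rather than taken from this thread. The request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Illustrative browser-like headers; these values are assumptions, not known-good ones.
BROWSER_HEADERS = {
    "User-Agent": ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_search_request(query: str) -> Request:
    """Build (but do not send) a GET request for an Amazon search page."""
    # urlencode quotes the query, turning spaces into "+".
    url = "https://www.amazon.com/s?" + urlencode({"k": query})
    return Request(url, headers=BROWSER_HEADERS)
```

Sending it would be `urllib.request.urlopen(build_search_request("usb cable"))`. Note that `urllib` normalizes header names with `str.capitalize()`, so they are looked up as `"User-agent"` and `"Accept-language"`. As reported later in this thread, copying browser headers alone did not get past Amazon's check.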

devajithvs commented 4 years ago

Tried that too. Copying the browser's headers exactly didn't work. I guess they have some other mechanism to prevent scraping.

MeNsaaH commented 4 years ago

Alright, I think we'll have to look into using Selenium, though. Meanwhile, I'll try some additional headers and see where that gets us.