crawlbase / proxycrawl-ruby

ProxyCrawl API ruby gem for scraping and crawling
https://proxycrawl.com
MIT License

paging for the &scraper=amazon-serp #3

Closed stevendelarwelle closed 3 years ago

stevendelarwelle commented 3 years ago

How do I call the next page when using the amazon-serp scraper?

crawlbase commented 3 years ago

@stevendelarwelle the best way to call the next page while using ProxyCrawl is to use Amazon’s pagination. To better understand how you can do it, please see the instructions below:

First, please note that Amazon has the following URL parameter to specify the page:

&page=2, &page=3, and so on.

To get the next page, change the page number in each API request. You may refer to the sample URLs below:

Page 1: https://www.amazon.com/s?k=games
Page 2: https://www.amazon.com/s?k=games&page=2
Page 3: https://www.amazon.com/s?k=games&page=3
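The sample URLs above could be generated with a small helper like the sketch below. The name amazon_search_url is hypothetical (it is not part of the ProxyCrawl gem), and the sketch assumes the search URL only needs the k and page parameters shown above.

```ruby
require 'uri'

# Hypothetical helper (not part of the ProxyCrawl gem): builds an Amazon
# search URL for a keyword and an optional page number.
def amazon_search_url(keyword, page = nil)
  params = { k: keyword }
  # Page 1 is the plain search URL, so only add the parameter from page 2 on.
  params[:page] = page if page && page > 1
  "https://www.amazon.com/s?#{URI.encode_www_form(params)}"
end

(1..3).each { |page| puts "Page #{page}: #{amazon_search_url('games', page)}" }
```

Building the query with URI.encode_www_form also takes care of keywords that contain spaces or other special characters.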

Remember to fully encode the URL when sending your request. Here’s an example:

response = api.get('https://www.amazon.com/s?k=games&page=3')

If you have other questions or concerns, you may contact our technical support team here, and we will be more than willing to assist.

stevendelarwelle commented 3 years ago

So calling https://www.amazon.com/s?k=games&page=3?scraper=amazon-serp would bring back page 3 of the results? Another question: are the results in the JSON returned in the same order as they appear on the webpage being scraped?

crawlbase commented 3 years ago

@stevendelarwelle yes, just make sure you use the parameters properly. The Amazon URL is one thing, and the ProxyCrawl API parameters are another; the scraper parameter belongs to the API request, not to the Amazon URL.
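As a quick illustration of why the two sets of parameters should stay separate, the sketch below uses only Ruby's standard uri library to parse the query string of the URL from the question. The variable names are made up for this example.

```ruby
require 'uri'

# The URL from the question, with the scraper parameter appended to the Amazon URL.
mixed = 'https://www.amazon.com/s?k=games&page=3?scraper=amazon-serp'

# Everything after the first "?" is the query string of the Amazon URL.
query = mixed.split('?', 2).last
amazon_params = URI.decode_www_form(query).to_h

# The second "?" does not start a new parameter; it becomes part of the page value.
puts amazon_params.inspect
```

Parsed this way, there is no separate scraper parameter at all: the text after the second "?" ends up inside Amazon's page value.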

If you want to call the API directly, without the Ruby library, it would be something like this:

curl "https://api.proxycrawl.com/?token=YOUR_TOKEN&scraper=amazon-serp&url=https%3A%2F%2Fwww.amazon.com%2Fs%3Fk%3Dgames%26page%3D3"
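For reference, the same request URL can be assembled with Ruby's standard uri library. This is only a sketch: api_request_url and API_ENDPOINT are names made up for this example, and the endpoint and parameters are taken from the curl command above.

```ruby
require 'uri'

API_ENDPOINT = 'https://api.proxycrawl.com/'

# Hypothetical helper: builds the full API request URL. encode_www_form
# percent-encodes each value, so the "?" and "&" inside the Amazon URL
# cannot be confused with the API's own parameters.
def api_request_url(token, target_url, params = {})
  query = URI.encode_www_form({ token: token }.merge(params).merge(url: target_url))
  "#{API_ENDPOINT}?#{query}"
end

puts api_request_url('YOUR_TOKEN', 'https://www.amazon.com/s?k=games&page=3', scraper: 'amazon-serp')
```

The Amazon URL is passed as a single url value, and token and scraper travel alongside it as separate API parameters.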

As for your second question, that is correct. The results in the JSON are in the same order as on the website.

stevendelarwelle commented 3 years ago

How do I call scraper=amazon-serp using the Ruby library? I have this:

when 'amazon'
  page = page.nil? ? "" : "?page=#{page}"
  url = "https://www.amazon.com/s?k=#{@keyword.word.encode}" + page + "&scraper=amazon-serp"
else
  raise
end

begin
  response = scraper_api.get(url)
  puts response.status_code
  puts response.original_status
  puts response.pc_status
  puts response.body
  case @cls_name
  when 'amazon'
    amazon(response)
  end
rescue => exception
  puts exception.backtrace
end

crawlbase commented 3 years ago

Passing the scraper param works the same way as passing any other parameter to the Ruby library, as in the example here.

Below is a simple example of how to use the scraper=amazon-serp param, or alternatively the autoparse=true param:

require 'json'

options = { scraper: 'amazon-serp' }
# options = { autoparse: 'true' } # Or use this dynamic param instead of naming the scraper
response = api.get('https://www.amazon.com/s?k=games&page=3', options)
response_body = response.body
puts JSON.parse(response_body)['body']
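
Putting the pagination and the scraper option together, a loop over several pages might look like the sketch below. fetch_serp_pages is a hypothetical helper, not part of the ProxyCrawl gem; it assumes api behaves like the api object in this thread (get(url, options) returning a response whose body is a JSON string with a 'body' key).

```ruby
require 'json'
require 'uri'

# Hypothetical helper, not part of the ProxyCrawl gem. `api` is assumed to
# respond to get(url, options) and return an object with a JSON string in
# #body, like the api object used in this thread.
def fetch_serp_pages(api, keyword, pages)
  options = { scraper: 'amazon-serp' } # API parameter, kept out of the Amazon URL
  pages.map do |page|
    url = "https://www.amazon.com/s?k=#{URI.encode_www_form_component(keyword)}"
    url += "&page=#{page}" if page > 1 # Amazon's own pagination parameter
    response = api.get(url, options)
    JSON.parse(response.body)['body']
  end
end

# Usage, assuming `api` is already set up as in the example above:
# results = fetch_serp_pages(api, 'games', 1..3)
```

Each element of the returned array is the parsed 'body' for one page, in the order the pages were requested.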