gocolly / colly

Elegant Scraper and Crawler Framework for Golang
https://go-colly.org/
Apache License 2.0

Amazon Captcha Catches My Scraper #628

Open melissa9090 opened 3 years ago

melissa9090 commented 3 years ago

I wrote a scraper for Amazon product titles, but Amazon's captcha catches it. I ran it 10 times with go run main.go: 8 times I got the captcha page, and 2 times I scraped the product title.

I researched this but could not find a solution for Go (only for Python). Is there a solution for my case?

package main

import (
    "fmt"
    "strings"

    "github.com/gocolly/colly"
)

func main() {

    // Create a Collector restricted to Amazon domains
    c := colly.NewCollector(
        colly.AllowedDomains("www.amazon.com", "amazon.com"),
    )
    c.OnHTML("div", func(h *colly.HTMLElement) {
        captcha := h.Text
        title := h.ChildText("span#productTitle")
        fmt.Println(strings.TrimSpace(title))
        fmt.Println(strings.TrimSpace(captcha))
    })

    // Start the collector
    c.Visit("https://www.amazon.com/Bluetooth-Over-Ear-Headphones-Foldable-Prolonged/dp/B07K5214NZ")
}

Output:

Enter the characters you see below Sorry, we just need to make sure you're not a robot. For best results, please make sure your browser is accepting cookies.

russian-developer commented 3 years ago

Try setting request headers.

AmitPress commented 3 years ago

Have you tried setting headers? If so, what is the result now? Is it still 8 captchas to 2 successes?