gocolly / colly

Elegant Scraper and Crawler Framework for Golang
https://go-colly.org/
Apache License 2.0
23.4k stars, 1.77k forks

Failed to log in to LinkedIn #76

Closed: festum closed this issue 6 years ago

festum commented 6 years ago

Hi,

I don't get the jobs response; it seems the login is not successful. I'm not sure what I missed. Could you please share the right way to do it? Thanks.

package main

import (
    "fmt"
    "log"
    "strings"

    "github.com/gocolly/colly"
)

func main() {
    c := colly.NewCollector()
    // Restrict requests to LinkedIn before the first request is sent.
    c.AllowedDomains = []string{"www.linkedin.com"}

    // Post the credentials to the login endpoint. The collector's cookie jar
    // keeps any session cookies the server sets for later requests.
    err := c.Post("https://www.linkedin.com/uas/login-submit", map[string]string{"session_key": "EMAIL", "session_password": "PASSWORD"})
    if err != nil {
        log.Fatal(err)
    }

    // attach callbacks after login
    c.OnResponse(func(r *colly.Response) {
        log.Println("response received", r.StatusCode)
    })

    c.OnError(func(_ *colly.Response, err error) {
        log.Println("Something went wrong:", err)
    })

    // Follow links that point to individual job postings.
    c.OnHTML("a[href]", func(e *colly.HTMLElement) {
        href := e.Attr("href")
        fmt.Println("link:", href)
        if strings.Contains(href, "/jobs/view") {
            fmt.Println("replaced:", strings.ReplaceAll(href, "https://www.linkedin.com/", ""))
            e.Request.Visit(href)
        }
    })

    // start scraping
    if err := c.Visit("https://www.linkedin.com/jobs/"); err != nil {
        log.Fatal(err)
    }
}
asciimoo commented 6 years ago

It seems the LinkedIn login form requires other parameters too, which can be found in its HTML, e.g.: <input name="loginCsrfParam" id="loginCsrfParam-login" type="hidden" value="7ce13d50-07b1-4332-845e-ee65b967d730">

festum commented 6 years ago

Thank you @asciimoo! I found a Node.js plugin that signs in through the login page. Is there any way for colly to visit the login page, get loginCsrfParam, and fill it into the submitted form like this?

asciimoo commented 6 years ago

@Festum you have to extract the token from the login page before you post the login data, as the Stack Overflow example does. This can be done with an OnHTML callback that checks whether the login form is present on the page.

nicolasassi commented 6 years ago

Sorry for commenting on a closed issue, but I'm having the same problem, and the comments above don't give a clear way to solve the LinkedIn login. Could @Festum explain how you did it, or could @asciimoo give an example? I would really appreciate it. Thanks!

Awea commented 4 years ago

Hi @nicolasassi, you can't get past the login without JavaScript. I think you should try another library such as https://github.com/MontFerret/ferret or https://github.com/chromedp/chromedp (which will work for your use case), or check #4.

asciimoo commented 4 years ago

It can be done without JavaScript, but it probably takes a bit more time to reverse-engineer how the login works. Of course, it is easier with a browser-based solution, but that doesn't mean you cannot do it with colly.

Awea commented 4 years ago

Hi @asciimoo! You're right about that. Thanks for your work on this library and have a nice day <3