Steboss / lovInstagram

A lovely Instagram scraper

URL_LOGIN no longer exists #1

Open rushirg opened 5 years ago

rushirg commented 5 years ago

The URL_LOGIN (https://www.instagram.com/accounts/login/ajax/) no longer exists. If it is used, the session's POST request returns 400 Bad Request. I believe URL_LOGIN should be https://www.instagram.com/accounts/login; otherwise you will get the following error every time:

        SystemError: Login error: check your connection

Steboss commented 5 years ago

Hi @rushirg, thanks for posting this issue. You're right that Instagram has changed quite a lot since this code was written. However, I can still log in to Instagram using https://www.instagram.com/accounts/login/ajax/. Please try this new login function and tell me whether it works:

    import random
    import sys
    import time

    import requests

    URL_HOME = "https://www.instagram.com/"
    URL_LOGIN = "https://www.instagram.com/accounts/login/ajax/"


    def login(username, password):
        # get_shared_data() is the helper defined in main_scraper.py
        # create a new session
        session = requests.Session()
        user_agent = "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0"
        # form data with your username and password
        data = {'username': username, 'password': password}
        # update the headers of the connection
        session.headers.update({
            'Accept-Encoding': 'gzip, deflate',
            'Accept-Language': 'en-US,en;q=0.8',
            'Connection': 'keep-alive',
            'Content-Length': '0',
            'Host': 'www.instagram.com',
            'Origin': 'https://www.instagram.com',
            'Referer': 'https://www.instagram.com',
            'User-Agent': user_agent,
            'X-Instagram-AJAX': '1',
            'X-Requested-With': 'XMLHttpRequest'
        })

        # retrieve the CSRF token by first connecting to www.instagram.com
        print("Retrieve CSRF token")
        res = session.get(URL_HOME)
        token = get_shared_data(res.text)["config"]["csrf_token"]
        session.headers.update({'X-CSRFToken': token})

        print("Login...")
        # now log in through instagram.com/accounts/login/ajax/
        login_res = session.post(URL_LOGIN, data, allow_redirects=True)
        # check the status first: a failed login may not set a csrftoken cookie
        if not login_res.ok:
            print("Error during the login: check connection")
            sys.exit(-1)

        # refresh the CSRF token from the cookie set by the login response
        token = login_res.cookies.get('csrftoken')
        if token:
            session.headers.update({'X-CSRFToken': token})

        data = login_res.json()
        if not data.get("authenticated", False):
            print("Error during the login: check your login data")
            sys.exit(-1)

        # random pause of up to 5 seconds before the next request
        time.sleep(5 * random.random())
        return session

Unlike the previous code, this version first connects to www.instagram.com to retrieve the CSRF token and only then logs in.
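The first step, pulling the CSRF token out of the home page, can be sketched without a network call. The `["config"]["csrf_token"]` lookup in the code above implies that get_shared_data() returns a dict of that shape; this sketch assumes the page embeds that dict as `window._sharedData = {...};` inside a `<script>` tag. `extract_csrf_token` is a hypothetical name, and the real helper in main_scraper.py uses BeautifulSoup rather than a regular expression.

```python
import json
import re

# Assumption: the home page embeds its config as
#   <script ...>window._sharedData = {...};</script>
# The real get_shared_data() in main_scraper.py parses the page with BeautifulSoup.
SHARED_DATA_RE = re.compile(r"window\._sharedData\s*=\s*(\{.*?\});\s*</script>", re.DOTALL)


def extract_csrf_token(html):
    """Hypothetical helper: return config.csrf_token from the shared-data JSON."""
    match = SHARED_DATA_RE.search(html)
    if match is None:
        raise ValueError("window._sharedData not found in page")
    shared_data = json.loads(match.group(1))
    return shared_data["config"]["csrf_token"]


sample = (
    '<script type="text/javascript">'
    'window._sharedData = {"config": {"csrf_token": "abc123", "viewer": null}};'
    "</script>"
)
print(extract_csrf_token(sample))  # abc123
```

If Instagram stops embedding this object, the helper raises ValueError instead of failing later with a KeyError.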

rushirg commented 5 years ago

@Steboss The URL_LOGIN (https://www.instagram.com/accounts/login/ajax/) looks odd: if I paste it into the browser, I don't see any username/password fields on the page. But when I tried it with the POST method, it returned success (200).

With the modified login method above, I now get an AttributeError from the BeautifulSoup call in get_shared_data:

        File "main_scraper.py", line 168, in get_shared_data
          soup = bs.BeautifulSoup(res.text, PARSER)
        AttributeError: 'unicode' object has no attribute 'text'
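The traceback points to a type mismatch rather than a BeautifulSoup problem: line 168 of main_scraper.py reads `res.text` itself, so get_shared_data() expects the response object, while the modified login code passes `res.text`, which is already a string (`unicode` under Python 2). Calling `get_shared_data(res)` would likely avoid the error. The sketch below reproduces the mismatch offline; `FakeResponse` is a stand-in for `requests.Response`, and `page_text()` is a hypothetical helper that accepts either form.

```python
class FakeResponse:
    """Stand-in for requests.Response: only the .text attribute matters here."""

    def __init__(self, text):
        self.text = text


def page_text(res):
    """Hypothetical helper: accept a response-like object or an already-decoded string."""
    return res if isinstance(res, str) else res.text


res = FakeResponse("<html>home page</html>")

# Both call styles yield the same page text.
print(page_text(res))       # <html>home page</html>
print(page_text(res.text))  # <html>home page</html>

# Reading .text from a string, as get_shared_data() does with res.text, fails.
try:
    res.text.text
except AttributeError as exc:
    print(exc)  # 'str' object has no attribute 'text'
```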

Also, the call to retrieve_pages is missing from the modified login method above.

Steboss commented 5 years ago

Hi @rushirg

You're right: you won't see any page if you simply paste URL_LOGIN into your browser. This is an AJAX endpoint, so it only responds usefully to a request that sends the login data. That's why it works when we use session.post after retrieving the CSRF token.
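To illustrate the difference: pasting the URL into a browser sends a plain GET, whereas the login code sends a form-encoded POST carrying the CSRF token and the X-Requested-With header. The sketch below only builds (and never sends) such a request. It uses the standard library's urllib.request instead of the thread's requests session so it runs offline; `build_login_request` is a hypothetical name, and the token and credentials are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request

URL_LOGIN = "https://www.instagram.com/accounts/login/ajax/"


def build_login_request(username, password, csrf_token):
    """Build, but do not send, a POST like the one the login code performs."""
    body = urlencode({"username": username, "password": password}).encode("utf-8")
    headers = {
        "X-CSRFToken": csrf_token,             # token retrieved from the home page first
        "X-Requested-With": "XMLHttpRequest",  # marks the call as an AJAX request
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return Request(URL_LOGIN, data=body, headers=headers, method="POST")


req = build_login_request("user", "secret", "abc123")
print(req.get_method())               # POST
print(req.data)                       # b'username=user&password=secret'
print(req.get_header("X-csrftoken"))  # abc123 (urllib stores header names capitalized)

# A browser visit corresponds to a bare GET with no form data.
print(Request(URL_LOGIN).get_method())  # GET
```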

Unfortunately, main_scraper.py is out of date; Instagram has changed quite a lot since last year. However, I will have some time in the coming weeks to dedicate to this project, so I will upload a new version. Do you have a specific use case for this code?

Thanks

Stefano