chris-greening / instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
https://chris-greening.github.io/instascrape/
MIT License
630 stars, 107 forks

Generate a .csv from the scrape_posts() #109

Closed Tayzerdo closed 3 years ago

Tayzerdo commented 3 years ago

Hello,

First of all, amazing job on the instascrape package; the new updates are awesome.

I have a question

I'm using the code below to retrieve the post information (I based it on the Joe Biden scrape example):

from selenium.webdriver import Chrome
from instascrape import Profile, scrape_posts
import pandas as pd
import json

# Creating our webdriver
webdriver = Chrome("chromedriver.exe")

# Scraping the profile, passing a session ID cookie in the headers
SESSIONID = '....'
headers = {"user-agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Mobile Safari/537.36 Edg/87.0.664.57",
           "cookie": f"sessionid={SESSIONID};"}
Insta = Profile("clicktays")
Insta.scrape(headers=headers)

# Scraping the posts
posts = Insta.get_posts(webdriver=webdriver, amount=2, login_first=True)
scraped, unscraped = scrape_posts(posts, silent=False, headers=headers, pause=10)

I found this code for retrieving the post information, but I'm having difficulty saving it to a CSV or Excel file, or even into a pandas DataFrame.

I had a look around but couldn't work out how to do it, as I'm not an expert in Python.

Can you give me a hand?

Xerrion commented 3 years ago

You could just do the following.

posts = Insta.get_posts(
    webdriver=webdriver,
    amount=2,
    login_first=True,
    scrape=True,
    scrape_pause=10,
)

for post in posts:
    post.to_csv(f"{post['shortcode']}.csv")

If you are only fetching 2 posts from one account, you could set scrape_pause=0 and let it run without pausing. I normally use 3 seconds and can fetch 1390 posts without being rate limited.
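
If one combined file is more convenient than one CSV per post, the scraped posts can be flattened into rows first. This is a minimal sketch, not the library's own method. It assumes each scraped post can be converted to a flat dict (instascrape scrapers appear to offer `to_dict()` for this, but check your installed version). `posts_to_csv`, the `posts.csv` filename, and the sample rows are hypothetical and only there for illustration:

```python
import csv


def posts_to_csv(rows, path):
    """Write a list of dicts (one per post) to a single CSV file.

    Posts may not all share the same keys, so the header is the union of
    all keys in first-seen order; missing values are left blank.
    """
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return fieldnames


# With instascrape this would be something like (assumption):
#   rows = [post.to_dict() for post in posts]
# Made-up sample data so the sketch runs on its own:
rows = [
    {"shortcode": "abc123", "likes": 10},
    {"shortcode": "def456", "likes": 25, "comments": 3},
]
columns = posts_to_csv(rows, "posts.csv")
```

The same `rows` list can be passed to `pandas.DataFrame(rows)` if you would rather work in pandas; from there `DataFrame.to_csv` or `DataFrame.to_excel` will write the file.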

Tayzerdo commented 3 years ago

Hey hey Xerrion, thank you so much for the answer.

The amount=2 was just a test to retrieve the data; I forgot to take it out when posting here.

So with your code, will I be able to extract the information for each post we take from the profile?

Xerrion commented 3 years ago

It should work; it creates one CSV file per post with the scraped content.

Tayzerdo commented 3 years ago

Perfect, thanks a lot!!! I'm trying it now but getting an error:

InstagramLoginRedirectError: Instagram is redirecting you to the login page instead of the page you are trying to scrape. This could be occuring because you made too many requests too quickly or are not logged into Instagram on your machine. Try passing a valid session ID to the scrape method as a cookie to bypass the login requirement

So I'll wait a bit to try it again

Xerrion commented 3 years ago

Try my solution from issue https://github.com/chris-greening/instascrape/issues/102#issuecomment-794121288; you should then be able to remove the login_first=True param.

Tayzerdo commented 3 years ago

Perfect, thanks a lot

Tayzerdo commented 3 years ago

Now it is working. I just added a try/except and it has been working so far:

# Scraping the posts
try:
    posts = Insta.get_posts(webdriver=webdriver, amount=100, login_first=True, scrape=True,scrape_pause=3)
except Exception as exc:  # a bare except would also swallow KeyboardInterrupt
    print(f"Error with Instagram login: {exc}")

Thanks a lot for the help @Xerrion
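
One caveat with the try/except above: if the call fails, `posts` is never assigned, so any later code that uses it will raise a NameError. Since the redirect error suggests waiting and trying again, a small retry wrapper may be a better fit. This is a minimal sketch under assumptions: `call_with_retries` and its `attempts`, `pause`, and `sleep` parameters are hypothetical names, not part of instascrape, and the 60 second default is an arbitrary choice:

```python
import time


def call_with_retries(func, attempts=3, pause=60, sleep=time.sleep):
    """Call func(); on failure wait `pause` seconds and try again.

    Returns func()'s result, or re-raises the last error once all
    attempts are used up, so the caller never continues with a
    missing result.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:
            print(f"Attempt {attempt} failed: {exc}")
            if attempt == attempts:
                raise
            sleep(pause)


# Hypothetical usage with the call from this thread:
#   posts = call_with_retries(
#       lambda: Insta.get_posts(webdriver=webdriver, amount=100,
#                               login_first=True, scrape=True, scrape_pause=3)
#   )
```

The `sleep` parameter exists only so the wait can be swapped out when trying the helper quickly; in normal use the default `time.sleep` is what you want.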