Xarrow / weibo-scraper

Simple Weibo Scraper
MIT License

Could not get pages larger than 2000 #19

Open iso-p opened 3 years ago

iso-p commented 3 years ago

The following code usually stops at i == 1992 or i == 1993:

```python
for i, tweet in enumerate(get_weibo_tweets_by_name(name=name, pages=page)):
    print(i)
```

shrutiphadke commented 3 years ago

I have the same problem. However, the example link on line 71 of weibo_scraper.py goes back as much as 6891 pages. Wonder what changed.

iso-p commented 3 years ago

A heads-up about this: it can be solved by using a cookie.
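For anyone wondering what "using a cookie" might look like in practice, here is a minimal sketch. It is not part of weibo-scraper itself; it assumes you hit Weibo's mobile API (`m.weibo.cn/api/container/getIndex`) directly and attach a login cookie copied from your browser. The `SUB` cookie name, the `containerid` format, and the endpoint parameters are assumptions, not something confirmed in this thread:

```python
import json
import urllib.parse
import urllib.request

# Placeholder: copy the real cookie string from your browser's devtools
# after logging in to weibo.com (the "SUB" cookie name is an assumption).
WEIBO_COOKIE = "SUB=your_cookie_value_here"

def build_request(containerid: str, page: int, cookie: str = WEIBO_COOKIE):
    """Build a request to the assumed mobile API endpoint with the
    login cookie attached to the headers."""
    params = urllib.parse.urlencode({"containerid": containerid, "page": page})
    url = f"https://m.weibo.cn/api/container/getIndex?{params}"
    return urllib.request.Request(
        url,
        headers={"Cookie": cookie, "User-Agent": "Mozilla/5.0"},
    )

def fetch_page(containerid: str, page: int) -> dict:
    """Fetch one page of posts and return the decoded JSON payload."""
    with urllib.request.urlopen(build_request(containerid, page), timeout=10) as resp:
        return json.load(resp)
```

Whether a logged-in cookie actually lifts the ~2000-post cap is exactly what this thread is trying to establish, so treat this as a starting point for experimentation.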


shrutiphadke commented 3 years ago

Thank you. Can you please elaborate a bit more on where to use the cookie in this code? I could not find it in the documentation.

s8sun commented 2 years ago

Any updates? Did you figure it out, @shrutiphadke? I'm also trying to crawl Weibo data and I get stuck too. Does any other crawler work better?

shrutiphadke commented 2 years ago

Hey s8sun! What I figured out is the following: this scrapes Weibo without logging in, and it seems that due to recent changes on Weibo, the non-login view is rate-limited to roughly the 2,000 most recent posts per account. I tried creating a Weibo account to get more, but it looks like you need a Chinese phone number to do that.

However, I was not able to figure out how to work with cookies as iso-p suggested above, and I am not sure how you would get a cookie without logging in. Hope this helps.