JustAnotherArchivist / snscrape

A social networking service scraper in Python
GNU General Public License v3.0

Scraping Twitter lists #849

Open nerra0pos opened 1 year ago

nerra0pos commented 1 year ago

Describe the feature

AFAIK, snscrape has no scraper for Twitter lists.

Example: https://twitter.com/i/lists/1636551680379744257

They are still publicly available (unlike search).

Is there a plan to add these to snscrape?

Would this fix a problem you're experiencing? If so, specify.

Yes

Did you consider other alternatives?

Yes, search, but it is now defunct.

Additional context

No response

nerra0pos commented 1 year ago

Oh, I just found TwitterListPostsScraper in the code. My bad. But it does not work anymore either.

Raiders0786 commented 1 year ago

Hey @terrapop, is snscrape working for you? I have been facing difficulties!

nerra0pos commented 1 year ago

Ah no, it seems the TwitterListPostsScraper also uses Twitter's search endpoint instead of scraping the actual list page:

6 requests to https://api.twitter.com/2/search/adaptive.json?include_profile_interstitial_type=1&include_blocking=1&include_blocked_by=1&include_followed_by=1&include_want_retweets=1&include_mute_edge=1&include_can_dm=1&include_can_media_tag=1&include_ext_has_nft_avatar=1&include_ext_is_blue_verified=1&include_ext_verified_type=1&skip_status=1&cards_platform=Web-12&include_cards=1&include_ext_alt_text=true&include_ext_limited_action_results=false&include_quote_count=true&include_reply_count=1&tweet_mode=extended&include_ext_collab_control=true&include_ext_views=true&include_entities=true&include_user_entities=true&include_ext_media_color=true&include_ext_media_availability=true&include_ext_sensitive_media_warning=true&include_ext_trusted_friends_metadata=true&send_error_codes=true&simple_quoted_tweet=true&q=list%3A1636551680379744257&tweet_search_mode=live&count=20&query_source=spelling_expansion_revert_click&pc=1&spelling_corrections=1&include_ext_edit_control=true&ext=mediaStats%2ChighlightedLabel%2ChasNftAvatar%2CvoiceInfo%2Cenrichments%2CsuperFollowMetadata%2CunmentionInfo%2CeditControl%2Ccollab_control%2Cvibe failed, giving up. Errors: blocked (403), blocked (403), blocked (403), blocked (403), blocked (403), blocked (403)
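To make the log above easier to read, here is a minimal sketch (not snscrape code) that decodes the failing request and shows that it targets the search endpoint with a `list:` query rather than the list page. The URL is trimmed to a few of the parameters from the log; the variable names are made up for illustration.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Trimmed copy of the failing request URL from the log above
# (most of the include_* parameters are left out for readability).
url = (
    "https://api.twitter.com/2/search/adaptive.json"
    "?include_entities=true&q=list%3A1636551680379744257"
    "&tweet_search_mode=live&count=20"
)

parts = urlsplit(url)
params = parse_qs(parts.query)

endpoint_path = parts.path           # which API endpoint is being called
search_query = params["q"][0]        # percent-decoded search query
list_id = search_query.split(":", 1)[1]

print(endpoint_path)
print(search_query)
print(list_id)

# Re-encoding the query gives back the form seen in the log.
print(quote(search_query, safe=""))
```

The decoded `q` value is `list:1636551680379744257`, i.e. an ordinary search query with the `list:` operator, which is why the request is blocked along with the rest of search.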

nerra0pos commented 1 year ago

> Hey @terrapop, is snscrape working for you? I have been facing difficulties!

It did until yesterday.

Raiders0786 commented 1 year ago

Yes, it did for me too, but it suddenly stopped working and I receive the same message you posted above.

I'm referring to this: https://github.com/JustAnotherArchivist/snscrape/issues/846

nerra0pos commented 1 year ago

> Yes, it did for me too, but it suddenly stopped working and I receive the same message you posted above.
>
> I'm referring to this: #846

Yes, I know. I hoped to get around it with TwitterListPostsScraper, but it uses the same (now blocked) search endpoint.

Raiders0786 commented 1 year ago

Yes, I am thinking of moving from snscrape to nitter. Ref: https://github.com/zedeus/nitter

Also, please let me know if snscrape starts working for you. Thanks!

JustAnotherArchivist commented 1 year ago

IIRC, pagination on list pages is limited (similar to user profiles), so that's why it uses the search. If it doesn't work on the new search endpoint, I will probably implement a separate scraper that uses the list page.
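The fallback idea described here could be sketched roughly as follows. This is only an illustration of the control flow, not snscrape's implementation: `BlockedError`, `items_with_fallback`, `blocked_search`, and `list_page` are all hypothetical names, and the two sources are fakes that do no network access.

```python
from typing import Callable, Iterable, Iterator


class BlockedError(Exception):
    """Hypothetical stand-in for a 'blocked (403)' response."""


def items_with_fallback(
    primary: Callable[[], Iterable[str]],
    fallback: Callable[[], Iterable[str]],
) -> Iterator[str]:
    """Yield items from primary(); use fallback() if primary is blocked
    before it has produced anything."""
    produced = 0
    try:
        for item in primary():
            produced += 1
            yield item
    except BlockedError:
        if produced:
            # Partial results were already emitted; do not mix sources.
            raise
        yield from fallback()


# Fake sources for demonstration only.
def blocked_search() -> Iterator[str]:
    raise BlockedError("blocked (403)")
    yield  # unreachable; makes this function a generator


def list_page() -> Iterator[str]:
    yield from ["tweet 1", "tweet 2"]


result = list(items_with_fallback(blocked_search, list_page))
print(result)
```

With limited pagination on the list page, such a fallback would only return the most recent part of the list, so it trades completeness for working at all.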

nerra0pos commented 1 year ago

Yup, just tested it in the browser. List feeds stop after roughly 100-200 tweets (I did not count exactly).