JustAnotherArchivist / snscrape

A social networking service scraper in Python
GNU General Public License v3.0

Age-restricted tweets #419

Open Slider-Whistle opened 2 years ago

Slider-Whistle commented 2 years ago

Example (NSFW): https://twitter.com/TabletKnight/status/1499083203881099264

EDIT: Viewing the tweet without an account will result in this message, which has been preventing SNScrape from recording any data:

> Age-restricted adult content. This content might not be appropriate for people under 18 years old. To view this media, you’ll need to [log in](https://twitter.com/) to Twitter. [Learn more](https://help.twitter.com/rules-and-policies/notices-on-twitter)

You can view the tweet as long as you're signed into an account (even a suspended one, meaning you can view them with one of those accounts that get instantly banned for not having a phone number). Going by that, I think that SNScrape should still be able to download NSFW posts made in the future without having to resort to private/requested/developer API access, but possibly not without using the sign-in cookies of an existing user. Supporting this would also allow for the scraping of users marked as private (as long as your own account can see it), so double win I think.

It could also be possible to put a mechanism in place for automatically creating new accounts and saving those details just for scraping, but I think that kind of abuse of the service wouldn't be overlooked by Twitter.

JustAnotherArchivist commented 2 years ago

I don't know when this was introduced exactly, but it has been happening for at least several days now (I first saw it on the 26th). If anyone knows of a workaround to access this content, I'm all ears. But snscrape does not and will not support anything that requires authentication: #270

Slider-Whistle commented 2 years ago

> But snscrape does not and will not support anything that requires authentication: #270

That's a shame. I don't think that any workaround's going to come up.

JustAnotherArchivist commented 2 years ago

Yeah, it's unlikely. Until such time, this is impossible to fix in snscrape.

This was introduced into Twitter's 'Sensitive media policy' in early January, by the way: https://web.archive.org/web/diff/20220105171207/20220106023308/https://help.twitter.com/en/rules-and-policies/media-policy

See also: https://help.twitter.com/en/rules-and-policies/notices-on-twitter

Slider-Whistle commented 2 years ago

> > But snscrape does not and will not support anything that requires authentication: #270
>
> That's a shame. I don't think that any workaround's going to come up.

Me of little faith: https://github.com/mikf/gallery-dl/issues/2354#issuecomment-1058882747

It would still take a bit of refactoring, it seems, or possibly a new plugin altogether, so I understand if it's still a WONTFIX.

JustAnotherArchivist commented 2 years ago

Lovely. I haven't thought yet about how that would best be integrated, but at least it's an option. My initial thought would be a separate twitter-tweet-like scraper that just fetches an individual tweet from that endpoint.

Quoting that comment here just in case that issue vanishes:

> The syndication API doesn't have the login gate, for now:
>
> `https://cdn.syndication.twimg.com/tweet?id=<tweet id>`
>
> `https://syndication.twitter.com/timeline/profile?suppress_response_codes=true&screen_name=<user name>&with_replies=false&with_retweets=false`
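
For illustration only, here is a minimal Python sketch of how the two syndication URLs quoted above could be built and the single-tweet one fetched without logging in. This is not snscrape code: the function names, the `User-Agent` header, and the timeout are assumptions made for the example. The thread does not describe the response format, so the sketch returns the raw body and leaves parsing to the caller.

```python
import urllib.parse
import urllib.request

# Endpoints as quoted in the gallery-dl comment; they may stop working at any time.
TWEET_ENDPOINT = "https://cdn.syndication.twimg.com/tweet"
PROFILE_ENDPOINT = "https://syndication.twitter.com/timeline/profile"


def tweet_url(tweet_id):
    """Build the single-tweet syndication URL (hypothetical helper)."""
    return TWEET_ENDPOINT + "?" + urllib.parse.urlencode({"id": str(tweet_id)})


def profile_url(screen_name):
    """Build the profile-timeline syndication URL (hypothetical helper)."""
    params = {
        "suppress_response_codes": "true",
        "screen_name": screen_name,
        "with_replies": "false",
        "with_retweets": "false",
    }
    return PROFILE_ENDPOINT + "?" + urllib.parse.urlencode(params)


def fetch_tweet_raw(tweet_id, timeout=10):
    """Fetch one tweet and return the raw response body as text.

    Assumption: a plain GET with a browser-like User-Agent is sufficient.
    The response format is not stated in the thread, so nothing is parsed here.
    """
    request = urllib.request.Request(
        tweet_url(tweet_id),
        headers={"User-Agent": "Mozilla/5.0"},  # assumed, not taken from the thread
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `tweet_url(1499083203881099264)` builds the URL for the example tweet from the first comment. A separate per-tweet scraper along the lines suggested above would wrap something like `fetch_tweet_raw` and convert the result into its own item type.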

TheTechRobo commented 2 years ago

Can confirm that syndication URL seems to still work.