RSS-Bridge / rss-bridge

The RSS feed for websites missing it
https://rss-bridge.org/bridge01/
The Unlicense

Reddit 403 Forbidden #3802

Open adegans opened 10 months ago

adegans commented 10 months ago

On and off, my 5 Reddit feeds fail to refresh. I've seen this 3 or 4 times this week. They worked fine for the past few months, but now they fail intermittently. I'm not sure what causes this. Nothing has changed (that I know of) on my end.

Some quota limit? Or fuckery from Reddit with the API? Is anyone else seeing this? Thoughts?

For example:

HttpException: https://www.reddit.com/search.json?q=%20author%3ARevolutionaryYam85&sort=new&include_over_18=on resulted in 403 Forbidden in lib/contents.php line 106

index.php(11): RssBridge->main()
lib/RssBridge.php(113): DisplayAction->execute()
actions/DisplayAction.php(71): DisplayAction->createResponse()
actions/DisplayAction.php(106): RedditBridge->collectData()
bridges/RedditBridge.php(83): RedditBridge->collectDataInternal()
bridges/RedditBridge.php(142): getContents()
lib/contents.php(106)

Query string: action=display&context=user&u=RevolutionaryYam85&bridge=RedditBridge&format=Atom&d=new
Version: 2023-09-24
Os: Linux
PHP version: 8.1.25

dvikan commented 9 months ago

I'm seeing the same on rss-bridge.org. I think my server IP has been blocked. See also the threads on /r/rss.

adegans commented 9 months ago

I did change the check interval to every 12 hours, instead of the default 2 hours that other feeds use. But since yesterday all feeds have failed again.

Maybe we need a setting in RSS-Bridge/FreshRSS to set the user-agent and some other headers so we can pretend to be browsers better. That way Reddit can't sniff out automated systems as easily for a blacklist.

virtadpt commented 9 months ago

Reddit's native /.rss feeds were messed up yesterday, too. It's not rss-bridge.

adegans commented 9 months ago

@virtadpt But I reported this a month ago... So whatever happened in the last few days, probably not as relevant ;) Before setting up the reddit bridge I found that Reddit killed off its rss feeds years ago, otherwise we wouldn't need the reddit bridge, right? Or did they re-add them?

Anyway, the link the bridge uses works fine in a browser, so the more logical explanation is that Reddit has a quota for requests like this, or has a way to profile these requests and block them. 403 errors are "Forbidden" errors, after all: the server understood the request and refuses to serve it.

To work around that the user agent can be randomized or more 'browser like' headers can be used. Or even the load time/interval could be randomized, so it's not exactly every 2-6-12 (or whatever) hours.

But, rss bridge doesn't do that unfortunately.
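
A minimal sketch of the two ideas above (a randomized refresh interval and a varied user agent), written in Python for illustration only. The function names and the 25% jitter default are assumptions and are not part of RSS-Bridge or FreshRSS.

```python
import random
from typing import Optional, Sequence


def jittered_interval(base_seconds: float,
                      jitter_fraction: float = 0.25,
                      rng: Optional[random.Random] = None) -> float:
    """Return base_seconds shifted by a random offset of at most
    +/- jitter_fraction * base_seconds (hypothetical helper)."""
    if not 0 <= jitter_fraction < 1:
        raise ValueError("jitter_fraction must be in [0, 1)")
    if rng is None:
        rng = random.Random()
    offset = rng.uniform(-jitter_fraction, jitter_fraction) * base_seconds
    return base_seconds + offset


def pick_user_agent(candidates: Sequence[str],
                    rng: Optional[random.Random] = None) -> str:
    """Pick one user-agent string from a caller-supplied list."""
    if not candidates:
        raise ValueError("at least one user agent is required")
    if rng is None:
        rng = random.Random()
    return rng.choice(list(candidates))
```

With a 2 hour base interval and the default jitter, each refresh would land somewhere between 1.5 and 2.5 hours after the previous one instead of on a fixed schedule.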

virtadpt commented 9 months ago

@adegans I didn't realize that - my bad. :)

I didn't know Reddit got rid of them anywhere on their system. I've got a bunch of bots pulling RSS feeds for subreddits that've been running for the last couple of years, the only hiccoughs being the odd 5XX error. Check this out:

https://www.reddit.com/r/Cyberpunk/.rss

Pick a subreddit, put a /.rss at the end of the URL.
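
As a small illustration of that URL pattern, here is a Python sketch that builds the native feed URL from a subreddit name. The helper name and the `host` parameter are made up for this example.

```python
from urllib.parse import quote


def subreddit_rss_url(subreddit: str, host: str = "www.reddit.com") -> str:
    """Build https://<host>/r/<subreddit>/.rss from a subreddit name.

    Accepts "Cyberpunk", "r/Cyberpunk" or "/r/Cyberpunk/".
    """
    name = subreddit.strip().strip("/")
    if name.lower().startswith("r/"):
        name = name[2:]
    if not name:
        raise ValueError("subreddit name is empty")
    return f"https://{host}/r/{quote(name, safe='')}/.rss"
```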

TBH I don't know why there is support for Reddit in RSS-bridge for that reason, unless there's a use case for proxying the existing ones that I'm not aware of.

The way things are going, now that I posted it somewhere maybe they really will kill off RSS. Time will tell.

Anyway, I agree with you that a randomized user-agent header would be a good thing. I tend to think it's a useful feature to have in general.

adegans commented 9 months ago

After your previous reply I did tinker with the real Reddit RSS feeds a bit - why use a bridge if there is real RSS, right? But it appears my server (FreshRSS) is blocked completely for now.

Also, when loading the feeds locally in my RSS reader (NetNewsWire) it worked fine, but the formatting was all wonky with their RSS. Images not embedding, small thumbnails for some posts, larger ones for others. Weird borders around text... So the Reddit bridge, for me at least, has the advantage of producing a nicer looking feed.

I also played with a new user agent for rss bridge, but that made no difference - Probably because my server is blocked by IP or something.

sigh and such...

virtadpt commented 9 months ago

Part of me just stumbled across this - it seems Reddit is being weird about RSS. Which suggests that it might be going away soon.

https://www.reddit.com/r/bugs/comments/18gv6yh/newsblur_not_getting_reddit_rss_feeds/

virtadpt commented 9 months ago

sigh Just like lighting a cigarette to make the bus arrive.

Tone866 commented 9 months ago

Looks like replacing www.reddit.com with old.reddit.com works for now: https://www.reddit.com/r/bugs/comments/18gv6yh/comment/kdkg3dn/?utm_source=share&utm_medium=web2x&context=3

dvikan commented 9 months ago

Can confirm using old.reddit.com works right now. Famous last words.

Fixed in https://github.com/RSS-Bridge/rss-bridge/pull/3848
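
For illustration, the host swap described above can be sketched in Python as follows. This is not the code from the pull request; the function name is hypothetical and only the hostname is rewritten.

```python
from urllib.parse import urlsplit, urlunsplit


def use_old_reddit(url: str) -> str:
    """Rewrite www.reddit.com (or reddit.com) URLs to old.reddit.com.

    Path, query string and fragment are kept unchanged; URLs on other
    hosts are returned as they are.
    """
    parts = urlsplit(url)
    if parts.hostname in ("www.reddit.com", "reddit.com"):
        parts = parts._replace(netloc="old.reddit.com")
    return urlunsplit(parts)
```

Applied to the failing URL from the first comment, this only changes the host and leaves `search.json` and its query parameters intact.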

dvikan commented 9 months ago

still working

Rjvs commented 6 months ago

Stopped working again, by the looks of it. Just tested trying to create a feed using bridge01 and several other hosts and got 403 on all of them. Seems to have started on Mar 28th using https://rssbridge.bus-hit.me/?action=display&bridge=RedditBridge&context=single&r=LocalLlama&f=&score=&d=hot&search=&frontend=https%3A%2F%2Fold.reddit.com&format=Json

dvikan commented 6 months ago

Are you comfortable sharing the URL you are getting the 403 for?

Rjvs commented 6 months ago

Sorry, was editing comment to add details while you were asking; https://rssbridge.bus-hit.me/?action=display&bridge=RedditBridge&context=single&r=LocalLlama&f=&score=&d=hot&search=&frontend=https%3A%2F%2Fold.reddit.com&format=Json is the original feed that broke for me. I since tried creating replacement feeds on several instances, so I suspect it’s intentional.

dvikan commented 6 months ago

That URL works fine on my dev PC (using 127.0.0.1)

but fails on https://rss-bridge.org/bridge01/?action=display&bridge=RedditBridge&context=single&r=LocalLlama&f=&score=&d=hot&search=&frontend=https%3A%2F%2Fold.reddit.com&format=html

The RedditBridge is programmed so that if Reddit responds with 403 Forbidden, rss-bridge caches that response for 60 minutes.

It might seem excessive, but it's an attempt to avoid getting IP banned.
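
The caching behaviour described above can be sketched roughly like this. It is a Python illustration, not the actual RedditBridge code (which is PHP); the class, the function and the `fetch` callable are all hypothetical.

```python
import time
from typing import Callable, Dict, Tuple

FORBIDDEN_TTL_SECONDS = 60 * 60  # 60 minutes, as described above


class ForbiddenCache:
    """Remember which URLs recently answered 403 Forbidden."""

    def __init__(self, clock: Callable[[], float] = time.time,
                 ttl: float = FORBIDDEN_TTL_SECONDS) -> None:
        self._clock = clock
        self._ttl = ttl
        self._blocked_until: Dict[str, float] = {}

    def is_blocked(self, url: str) -> bool:
        until = self._blocked_until.get(url)
        if until is None:
            return False
        if self._clock() >= until:
            del self._blocked_until[url]  # entry expired
            return False
        return True

    def record_forbidden(self, url: str) -> None:
        self._blocked_until[url] = self._clock() + self._ttl


def fetch_with_cache(url: str,
                     fetch: Callable[[str], Tuple[int, str]],
                     cache: ForbiddenCache) -> Tuple[int, str]:
    """Call fetch(url) unless a 403 for this URL is still cached."""
    if cache.is_blocked(url):
        return 403, ""  # answer locally, no request is sent
    status, body = fetch(url)
    if status == 403:
        cache.record_forbidden(url)
    return status, body
```

The point of the design is that a blocked instance stops sending requests for a while instead of retrying on every feed refresh.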

Rjvs commented 6 months ago

Thanks for looking into it straight away! I had tried it on several public instances but had not been exhaustive. I have now tested most of them and found one that works, so I can confirm your result. However, the majority of the public instances are returning 403 for me, so it might need even tighter rate limiting.

dvikan commented 6 months ago

curl 'https://old.reddit.com/search.json?q=subreddit%3Aphp&sort=hot&include_over_18=on'
<!doctype html>
     <html>
  <head>
    <title>Blocked</title>
    <style>
      body {
          font: small verdana, arial, helvetica, sans-serif;
          width: 600px;
          margin: 0 auto;
      }

      h1 {
          height: 40px;
          background: transparent url(//www.redditstatic.com/reddit.com.header.png) no-repeat scroll top right;
      }
    </style>
  </head>
  <body>
    <h1>whoa there, pardner!</h1>

<p>Your request has been blocked due to a network policy.</p>

<p>Try logging in or creating an account <a href=https://www.reddit.com/login/>here</a> to get back to browsing.</p>

<p>If you're running a script or application, please register or sign in with your developer credentials <a href=https://www.reddit.com/wiki/api/>here</a>. Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again. if you're supplying an alternate User-Agent string,
try changing back to default as that can sometimes result in a block.</p>

<p>You can read Reddit's Terms of Service <a href=https://www.reddit.com/wiki/api/>here</a>.</p>

<p>if you think that we've incorrectly blocked you or you would like to discuss
easier ways to get the data you want, please file a ticket <a href=https://support.reddithelp.com/hc/en-us/requests/new?ticket_form_id=21879292693140>here</a>.</p>

<p>when contacting us, please include your ip address which is: <strong>68.183.7.72</strong> and reddit account</p>
  </body>
</html>
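
A client that expects JSON from this endpoint could detect a block page like the one above before trying to parse it. The following Python sketch assumes the status code, content type and body are already available; the function names are hypothetical, and the marker strings are taken from the HTML above.

```python
import json
from typing import Any

# Strings copied from the block page shown above.
BLOCK_MARKERS = ("<title>Blocked</title>", "blocked due to a network policy")


def looks_blocked(status: int, content_type: str, body: str) -> bool:
    """Return True if the response looks like Reddit's block page."""
    if status == 403:
        return True
    if "json" in content_type.lower():
        return False
    return any(marker in body for marker in BLOCK_MARKERS)


def parse_listing(status: int, content_type: str, body: str) -> Any:
    """Parse a JSON response, failing clearly on a block page."""
    if looks_blocked(status, content_type, body):
        raise RuntimeError("request was blocked, got an HTML block page")
    return json.loads(body)
```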

corenting commented 5 months ago

It's possible to bypass the new limits by pretending to be the Android client: you have to log in with the Android OAuth client ID and add some headers. See https://github.com/redlib-org/redlib. You can then query the JSON endpoints on the oauth.reddit.com domain. I tried it for a project of mine and it seems to work well for now.
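
As a rough Python sketch of the last step only (querying oauth.reddit.com once a token exists): the function name, the listing path and the placeholder values are assumptions, and obtaining the token, the client ID and the extra headers used by redlib are not shown here.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_oauth_listing_request(subreddit: str, access_token: str,
                                user_agent: str, sort: str = "hot",
                                limit: int = 25) -> Request:
    """Build (but do not send) a GET request for a subreddit listing
    on oauth.reddit.com, authenticated with a bearer token."""
    query = urlencode({"limit": limit})
    url = (f"https://oauth.reddit.com/r/{quote(subreddit, safe='')}"
           f"/{quote(sort, safe='')}?{query}")
    headers = {
        "Authorization": f"Bearer {access_token}",  # token obtained elsewhere
        "User-Agent": user_agent,
    }
    return Request(url, headers=headers, method="GET")
```

Sending it would be a matter of passing the request to `urllib.request.urlopen` and decoding the JSON body.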

Rjvs commented 2 months ago

Woohoo, redlib-org/redlib#90 is merged now. @dvikan, is the plan to use redlib within RSS-Bridge to add this feature back?