RSS-Bridge / rss-bridge

The RSS feed for websites missing it
https://rss-bridge.org/bridge01/
The Unlicense

Problems with Facebook on public RSS-Bridge instances #2047

Open em92 opened 3 years ago

em92 commented 3 years ago

Due to many recent `You must be logged in to view this page. This is not supported by RSS-Bridge` issues coming from Facebook users (https://github.com/RSS-Bridge/rss-bridge/issues/2041, comments from https://github.com/RSS-Bridge/rss-bridge/issues/2014, https://github.com/RSS-Bridge/rss-bridge/issues/2037), I investigated those issues more closely.

If I open "https://www.facebook.com/facebook/posts" from my home laptop, everything is fine: posts are returned. If I open "https://www.facebook.com/facebook/posts" from my public instance (https://feed.eugenemolotov.ru), it returns a redirect to the login page.

Looks like FacebookBridge has the same problems as InstagramBridge (https://github.com/RSS-Bridge/rss-bridge/issues/1891), which breaks using FacebookBridge on public RSS-Bridge instances.

Possible solutions for users (same as in the mentioned InstagramBridge issue):

grivanov commented 3 years ago

Thank you very much for investigating. I'm using shared hosting for mine and it's folder protected so only I have access, but it probably just checks the IP and since it's shared hosting, it's heavily used. May have to pay for a private IP in that case.

mrtpcet commented 3 years ago

I installed it on my vps (Infomaniak, Switzerland) and I am the only one using it. Unfortunately it doesn't work either. I tried to visit a Facebook page with Firefox and it automatically redirects me to the login page.

arcctgx commented 3 years ago

I'm running my single-user RSS Bridge instance on Digital Ocean, and feeds which were giving me error 500 since April 1st just started working again. Let's see for how long...

Edit: stopped working two hours later.

Noutladeesse commented 3 years ago

I have been having exactly this problem for 1 week. Over 100 feeds created through the Facebook main site bridge result in errors. I am using my personal laptop, so this cannot be the reason. Yesterday, 3 feeds (out of the 100+) delivered lots of previously missed articles; today, only 1 out of 100+ is working. It looks random and erratic. I use these feeds to deliver a daily news digest; for 1 week I have not been able to do it properly and have had to check all sources 1 by 1, which is inefficient and time-consuming. What should I do?

tstanbur commented 3 years ago

> If I open "https://www.facebook.com/facebook/posts" from my home laptop, everything is fine: posts are returned. If I open "https://www.facebook.com/facebook/posts" from my public instance (https://feed.eugenemolotov.ru), it returns a redirect to the login page.

Hi @em92 ,

If you remove the /posts part of the URL, then you don't get the login page shown, even on a public instance.

e.g.

https://www.facebook.com/facebook/posts (redirect to login page)

https://www.facebook.com/facebook (no redirect, page content shown).

Did you try that?

ghost commented 3 years ago

@tstanbur I have the same problem as @Noutladeesse. I don't understand how I can change https://www.facebook.com/facebook/posts to https://www.facebook.com/facebook; I have an RSS feed without facebook inside.

tstanbur commented 3 years ago

> @tstanbur I have the same problem as @Noutladeesse. I don't understand how I can change https://www.facebook.com/facebook/posts to https://www.facebook.com/facebook; I have an RSS feed without facebook inside.

I have the same issue too!

I was just trying to help fix it, hopefully @em92 can (I think he's the author?)

ghost commented 3 years ago

@tstanbur understood, my English is too poor. :)

Noutladeesse commented 3 years ago

> @tstanbur understood, my English is too poor. :)

@cborne: @tstanbur has the same problem as us, although his RSS feeds are not Facebook feeds. He is asking whether @em92 is the author and whether he can help us solve the problem (I'm translating!)

ghost commented 3 years ago

@Noutladeesse thanks, I eventually understood. At first I didn't understand what the Facebook URLs in the middle were doing, but they are a proposed fix for @em92. English isn't really like riding a bike: when you don't practice it, it doesn't come back on its own. :)

Noutladeesse commented 3 years ago

> @Noutladeesse thanks, I eventually understood. At first I didn't understand what the Facebook URLs in the middle were doing, but they are a proposed fix for @em92. English isn't really like riding a bike: when you don't practice it, it doesn't come back on its own. :)

:-D Yes, it is a proposed fix, but it doesn't work for feeds that have already been created.

woj-tek commented 3 years ago

Just another small "me too". I've been running RSS-Bridge on my personal VPS (only user) for a long while (~2 years) and I'm also affected by the issue. It started about 1-2 weeks ago, then it started working on Monday and was ok for about 2 days, and now it has stopped again.

It does seem like a Facebook action to block RSS-Bridge (probably with their silly reasoning that this would somehow make people go back to using their awful service…)

RealDutchie commented 3 years ago

Here just one more ''me too''. I specifically signed up here on Github to ask a few things about the Facebook bridge. Until last week, I had been using a public host from Eugene Molotov to my full satisfaction for about a year (thanks a lot). I don't have any technical background, so it is sometimes difficult for me to be able to keep up with all the terms that come up with this topic here.

I wonder if the above and below option mentioned by em92 still works and how I could get it running on my own PC:

Deploy RSS-Bridge on your personal PC or laptop and use FacebookBridge from there.

I would be very happy if I could still use the Facebook Bridge in this way, but I am not sure if this still works and how to install it on my own PC. I have looked through github quite a bit, but unfortunately I can't figure it out myself, which is why I decided to sign up.

If users could confirm or deny that this feature still works, I would be happy with that. Then my next question would be how I can best put the bridge on my own PC or who I can ask for help or get information how to do so. I also think it would be a very good idea to start a donation fund to get a developer to maintain the facebook bridge and make also the instagram bridge work again. That way, we can all contribute to get our beloved feeds going again. Greetings from the Netherlands and thanks for your great work over the years!

hellmachine2000 commented 3 years ago

Same here. Since April I got different errors in the same Feeds, like:

"Facebook Bridge | Main Site was unable to receive or process the remote website's content! Error message: `You must be logged in to view this page. This is not supported by RSS-Bridge.`"

"Facebook Bridge | Main Site was unable to receive or process the remote website's content! Error message: `The requested resource cannot be found!`"

"Facebook Bridge | Main Site was unable to receive or process the remote website's content! Error message: Call to a member function children() on null Query string: action=display&bridge=Facebook&u=hyperlitemountaingear&media_type=all&limit=1000&format=Atom Version: dev.2020-11-10" Latest version of RSS-Bridge…

ghost commented 3 years ago

I've been having these errors as well and I found that changing the cache_timeout parameter in FacebookBridge seems to reset the bridge, but it only works for a little while. I've tried 86400, 43200, 21600, 1, 0, and even eliminating the parameter. Somehow resetting the cache every time the bridge is called might be the solution to this problem?

Noutladeesse commented 3 years ago

> I've been having these errors as well and I found that changing the cache_timeout parameter in FacebookBridge seems to reset the bridge, but it only works for a little while. I've tried 86400, 43200, 21600, 1, 0, and even eliminating the parameter. Somehow resetting the cache every time the bridge is called might be the solution to this problem?

Thank you for the suggestion, @Mthmgcn05. How do you reset the cache? (I am not an IT professional, only a user)

ghost commented 3 years ago

After more testing and thought: it may be that every time I redeployed, it worked for five minutes, so the redeploy itself could have been what reset it.

miwcz commented 3 years ago

It seems that adding the cookie "c_user=XXXX", where XXXX is my ID from the Facebook cookie, helped. I don't know how to add this only via the bridge, so I did it via contents.php for all requests, which is really bad, but... maybe it's the way to a better solution :-)

EDIT: False alarm, not working again...

em92 commented 3 years ago

@miwcz on my public instance I used c_user and xs values. Quick and dirty patch looks like this:

diff --git a/bridges/FacebookBridge.php b/bridges/FacebookBridge.php
index c03de4e..fafeabd 100644
--- a/bridges/FacebookBridge.php
+++ b/bridges/FacebookBridge.php
@@ -174,6 +174,8 @@ class FacebookBridge extends BridgeAbstract {
        } else {
            $header = array();
        }
+       $header[] = 'Cookie: c_user=xxxx; xs=yyyy;';
+

        $touchURI = str_replace(
            'https://www.facebook',
@@ -560,11 +562,15 @@ EOD;
                $header = array();
            }

+           $header[] = 'Cookie: c_user=xxxx; xs=yyyy;';
+
+
            $html = getSimpleHTMLDOM($this->getURI(), $header)
                or returnServerError('No results for this query.');

        }

        // Handle captcha form?
        $captcha = $html->find('div.captcha_interstitial', 0);

So far, so good.

em92 commented 3 years ago

> So far, so good.

I meant it is working on my instance at the moment.

em92 commented 3 years ago

@tstanbur

> hopefully @em92 can (I think he's the author?)

I am not the author of this bridge. I maintain RSS-Bridge in general (reviewing pull requests, pinging bridge maintainers in issues) and the bridges for Pikabu and Vk.

Usually the maintainer of a bridge fixes its bugs, but we don't have a maintainer for the Facebook bridge. I have little time to fix bugs in bridges that I don't maintain.

miwcz commented 3 years ago

I have 20+ Facebook feeds and this works only for the first 4-5 requests. It seems that Facebook blocks multiple requests after a short while.

Noutladeesse commented 3 years ago

> @tstanbur
>
> > hopefully @em92 can (I think he's the author?)
>
> I am not the author of this bridge. I maintain RSS-Bridge in general (reviewing pull requests, pinging bridge maintainers in issues) and the bridges for Pikabu and Vk.
>
> Usually the maintainer of a bridge fixes its bugs, but we don't have a maintainer for the Facebook bridge. I have little time to fix bugs in bridges that I don't maintain.

Is there anyone who maintains Facebook bridge? @em92

em92 commented 3 years ago

> I meant it is working on my instance at the moment.

Now it does not. Facebook disabled my account because it supposedly violates its community standards. It persuaded me to upload my photo (I did it, a real photo of me) and now I am waiting for the review.

em92 commented 3 years ago

@Noutladeesse

> Is there anyone who maintains Facebook bridge?

No.

pin-grid-array commented 3 years ago

I don't have any new information to add that other users haven't already discussed. I'm only here to say that it is happening to me too. I am running FB Bridge on Heroku and using Feedly to save the feeds. I started getting Bridge returned error 500! around the beginning of April.

Some feeds only get the error occasionally. Other feeds keep getting the error constantly, which makes those feeds useless.

Example error message:

Facebook Bridge | Main Site was unable to receive or process the remote website's content!
Error message: `You must be logged in to view this page. This is not supported by RSS-Bridge.`
Query string: `action=display&bridge=Facebook&context=User&u=[REDACTED]&media_type=all&limit=-1&format=Atom`
Version: `dev.2020-02-26`

    Press Return to check your input parameters
    Press F5 to retry
    Check if this issue was already reported on GitHub (give it a thumbs-up)
    Open a GitHub Issue if this error persists

teromene, logmanoriginal

em92 commented 3 years ago

Here is the end of my story, in which I tried to make FacebookBridge work on my public instance (https://feed.eugenemolotov.ru) using my account with a real phone number and the patch from https://github.com/RSS-Bridge/rss-bridge/issues/2047#issuecomment-817099508.

(screenshot: r1zbxfG)

I didn't read their document titled "Community standards", but it looks like sharing posts via RSS does not comply with it.

woj-tek commented 3 years ago

It most likely boils down to "overusing API" or "harvesting data". Which is just a tad silly.

For me, using RSS-Bridge (and RSS in general) is a way to avoid having a Facebook account and still be able to follow some websites that, for some twisted reason, are present only there...

ghost commented 3 years ago

So I think I may have possibly discovered the issue.

I have put rss-bridge on my own server on my computer and realized after putting in the switch for debugging that Facebook responds in two different ways in the url header:

https:\/\/www.facebook.com\/login\/?next=https%3A%2F%2Fwww.facebook.com%2FXXXXXXXXXXXXX%2Fposts

and

https:\/\/www.facebook.com\/XXXXXXXXXXXXX\/posts?_fb_noscript=1

where XXXXXXXXXXXXX is the page id and the first gives the error:

'You must be logged in to view this page. This is not supported by RSS-Bridge.'

The second response gives the posts as requested.

I'm no programmer, but if one could clean up the first response to make it look like the second and then continue onto the rest of the code, I think the bridge would work.
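
A rough sketch of how the two responses described above could be told apart, assuming the final URL after redirects is available as a string. The function name `isLoginRedirect` is hypothetical and not part of RSS-Bridge. It only detects the login redirect so it can be reported or retried later; it does not get around it.

```php
<?php
// Hypothetical helper, not part of RSS-Bridge: reports whether the final
// URL Facebook answered with is the login redirect rather than the page.
function isLoginRedirect(string $finalUrl): bool
{
    // The debug output shows JSON-escaped slashes ("\/"), so undo that first.
    $url = str_replace('\/', '/', $finalUrl);

    // parse_url() returns null or false when no usable path is present.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return false;
    }

    // Match "/login" or "/login/..." only, not a page that merely
    // happens to have a name starting with "login".
    return preg_match('#^/login(/|$)#', $path) === 1;
}
```

With the two URLs quoted above, the first (`/login/?next=...`) would be flagged and the second (`/XXXXXXXXXXXXX/posts?_fb_noscript=1`) would not.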

em92 commented 3 years ago

@Mthmgcn05 I already mentioned this facebook behavior in first message of this issue.

ghost commented 3 years ago

Ha! It's been so long since I looked at the beginning, I forgot where we began.

10362227 commented 3 years ago

Timeline RSS works fine. It has already been running for 1 week, refreshing every 6 minutes, making maximal use of the resource.

pin-grid-array commented 3 years ago

> Timeline RSS works fine. It has already been running for 1 week, refreshing every 6 minutes, making maximal use of the resource.

@10362227 Could you explain further? How do you get it to work without errors? Is it because you are running it on your home computer?

ghost commented 3 years ago

After redeploying changes to figure out what works, Heroku, which is what I use, gives RSS-Bridge a new IP address, and each time that works for about five minutes. I use Inoreader to request updates and this requests so many times that it gets blocked again. I think there needs to be a system of queuing requests within a certain time frame that doesn't get the IP address blocked. And this may fix other bridges if we can figure out how often is too often and how to queue requests. At https://developers.facebook.com/docs/graph-api/overview/rate-limiting/ there is a description of rate limiting of 200 calls per hour and that's using Facebook's own API. I'm sure if the rate of requests exceeds that for so long a time, an IP address gets blocked. That's one every 18 seconds.
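
A minimal sketch of the arithmetic and the queuing idea above. The 200 calls per hour figure comes from the Graph API page linked above; that the same budget applies to scraped pages is only an assumption. Both function names are hypothetical and not part of RSS-Bridge.

```php
<?php
// Hypothetical helpers, not part of RSS-Bridge.

// Minimum spacing in seconds between requests for a given hourly budget.
function minIntervalSeconds(int $callsPerHour): float
{
    if ($callsPerHour <= 0) {
        throw new InvalidArgumentException('callsPerHour must be positive');
    }
    return 3600 / $callsPerHour;
}

// Seconds that still have to pass before the next request is allowed,
// given when the previous one was sent (timestamps in seconds).
function secondsToWait(float $lastRequestAt, float $now, float $minInterval): float
{
    return max(0.0, $minInterval - ($now - $lastRequestAt));
}

// 200 calls per hour works out to one request every 18 seconds.
$interval = minIntervalSeconds(200);
```

A caller would store the time of the last outgoing request (for example in the cache) and serve the cached copy, or sleep, whenever `secondsToWait()` returns more than zero.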

10362227 commented 3 years ago

I wrote a simple script for myself; it grabs the FB homepage timeline (https://www.facebook.com/?sk=h_chr). But you need an account first, and then you need to follow some people.

floviolleau commented 3 years ago

Hi,

I made the changes mentioned in #2047 (comment)

but nothing works, and an error occurred at line 588 because the array is empty ($html->find('#pagelet_timeline_main_column') returns an empty array).

$element = $html
                ->find('#pagelet_timeline_main_column')[0]
                ->children(0)
                ->children(0)
                ->next_sibling()
                ->children(0);

Any ideas why?

What I tried as well is to echo $html just before and commenting the returnServerError.

So like this:

if($loginForm != null) {
    //returnServerError('You must be logged in to view this page. This is not$
}

echo $html;

$element = $html
                ->find('#pagelet_timeline_main_column')[0]
                ->children(0)
                ->children(0)
                ->next_sibling()
                ->children(0);

And I got a login page when going to the rss bridge html view: https://mydomain/?action=display&bridge=Facebook&context=User&u=fondationtaraocean&media_type=all&limit=-1&format=Html

(screenshot)

em92 commented 3 years ago

@floviolleau, I don't speak French, but I think you need to accept Facebook's new terms of use.

floviolleau commented 3 years ago

Yes, but I did an echo, so this page is rendered inside the RSS-Bridge HTML view and it is not going to work like this 😉. I just wanted to say that, for whatever reason, it is asking to log in, and the problem is intermittent for me.

woj-tek commented 3 years ago

The issue is not with the cookies per se but with FB blocking crawling. I just ran a local instance on my machine: I was able to open the /posts endpoint in any browser, then launched RSS-Bridge and set it to crawl the page every couple of seconds. To no surprise, I got the error after a couple of minutes, and from then on FB was blocking me constantly and nagging me to log in.

Logging in (and using the session cookie) won't work in the long term because FB will just block your account, like it did in @em92's case.

The only possibility I see is to try to extract posts from the main page (i.e. https://www.facebook.com/XXXXXXXXXXXXX/?_fb_noscript=1 instead of https://www.facebook.com/XXXXXXXXXXXXX/posts?_fb_noscript=1) but that has the following issues:

I'd say that the situation is quite dire. For me, I just looked for the interesting pages on different sites (twitter usually, as quite often they use same tool to push to various "social media") and if some weren't present elsewhere I nagged the admins to be present.

I keep my fingers crossed that maybe the EU will somehow force those platforms via legislation to be interoperable instead of darn walled-gardens :|

em92 commented 3 years ago

> The only possibility I see is try to extract posts from main page

Tried on my host. It does redirect to login page.

em92 commented 3 years ago

Another idea is implementing bridge using Facebook's API. For example using this: https://developers.facebook.com/docs/graph-api/reference/post

But for that, you need to find someone who will implement it. Previous maintainers of FacebookBridge lost motivation or time to maintain it. I have written some ideas about donating here https://github.com/RSS-Bridge/rss-bridge/discussions/2063, but we need to discuss it first.

Edit: See https://github.com/RSS-Bridge/rss-bridge/issues/2047#issuecomment-822638852

simarilius commented 3 years ago

I don't think the Api would help. https://developers.facebook.com/docs/apps/features-reference/page-public-content-access

"This permission or feature is only available with business verification. You may also need to sign additional contracts before your app can access data. Learn More."

So every bridge instance would have to get verification, as I don't think you can deploy the tokens of your account ;)

em92 commented 3 years ago

> "This permission or feature is only available with business verification. You may also need to sign additional contracts before your app can access data. Learn More."

Found it here: https://developers.facebook.com/docs/apps/features-reference/page-public-content-access

Thanks for mentioning it @simarilius

bullride commented 3 years ago

This is a huge problem for many. I've decided to use Python and it works. Facebook is trying to prevent bots and scrapers. In Python you actually sign into your personal Facebook account using Selenium (headless browser) and from there you scrape whatever you need.

Noutladeesse commented 3 years ago

Thank you for the suggestion, @bull. Can you elaborate a little bit? I am not a programmer, but is there anything I can do through Python to solve the bridge problem?

ghost commented 3 years ago

I entered my Facebook cookie into the bridge and that does work to get to the right page; however, the page is served in the new design, and this bridge is not built to gather information from the new design. So there are two possible solutions: either add some rate limiting and caching of requests so that the login screen doesn't appear and no account is needed, or rework the bridge to gather information from the new design, using the cookie method to bypass the login screen.

Frankytyrone commented 3 years ago

Hi, I have been having the same issue for over a month, so I deployed RSS-Bridge with Docker (https://hub.docker.com/r/rssbridge/rss-bridge). I downloaded Docker, copied the rssbridge docker pull command, and now it's working perfectly. Docker getting started guide: https://docs.docker.com/get-started/

Hope that helps

woj-tek commented 3 years ago

I have RSS-bridge on my own deployment (docker doesn't matter here much) and the issue happens from time to time still...

Frankytyrone commented 3 years ago

Anyone got a fix for the facebook Rss problem?

triatic commented 3 years ago

> Anyone got a fix for the facebook Rss problem?

No, Facebook have intentionally made scraping much harder than before.