RSS-Bridge / rss-bridge

The RSS feed for websites missing it
https://rss-bridge.org/bridge01/
The Unlicense

Why Instagram bridge does not work and possible solutions #1891

Open em92 opened 3 years ago

em92 commented 3 years ago

Recently there have been a lot of new issues and comments about the Instagram bridge throwing 429 errors (https://github.com/RSS-Bridge/rss-bridge/issues/1863, https://github.com/RSS-Bridge/rss-bridge/issues/1885), and even the workaround from https://github.com/RSS-Bridge/rss-bridge/issues/1617#issuecomment-646679996 does not help.

First of all, I want to clarify that InstagramBridge has no maintainer. The recent commit https://github.com/RSS-Bridge/rss-bridge/commit/56b2c516e49b26b26258fe11787518cae5737b10 should have been pushed a long time ago. By maintainer I mean a person who at least fixes bugs, or explains why a certain bug cannot be fixed at the moment, and comments on InstagramBridge PRs.

A 429 error means "Too many requests": Instagram's servers are receiving too many requests from your server (from everything running on it, not only the RSS-Bridge instance). So InstagramBridge on a public, popular RSS-Bridge instance will probably throw this error.
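
As a rough illustration of how a client can react to a 429, here is a minimal Python sketch. It is not RSS-Bridge code; the function name and the default delays are assumptions made for this example. It honours a numeric `Retry-After` header if the server sends one, and otherwise falls back to capped exponential backoff.

```python
def retry_delay(attempt, retry_after=None, base=60.0, cap=3600.0):
    """Return how many seconds to wait before retrying after an HTTP 429.

    attempt     -- number of 429 responses seen so far (1 for the first)
    retry_after -- value of the Retry-After header, if the server sent one
    base, cap   -- hypothetical defaults, not taken from RSS-Bridge
    """
    if retry_after is not None:
        try:
            # Retry-After may be a number of seconds ...
            return max(0.0, float(retry_after))
        except ValueError:
            # ... or an HTTP date, which this sketch does not parse.
            pass
    # Exponential backoff: base, 2*base, 4*base, ... capped at `cap`.
    return min(cap, base * 2 ** (attempt - 1))
```

With these defaults the third consecutive 429 leads to a 240 second wait. Backing off only helps if the limit is about request frequency; it does nothing if the server rejects anonymous requests outright.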

Some people think that if we somehow make InstagramBridge log in with an existing account, it won't show such errors. To test that in practice, the private credentials feature has to be implemented. There is an issue for that, https://github.com/RSS-Bridge/rss-bridge/issues/1170, and a draft PR, https://github.com/RSS-Bridge/rss-bridge/pull/1343. But @teromene won't continue working on that PR, so someone has to pick up his work.

Possible solutions for users:

Note that deploying RSS-Bridge on shared hosting probably won't help, because other users on the same server would be making requests to Instagram too.

itsTurnip commented 3 years ago

> Deploy RSS-Bridge on your VPS, make sure that only certain people use it and use InstagramBridge from there.

This doesn't actually work. I've deployed my own instance with password protection, with only 4 user accounts fetched by the bridge every hour, but the error still occurs.

GregThib commented 3 years ago

> > Deploy RSS-Bridge on your VPS, make sure that only certain people use it and use InstagramBridge from there.
>
> This doesn't actually work. I've deployed my own instance with password protection and with only 4 user accounts to be fetched by bridge every hour, but there is still error.

Same here: I'm the only one using the bridge on my own instance (at home, not a shared IP) and the problem persists.

I would like to investigate more, but the bridge is a hard piece of code, and I do not understand what USER_QUERY_HASH is, as I mentioned in https://github.com/RSS-Bridge/rss-bridge/issues/1864#issuecomment-737345424. We surely need a developer who knows how Facebook's GraphQL API works.

The solution, as you said, is to have a private authentication method, so we should wait for https://github.com/RSS-Bridge/rss-bridge/issues/1170 and https://github.com/RSS-Bridge/rss-bridge/pull/1343. A few years ago I tried to implement such a thing (for a MediapartBridge), but that work is outdated now. More power to @teromene and you all! 👊

oppilate commented 3 years ago

For private accounts, take a look at https://github.com/dilame/instagram-private-api

translit commented 3 years ago

> 429 error means "Too many requests".

I confirm. I switched my feeds to update once a day and there are no more errors. Thanks for the tip.

Fmstrat commented 3 years ago

@em92 Maybe I'm crazy or misreading things, but I'm not sure the GraphQL endpoints are going to work for login based feeds. In the API overview they are pretty clear that Facebook business accounts connected to an Instagram account are required and that business validation is needed.

I'm guessing this is why projects like Instagram Private API went the route of using the consumer API and adding in the extra calls to look like Android for good measure. Facebook even states in their documentation that: "The API cannot access Instagram consumer accounts (i.e., non-Business or non-Creator Instagram accounts). If you are building an app for consumer users, use the Instagram Basic Display API instead."

Perhaps GraphQL could work, but from a quick dive that doesn't seem to be the case. It seems more like the Instagram Bridge would need to be converted over to the consumer API. However, I'm totally new to this, so I could be way off base.

em92 commented 3 years ago

I just pushed teromene's patch, which adds the possibility of using private credentials. Example usage is given in the first message of PR https://github.com/RSS-Bridge/rss-bridge/pull/1343

So, if anyone wants to change InstagramBridge to use those credentials, feel free to do it. Just in case, post a message here like "I am going to do this", to make sure that nobody else is writing the same patch at the same time.

@Fmstrat, your message could be useful to InstagramBridge maintainer, but there is none at the moment.

Fmstrat commented 3 years ago

If a maintainer comes along (or I eventually get time): after a bit more research, it looks like you can use the GraphQL endpoint "privately" within a user session with a CSRF token. This is how Instaloader handles it in their JSON call: they get a token, create the session, and then grab the JSON with that session.

Recreating that process should do the trick.

Fmstrat commented 3 years ago

Well, that was easy. Initial PR is in: https://github.com/RSS-Bridge/rss-bridge/pull/1894

Fmstrat commented 3 years ago

Actually, hold on that PR. I figured out if I modify the storage to the cookie text, then I can fix the username/id bug, too. ;)

arkhi commented 3 years ago

If I understand what’s happening, another solution could be to throttle the requests (based on the number of feeds making requests to Instagram API) instead of falling back to logging in. Is that possible?

That way, people without an Instagram account would not be excluded.

Fmstrat commented 3 years ago

@arkhi The problem is that this kind of delay needs to come from the reader, not the bridge, or the reader may time out.

@em92 I'll be working today to do a more formal login similar to the style of Instaloader.

Fmstrat commented 3 years ago

Alrighty, I feel like I'm 95% done, but can't seem to get the sessionid back in the post-login header. Maybe someone here can help out? @em92 if I can figure this out I can probably keep this maintained, too.

Here are the changes: https://github.com/Fmstrat/rss-bridge/commit/f6258920d60c28501e2d736f122f02aa3d5f11b5

To test it out, you'll need to put your username/password in here: https://github.com/Fmstrat/rss-bridge/blob/private_insta/bridges/InstagramBridge.php#L141 (until I integrate it into the private feed option that was recently merged).

The problem is when calling the login, the sessionid variable never comes back. This could be as simple as a failed login and I'm just not seeing why. @em92 am I using the post options the way you would expect with json_encode?

In the linked file you can see on line 110 there is a sessionid cookie returned, which does not occur when I make the request in my code.

Request logs: instalogin.txt

Thanks!

Aasemoon commented 3 years ago

Have there been any updates on this issue? Any hope of a solution?

Fmstrat commented 3 years ago

@Aasemoon Not from my end, need some feedback on what seems wrong in my commit, first.

smnthermes commented 3 years ago

How about using Instaloader as a backend?

em92 commented 3 years ago

I just checked @JimDog546's solution with a session ID (https://github.com/RSS-Bridge/rss-bridge/pull/1894#issuecomment-815840970). For now it is not user friendly, and you should use it only on your own instance. Here is a quick and dirty patch:

diff --git a/bridges/InstagramBridge.php b/bridges/InstagramBridge.php
index bf2999b..1a1c4ac 100644
--- a/bridges/InstagramBridge.php
+++ b/bridges/InstagramBridge.php
@@ -49,6 +49,8 @@ class InstagramBridge extends BridgeAbstract {
        const USER_QUERY_HASH = '58b6785bea111c67129decbe6a448951';
        const TAG_QUERY_HASH = '9b498c08113f1e09617a1703c22b2f32';
        const SHORTCODE_QUERY_HASH = '865589822932d1b43dfe312121dd353a';
+       const SESSIONID = '';
+       const CACHE_TIMEOUT = 43200; // 12 hours

        protected function getInstagramUserId($username) {

@@ -62,7 +64,8 @@ class InstagramBridge extends BridgeAbstract {
                $key = $cache->loadData();

                if($key == null) {
-                               $data = getContents(self::URI . 'web/search/topsearch/?query=' . $username);
+                               $header = array('cookie: sessionid=' . self::SESSIONID);
+                               $data = getContents(self::URI . 'web/search/topsearch/?query=' . $username, $header);

                                foreach(json_decode($data)->users as $user) {
                                        if(strtolower($user->user->username) === strtolower($username)) {
@@ -220,12 +223,13 @@ class InstagramBridge extends BridgeAbstract {

                        $userId = $this->getInstagramUserId($this->getInput('u'));

+                       $header = array('cookie: sessionid=' . self::SESSIONID);
                        $data = getContents(self::URI .
                                                                'graphql/query/?query_hash=' .
                                                                 self::USER_QUERY_HASH .
                                                                 '&variables={"id"%3A"' .
                                                                $userId .
-                                                               '"%2C"first"%3A10}');
+                                                               '"%2C"first"%3A10}', $header);
                        return json_decode($data);

                } elseif(!is_null($this->getInput('h'))) {

In this patch you should set your own SESSIONID. To get it, you should:

A PR that uses https://github.com/RSS-Bridge/rss-bridge/pull/1343 is welcome. Ideas about how to make this solution user friendly are welcome too.
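
For readers who want to see what the patched request amounts to outside of RSS-Bridge, here is a minimal Python sketch that builds the same URL and cookie header. The base URL is an assumption (the patch only refers to it as `self::URI`), the function name is made up for this example, and the query hash is simply copied from the patch, so it may stop working whenever Instagram changes it.

```python
import json
from urllib.parse import quote

# Assumed base URL; the patch only refers to it as self::URI.
BASE_URI = "https://www.instagram.com/"
# Query hash copied from the patch; Instagram may change it at any time.
USER_QUERY_HASH = "58b6785bea111c67129decbe6a448951"


def build_user_request(user_id, session_id, first=10):
    """Return (url, headers) for the user-feed GraphQL query used in the patch."""
    # Compact JSON, equivalent to the hand-built {"id":"...","first":10} string.
    variables = json.dumps(
        {"id": str(user_id), "first": first}, separators=(",", ":")
    )
    url = (
        BASE_URI
        + "graphql/query/?query_hash=" + USER_QUERY_HASH
        + "&variables=" + quote(variables)
    )
    # The session cookie is what ties the request to a logged-in account.
    headers = {"Cookie": "sessionid=" + session_id}
    return url, headers
```

Unlike the patch, which percent-encodes only the colons and commas by hand, this sketch percent-encodes the whole `variables` value; both forms decode to the same JSON.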

[Edited, Apr 12, 2021. 23:19 YEKT - added CACHE_TIMEOUT]

em92 commented 3 years ago

@JimDog546 FYI: I recently checked why my instance had stopped returning feeds. The reason was that I had to accept new terms of use or something like that.

em92 commented 2 years ago

Hey, all! I recently added documentation on how to set up InstagramBridge for private usage. Could you please review it? It probably works with 50 feeds and cache_timeout = 43200 (12 hours): https://github.com/em92/rss-bridge/blob/doc-instagram-2022-01/doc/bridges/InstagramBridge.rst
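
To put the "50 feeds, 12 hours" figure into numbers, here is a small Python calculation. It assumes every feed is refreshed as soon as its cache expires and that one refresh costs one request to Instagram; both are simplifications, and the function name is made up for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def upstream_requests_per_day(feed_count, cache_timeout, requests_per_refresh=1):
    """Upper bound on requests sent to Instagram per day.

    cache_timeout is in seconds; requests_per_refresh=1 is an assumption.
    """
    refreshes_per_feed = SECONDS_PER_DAY / cache_timeout
    return feed_count * refreshes_per_feed * requests_per_refresh


print(upstream_requests_per_day(50, 43200))  # 100.0
```

So the suggested setup stays at roughly 100 requests per day at most. Whether that is below Instagram's actual limit is not something this calculation can tell.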

em92 commented 2 years ago

Merged in master, new link: https://github.com/RSS-Bridge/rss-bridge/blob/master/doc/bridges/InstagramBridge.rst

This issue should stay open at least until the "content donoring" method is described. A prototype was implemented for my customer in my personal branch, https://github.com/em92/rss-bridge/tree/shi/contrib/InstagramBridge, but I am not sure whether it is usable for general cases.

Bockiii commented 2 years ago

@em92 Can you move it to the correct documentation folder? Just create a new folder "Bridge specific" and add a 01_Instagram.md to it. You can see how it works with the others. It's automatic; you only need to create the file and folder.

em92 commented 2 years ago

Maybe Instagram.md, not 01_Instagram.md? I assume that without the "01" prefix, pages will be sorted alphabetically (by bridge name).

Bockiii commented 2 years ago

Correct. And yes, alphabetically makes more sense in this case, so keep it as "Instagram.md".

em92 commented 2 years ago

New link: http://rss-bridge.github.io/rss-bridge/Bridge_Specific/Instagram.html

tvqt commented 2 years ago

Heads up (in case anyone else has the same experience): I set this up yesterday on a private Heroku instance, and woke up to find that Instagram had locked the account, thinking that I had been the victim of a phishing attack! They made me reset the password, and I had to change the session_id, but otherwise it works okay now :-)

em92 commented 2 years ago

FYI, @tvqt: some time ago I configured session_id and ds_user_id on my public instance, https://feed.eugenemolotov.ru. It had 400+ unique Instagram username queries and a 1 day cache timeout.

After 2 days I got a temporary ban that lasted until I verified my phone number. Two days later I got another temporary ban, and two days after that yet another. Finally, after another two days, I got a permanent ban.

For now, on my public instance, I am testing another method of fetching Instagram feeds: RSS-Bridge pushes a task to a queue; a browser with a userscript and a logged-in Instagram user picks up the task, fetches the data from Instagram and pushes it back to RSS-Bridge. When it is ready, I will make a PR. Here is the branch with those changes, if someone is interested: https://github.com/em92/rss-bridge/commits/instagram-rabbitmq (ignore "rabbitmq" in the branch name; it won't be used).

UPD: Nov 4, 2022. new branch https://github.com/em92/rss-bridge/tree/instagram-jq
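
The push/fetch/push-back cycle described above can be sketched as a tiny job queue. The Python code below is only an illustration of the idea, not the code from the linked branches: the table layout and function names are hypothetical, SQLite is used as a convenient stand-in for the queue, and in the real setup the "worker" is a browser userscript rather than Python.

```python
import sqlite3


def open_queue(path=":memory:"):
    """Create the (hypothetical) jobs table and return the connection."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY,"
        " username TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending',"
        " result TEXT)"
    )
    return db


def push_job(db, username):
    """Bridge side: request a fetch for one Instagram username."""
    cur = db.execute("INSERT INTO jobs (username) VALUES (?)", (username,))
    db.commit()
    return cur.lastrowid


def claim_job(db):
    """Worker side: take the oldest pending job, or None if there is none."""
    row = db.execute(
        "SELECT id, username FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    db.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],))
    db.commit()
    return row


def complete_job(db, job_id, result):
    """Worker side: hand the fetched data back to the bridge."""
    db.execute(
        "UPDATE jobs SET status = 'done', result = ? WHERE id = ?", (result, job_id)
    )
    db.commit()


def get_result(db, job_id):
    """Bridge side: return the stored result, or None while the job is not done."""
    row = db.execute(
        "SELECT result FROM jobs WHERE id = ? AND status = 'done'", (job_id,)
    ).fetchone()
    return row[0] if row else None
```

Until a worker completes the job, `get_result` returns `None`, which is the point at which the bridge would answer with a "job pushed, posts will appear later" message. The sketch assumes a single worker; with several workers, claiming a job would need to be made atomic.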

tvqt commented 1 year ago

@em92 thanks for your reply (and the work you have done!). I can't seem to get the instagram-rabbitmq branch you linked working on my own instance. After setting it up the same way as the original version, searching for a user's profile returns:

Uncaught Exception Error: Call to undefined method InstagramBridge::saveCachedValue() at bridges/InstagramBridge.php line 145

0 index.php:7
1 lib/RssBridge.php:15
2 lib/RssBridge.php:59
3 actions/DisplayAction.php:136
4 bridges/InstagramBridge.php:154
5 bridges/InstagramBridge.php:309
6 bridges/InstagramBridge.php:145

Query string: action=display&bridge=InstagramBridge&context=Username&u=elonmusk&media_type=all&format=Html
Version: dev.2022-06-14
OS: Linux
PHP version: 8.1.10

Quite odd!

Your instance works; however, I can't seem to find the queuing message ("RSS-Bridge pushed job to retreive data. Meanwhile you can add feed link to your feed reader. Posts will appear when job is done") in the rabbitmq repository. Is the instance running a different version of the branch?

em92 commented 1 year ago

> Call to undefined method InstagramBridge::saveCachedValue

There is a typo. Just pushed a fix to my branch.

Also, you may need to add this to config.ini.php:

[JobQueue]
file = ./jobqueue.sqlite3

deletosh commented 1 year ago

@em92 is this on the main branch now?

Specifically, does this version on docker hub have it, https://hub.docker.com/layers/rssbridge/rss-bridge/sha-ecd717c/images/sha256-9ae8ce8b2b6cb6766031681f2af7f9f2d6d32d373acc8a9e3d72061d7e9331fe?context=explore ?

em92 commented 1 year ago

Hey, @deletosh! No, it isn't. For now, I have no motivation to restructure and tidy up the code and write documentation.