fennb / phirehose

PHP interface to Twitter Streaming API
709 stars 189 forks

Phirehose API too slow, and sometimes doesn't stream. Gateway Timeouts! #92

Closed: bhupinder-androcid closed this issue 8 years ago

bhupinder-androcid commented 8 years ago

I'm trying to use the Phirehose API. Sometimes it seems to work and sometimes it doesn't. We often get a Gateway Timeout for certain keywords. If you go to Twitter Fontana and search for a tweet, it loads very quickly, so my client is annoyed that our app isn't loading anything for him at all.

He's using a 10+ Mbps connection, and it still times out.

fennb commented 8 years ago

Hi there,

Gateway Timeout is a network error, so it is related to either the remote server or the client's network connection itself (not PHP/Phirehose).

You should try a direct connection to the twitter streaming endpoint with curl (examples here: https://bcomposes.wordpress.com/2013/01/25/a-walk-through-for-the-twitter-streaming-api/) if you want to prove that it's definitely independent from Phirehose itself.

Cheers!

bhupinder-androcid commented 8 years ago

Yes, I tried that too, but it gives an error: "User not in role". The read/write permissions are all enabled in my panel.

bhupinder-androcid commented 8 years ago

So you're saying the timeout is caused by the remote server at Twitter?

fennb commented 8 years ago

Not necessarily, it may be an intermittent network connectivity issue, but you'll need to do an elimination test.

You're not trying to run Phirehose within a web browser, are you?

The Twitter streaming API uses a persistent connection, so Phirehose must be run via the PHP CLI (http://php.net/manual/en/features.commandline.php), not within a web app.

There are some examples of this in action in various tutorials around the web, e.g.: http://code.tutsplus.com/tutorials/building-with-the-twitter-api-using-real-time-streams--cms-22194

DarrenCook commented 8 years ago

> Yes I did try that too. But it seems to be giving some error called "User not in role"...

I don't remember ever seeing that error (*), nor one about gateway timeouts.

In addition to Fenn's suggestion of using curl, I'd try creating another test Twitter user and trying it from the same server. Also, spin up a quick EC2 instance (Phirehose works fine on t1.micro and t2.micro instances) and test there with the original user. (Make sure each user is used by only one connection at a time, of course.)

*: Of course, it could be some change on the Twitter side. They do seem to be getting increasingly desperate to monetize their service.

Off-topic, but I wonder if at some point they will start charging for the streaming API. (Or if that breaks some earlier "free forever" marketing promise, then they might shut it down and replace it with a new paid-for API.)

bhupinder-androcid commented 8 years ago

@fennb @DarrenCook Yes, the Twitter integration I am testing runs as a web app on a website. You can check it out using the link below. I also tried it via the PHP CLI, but there it gives that "User not in role" error. The issue is that my client is comparing Twitter Fontana against our application as a speed test.

I also tried it on an AWS EC2 t2.micro instance, but nothing seems to work. It looks like the Twitter streaming endpoint randomly goes down and doesn't return any tweets. For example, if I use just one keyword, "mark", it gives a gateway timeout, whereas if I use many keywords like soccer, football, league, etc., some results show up.

You can check out the demo as follows: http://twitterfontana.blitsgoa.com/admin/ username: admin password: admin

There are two options: search by keyword and search by hashtag. To search for multiple keywords, use comma-separated values, e.g. mark, soccer, clubbing for "User type search" and #mark, #soccer, #football for "hashtag type search". I would really appreciate any guidance you could give us on this.

fennb commented 8 years ago

You cannot use filter-track.php (or any other Phirehose function) directly from within a web application.

You need to run Phirehose from the CLI, and if you want to access tweets from a web interface, you need to save the stream to a file or database, and then access them from there.

This is (somewhat) explained in the concepts section of the documentation: https://github.com/fennb/phirehose/wiki/Introduction#concepts
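A minimal sketch of that split, assuming a shared SQLite database and written in Python for brevity (in practice the writer would live in the PHP CLI script, e.g. called from the consumer's `enqueueStatus()` callback as in Phirehose's examples). The table layout and the names `open_db`, `store_status` and `latest_tweets` are hypothetical. The CLI process only writes and the web page only reads, so no web request ever opens a streaming connection.

```python
import json
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the tweet store. Both processes must use the same file path."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        " id_str TEXT PRIMARY KEY,"  # tweet id, so duplicates are ignored
        " body TEXT NOT NULL,"
        " raw TEXT NOT NULL)"  # full JSON payload, kept for later use
    )
    return conn


def store_status(conn, raw_json):
    """Writer side: called by the long-running CLI consumer for every message."""
    status = json.loads(raw_json)
    if not isinstance(status, dict) or "id_str" not in status:
        return False  # skip non-tweet messages such as limit notices
    conn.execute(
        "INSERT OR IGNORE INTO tweets (id_str, body, raw) VALUES (?, ?, ?)",
        (status["id_str"], status.get("text", ""), raw_json),
    )
    conn.commit()
    return True


def latest_tweets(conn, limit=20):
    """Reader side: what a web page would run on each request (newest first)."""
    rows = conn.execute(
        "SELECT id_str, body FROM tweets ORDER BY rowid DESC LIMIT ?", (limit,)
    )
    return rows.fetchall()
```

The in-memory default only suits a quick test. With a real deployment, both the CLI script and the web app would pass the same database file path.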

If you just want to search tweets, you should use the twitter search API, not the streaming API: https://dev.twitter.com/rest/public/search

Cheers!

bhupinder-androcid commented 8 years ago

Hi @fennb, here is what we are doing: when a search is submitted with the keywords, it calls the filter-track.php file and tweets start streaming. While this is happening, we keep inserting those records into the database, and they are displayed once the page has finished loading. Isn't this approach fine?

When the tweets stored in the database are shown in the browser, each row in the table view has + and - buttons that set or unset a flag in the database indicating whether the tweet will be shown on the front end.

For more detail, this mockup shows what we are trying to achieve:

http://cantylever.com/clubbing_twitter/

fennb commented 8 years ago

Hi there,

Sadly no, this is not the correct approach. You cannot run any streaming code from within your web server. A streaming connection is designed to stay open indefinitely; in fact, you will get banned from the Twitter API if you keep connecting and disconnecting (which is possibly what's happening).

You need to have a separate CLI script running in the background that maintains the connection to the Twitter streaming API and inserts tweets into the database, and then use the web interface to read them back out (and reconfigure what keywords it's watching if required).

Basically, you should never have Phirehose.php included in any part of your web application, only in CLI scripts.
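One way the web interface could "reconfigure what keywords it's watching" without touching the stream is a shared keyword table: the web app writes it, and the CLI script reads it. A minimal sketch in Python, assuming SQLite and the comma-separated input format described earlier in this thread; `open_config`, `save_keywords` and `load_keywords` are hypothetical names.

```python
import sqlite3


def open_config(path=":memory:"):
    """Open (or create) the keyword table shared by the web app and the CLI script."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS keywords (word TEXT PRIMARY KEY)")
    return conn


def save_keywords(conn, form_value):
    """Web side: store a comma-separated list such as "mark, soccer, clubbing"."""
    # Trim, lowercase and de-duplicate so "Soccer" and "soccer" collapse to one entry.
    words = {w.strip().lower() for w in form_value.split(",") if w.strip()}
    with conn:  # one transaction: the CLI never reads a half-written list
        conn.execute("DELETE FROM keywords")
        conn.executemany(
            "INSERT INTO keywords (word) VALUES (?)", [(w,) for w in sorted(words)]
        )


def load_keywords(conn):
    """CLI side: read the current list, sorted so it can be compared cheaply."""
    return [row[0] for row in conn.execute("SELECT word FROM keywords ORDER BY word")]
```

The web request only writes a few rows and returns immediately. Deciding when to reconnect with the new list is left to the CLI process.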

bhupinder-androcid commented 8 years ago

Okay, but in that case, how do I filter based on the keywords I search for in my application?

bhupinder-androcid commented 8 years ago

I want to search for the keywords I type into the textbox, query the API, and store only the records that contain those keywords in the database.

bhupinder-androcid commented 8 years ago

I'm assuming that posting data in real time to the CLI script running in the background isn't possible, right?

DarrenCook commented 8 years ago

> Like I want to search for keywords i typed in the textbox and query the API and store only those records in db that have those keywords that I typed in.

The streaming API is for monitoring a set of keywords 24/7. If the set of keywords you are interested in changes regularly, and you only need to follow each keyword for a short amount of time, then you should be using a polling API instead. (Phirehose cannot help there, but there are plenty of other Twitter PHP libraries that can.)
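A polling loop, unlike a stream, remembers the newest tweet ID it has seen and asks only for newer results on the next request. A minimal sketch of one polling round in Python, assuming a hypothetical `search(query, since_id)` callable that wraps whichever REST client library you choose and returns tweets as dicts with an integer `"id"`:

```python
from typing import Callable, List, Optional, Tuple

Tweet = dict  # assumed shape: at least {"id": int, ...}


def poll_once(
    search: Callable[[str, Optional[int]], List[Tweet]],
    query: str,
    since_id: Optional[int],
) -> Tuple[List[Tweet], Optional[int]]:
    """Run one polling round and return (new_tweets, since_id_for_next_round).

    `search` is hypothetical: it stands in for a REST search call that
    accepts the query and the last seen ID.
    """
    tweets = search(query, since_id)
    # Filter defensively in case the backend returns already-seen tweets.
    fresh = [t for t in tweets if since_id is None or t["id"] > since_id]
    if fresh:
        since_id = max(t["id"] for t in fresh)
    return fresh, since_id
```

Each keyword the user types in can get its own `since_id` and be polled for as long as it is needed, then dropped. No reconnect is involved.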

If you are gradually building up a set of keywords to monitor 24/7, then Phirehose and the streaming API are the right tools, but the key question becomes how frequently words are added. Each time you add a keyword, you have to kill your streaming process and log in again, and if you log in too frequently, Twitter gets upset and returns error codes.

So, assuming you only add new keywords every hour or so: store the desired keyword list in a DB (or CSV file). Run Phirehose as a CLI script that polls that DB every N seconds and, if there has been a change, kills its connection and reconnects. (Add a check so that if it has restarted recently, it stops polling until, e.g., at least 30 minutes have passed.)
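The decision logic described above (poll the keyword list, reconnect only on a change, and never more often than a minimum gap) can be sketched as follows. This is Python for illustration only. `KeywordWatcher` and the `load_keywords` callable are hypothetical, and the 30-minute default simply mirrors the figure suggested above. The clock is injectable so the logic can be tested without waiting.

```python
import time


class KeywordWatcher:
    """Decide when the streaming process should reconnect with a new keyword list."""

    def __init__(self, load_keywords, min_restart_gap=30 * 60, clock=time.monotonic):
        self.load_keywords = load_keywords  # hypothetical: reads the DB/CSV list
        self.min_restart_gap = min_restart_gap  # seconds between (re)connects
        self.clock = clock
        self.active = sorted(set(load_keywords()))  # list used by the current connection
        self.last_connect = clock()

    def check(self):
        """Call every N seconds. Returns the new keyword list if the caller
        should kill the connection and reconnect now, otherwise None."""
        if self.clock() - self.last_connect < self.min_restart_gap:
            return None  # restarted recently: don't even look at the list yet
        wanted = sorted(set(self.load_keywords()))
        if wanted == self.active:
            return None  # nothing changed, keep the existing connection
        self.active = wanted
        self.last_connect = self.clock()
        return wanted
```

A CLI loop would call `check()` every few seconds and, whenever it returns a list, tear down the stream and reconnect with that list. Any change made during the quiet period is picked up at the first check after it ends.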

(BTW, if you add keywords more frequently than every hour, you have an engineering problem to solve. I'd create multiple Twitter accounts and round-robin them. E.g. if you imagine a new keyword being added every minute, consider creating 30 Twitter accounts and running a farm of 30 Phirehose instances.)
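The round-robin arithmetic works like this: with 30 accounts and one new keyword per minute, each account receives a new keyword, and so has to reconnect, only once every 30 minutes. A minimal Python sketch of the assignment step, assuming one Phirehose instance per account; `AccountPool` and the account names are hypothetical.

```python
from itertools import cycle


class AccountPool:
    """Hand each new keyword to the next account in turn (round robin)."""

    def __init__(self, account_names):
        self.keywords = {name: [] for name in account_names}  # keywords per stream
        self._next = cycle(account_names)

    def add_keyword(self, word):
        """Assign the keyword and return the one account that must reconnect."""
        account = next(self._next)
        self.keywords[account].append(word)
        return account
```

Only the returned account's instance restarts. The other streams keep running untouched.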

(And just to reiterate Fenn's point: never run Phirehose from a web interface. It always has to be run as a CLI script.)

Darren
