adamwynne / twitter-api

Async io interface to all the twitter APIs
372 stars 64 forks source link

Response chunks don't map to streaming responses #47

Closed kelseyq closed 10 years ago

kelseyq commented 10 years ago

*_EDITED TO ADD_**: I don't think this is only confined to high-volume streams. Seeing the same issue with an extremely low-volume user stream :(

When following a high-volume stream, I'm seeing a lot of responses being chunked incorrectly. Here's a reproduction case:

(defn bodypart-print-2
  [response baos]
  (println (str "#### (" (.size baos) ") " (.toString baos))))

(def ^:dynamic
     *custom-streaming-callback*
     (SyncStreamingCallback. bodypart-print-2
                    (comp println response-return-everything)
                  exception-print))

  (statuses-filter :params {:track "bieber", :delimited "length"}
         :oauth-creds my-creds
         :callbacks *custom-streaming-callback*)

I turned on the delimited param to compare the length of the returned chunk to the expected length of the response. That seems like a route to fixing it but I'm reallll new to Clojure so I'm not totally sure where to start.

kelseyq commented 10 years ago

this is what i ended up doing, library handling this would be nicer: https://gist.github.com/kelseyq/51dce2e997b393f469b5

adamwynne commented 10 years ago

can you clarify what you mean by chunked incorrectly? are you saying that the broadcast length on the stream is not the size of the baos?

On Sun, Apr 27, 2014 at 11:30 PM, Kelsey Gilmore-Innis < notifications@github.com> wrote:

When following a high-volume, stream, I'm seeing a lot of responses being chunked incorrectly. Here's a reproduction case:

(defn bodypart-print-2 [response baos](println %28str "#### %28" %28.size baos) ") " (.toString baos))))

(def ^:dynamic custom-streaming-callback (SyncStreamingCallback. bodypart-print-2 (comp println response-return-everything) exception-print))

(statuses-filter :params {:track "bieber", :delimited "length"} :oauth-creds my-creds :callbacks custom-streaming-callback)

I turned on the delimited paramhttps://dev.twitter.com/docs/streaming-apis/parameters#delimitedto compare the length of the returned chunk to the expected length of the response. That seems like a route to fixing it but I'm reallll new to Clojure so I'm not totally sure where to start.

— Reply to this email directly or view it on GitHubhttps://github.com/adamwynne/twitter-api/issues/47 .

adamwynne commented 10 years ago

Hi Kelsey

Thanks for the info, but I'm still not quite connecting the dots how this is unexpected behaviour and what the library should do about it?

Regards Adam

On Mon, Apr 28, 2014 at 5:23 PM, Kelsey Gilmore-Innis < notifications@github.com> wrote:

Tweet messages often span more than one baos chunk. This is noted by Twitter in the streaming API documentationhttps://dev.twitter.com/docs/streaming-apis/processing#Transfer-Encoding:_chunked: "Be aware that Tweets and other streamed messages will not necessarily fall on HTTP chunk boundaries."

— Reply to this email directly or view it on GitHubhttps://github.com/adamwynne/twitter-api/issues/47#issuecomment-41579575 .

kelseyq commented 10 years ago

It means you can't reliably do anything with the chunks returned; they aren't even valid JSON. I'd expect a library that only handles Twitter to be able to take a callback that processes individual tweets. I can't imagine a use case for the streaming API that wouldn't require some kind of workaround like the one I've pasted above; am I missing something?

At the very least, I'd hope it'd be mentioned in the documentation--I spent quite a while debugging this.

kelseyq commented 10 years ago

As for what to do about it, I'm not sure there's a FANTASTIC solution given that you have to keep some state around between chunks...something similar to what I did manually could be moved back into the library, though. As mentioned I'm really new to Clojure, but if you could point me to a starting point or a general idea I'd be willing to take a crack at it.

adamwynne commented 10 years ago

I haven't used the streaming endpoints in a while, but I think instead of using the delimited=length, it might be simpler to omit it, and then just keep concatenating until you break on \r\n's (which will give you valid JSON).

Also, I see in your code you're using .toString which will stop on a \0 termination char. This is dangerous as there are often null chars in tweets (unicode being one example)

I guess the idiomatic way to handle it would be a lazy sequence that populated tweets in an async callback parsing for \r\n's. I agree you'd have to keep some state, but that can be minimised (perhaps a 'buffer' or something).

This is higher level than everything else in the library, but if you want to have a crack at it, all efforts gratefully received.

kelseyq commented 10 years ago

Great, thank you for the advice! I also found this library, built on top of twitter-api: https://github.com/mccraigmccraig/twitter-streaming-client Do you think that'd be a better spot for it? (I tried it and it has the same issue.)

adamwynne commented 10 years ago

yeah, that seems perfect for it - it claims to solve these issues ;)

kelseyq commented 10 years ago

i will take my whining over there then! ;)

kelseyq commented 10 years ago

I've created a pull request with a fix in mccraigmccraig's library: https://github.com/mccraigmccraig/twitter-streaming-client/pull/2

adamwynne commented 10 years ago

Looks good

On Saturday, June 21, 2014, Kelsey Gilmore-Innis notifications@github.com wrote:

I've created a pull request with a fix in mccraigmccraig's library: mccraigmccraig/twitter-streaming-client#2 https://github.com/mccraigmccraig/twitter-streaming-client/pull/2

— Reply to this email directly or view it on GitHub https://github.com/adamwynne/twitter-api/issues/47#issuecomment-46744353 .