digitalmethodsinitiative / dmi-tcat

Digital Methods Initiative - Twitter Capture and Analysis Toolset
Apache License 2.0

Use `id_str` instead of `id` #75

Closed: joelgombin closed this issue 10 years ago

joelgombin commented 10 years ago

It looks like Twitter recommends using the `id_str` field from the JSON tweet payload rather than the `id` field. Because `id` is a large integer (more than 53 bits), it loses precision in any parser that stores numbers as double-precision floats (see https://dev.twitter.com/overview/api/twitter-ids-json-and-snowflake). Unless I'm mistaken, though, TCAT uses the `id` field, at least in the streaming part; I haven't checked the search script.

Either way, I've found several tweets whose ID as stored by TCAT is incorrect.
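
To illustrate the precision problem described above, here is a minimal Python sketch. It is not TCAT code: the payload and the ID value are made up, and `parse_int=float` is used only to simulate a JSON parser that stores every number as an IEEE 754 double.

```python
import json

# Hypothetical payload: the ID is invented and chosen to be 2**53 + 1,
# the first integer a double cannot represent exactly.
payload = '{"id": 9007199254740993, "id_str": "9007199254740993"}'

# Simulate a double-based JSON parser by routing every integer through float().
lossy = json.loads(payload, parse_int=float)

# The string field survives any parser unchanged.
exact_id = int(json.loads(payload)["id_str"])
# The numeric field has already been rounded to the nearest double.
approx_id = int(lossy["id"])

print(exact_id)               # 9007199254740993
print(approx_id)              # 9007199254740992
print(exact_id == approx_id)  # False
```

The two values differ by one here; with real tweet IDs the rounding error can be larger, which would match the incorrect IDs reported above.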

dentoir commented 10 years ago

Are you using the most recent version of TCAT? We ran into this problem in the past, but it should have been fixed months ago. How recent are your records with truncated tweet IDs?

In capture/common/functions (in the Tweet class, `fromJson()`): `$this->id = $data["id_str"];`

(this code is used by the streaming code as well as the search code)
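
As a rough Python analogue of the assignment quoted above (a sketch only; the `Tweet` dataclass, its `from_json` method, and the sample payload are hypothetical and do not reproduce TCAT's PHP implementation), the ID can be taken from `id_str` and kept as a string end to end:

```python
import json
from dataclasses import dataclass


@dataclass
class Tweet:
    """Hypothetical stand-in for a tweet record; not TCAT's Tweet class."""
    id: str
    text: str

    @classmethod
    def from_json(cls, raw: str) -> "Tweet":
        data = json.loads(raw)
        # Take the ID from id_str, never from the numeric id field,
        # so no numeric conversion can round it.
        return cls(id=data["id_str"], text=data.get("text", ""))


# Made-up payload for illustration.
raw = '{"id": 9007199254740993, "id_str": "9007199254740993", "text": "hello"}'
tweet = Tweet.from_json(raw)
print(tweet.id)  # 9007199254740993
```

Keeping the ID as a string leaves it to the storage layer to choose a type wide enough for 64-bit values.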

joelgombin commented 10 years ago

OK, I see where my mistake was: I was looking at fromGnip rather than fromJson. Sorry about that!