DocNow / hydrator

Turn Tweet IDs into Twitter JSON & CSV from your desktop!
MIT License

STREAM data & REST data don't match #62

Closed: TestingGround00 closed this issue 4 years ago

TestingGround00 commented 4 years ago

Hi, first of all, thank you very much for making tools like this. I have a question. We collected tweets back in June using the STREAM method. Naturally, the retweet counts, favorite counts, etc. were all 0 because we collected the tweets live.

In our STREAM data, we have user.id_str = 769927759791394816 posting the tweet with id_str = 1276362330155204608. The tweet text is as follows (truncated to prevent harm):

"...The political Plandemic is the Dems method of creating panic and fear mongering."

The timestamp on this tweet when it was collected was created_at = Fri Jun 26 03:51:07 +0000 2020

After we collected our tweets, we kept only original tweets in a separate database (fields such as the quoted text and the tweet text). The example above has FALSE next to both retweeted and is_quote_status, which means this was an original tweet.
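For readers reproducing this filter, here is a minimal sketch of one way to test for original tweets, assuming v1.1-style tweet JSON. In that format the boolean `retweeted` field reports whether the authenticating user has retweeted the tweet, so it is normally false in streamed data even for retweets; a retweet is recognised by the presence of an embedded `retweeted_status` object. The function name `is_original` is hypothetical.

```python
def is_original(tweet: dict) -> bool:
    """Return True if the tweet is neither a retweet nor a quote tweet.

    Assumes a v1.1-style tweet dict. A retweet carries an embedded
    `retweeted_status` object; the boolean `retweeted` field describes the
    authenticating user's own action and is not a reliable retweet marker.
    """
    is_retweet = "retweeted_status" in tweet
    is_quote = bool(tweet.get("is_quote_status", False))
    return not is_retweet and not is_quote


# Trimmed-down record shaped like the hydrated tweet discussed in this thread:
# `retweeted` is False, yet the tweet is a retweet.
hydrated = {
    "id_str": "1276362330155204608",
    "retweeted": False,
    "is_quote_status": False,
    "retweeted_status": {"id_str": "1275844665481277440"},
}
print(is_original(hydrated))
```

Whether this explains the records in question is not established here; it only shows that `retweeted = FALSE` on its own does not rule out a retweet.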

We used the Hydrator tool because we wanted to see how tweets change over time. So last week we took all the tweet id_str values and wrote them to text files. We then hydrated those text files to see how each tweet (if not deleted) aged over time. BUT the problem is that after the hydration, the user.id_str becomes 1221505866203062272 and the tweet text changes (while the id_str = 1276362330155204608 is the same as above). The text is as follows:
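The comparison described above can be sketched roughly as follows, assuming both the stream capture and the hydrated output are available as line-delimited JSON with v1.1-style `id_str` and `user.id_str` fields. The helper names `index_by_id` and `find_author_mismatches` are hypothetical. Only the author is compared, since the author of a given tweet ID should never change, whereas counts and text fields can legitimately differ between collection modes.

```python
import json


def index_by_id(lines):
    """Map id_str -> tweet dict for an iterable of JSON lines."""
    tweets = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        tweets[tweet["id_str"]] = tweet
    return tweets


def find_author_mismatches(stream, hydrated):
    """Return (id_str, stream_user, hydrated_user) for tweets whose author differs."""
    mismatches = []
    for id_str, old in stream.items():
        new = hydrated.get(id_str)
        if new is None:
            continue  # not returned by hydration (e.g. deleted or protected)
        old_user = old["user"]["id_str"]
        new_user = new["user"]["id_str"]
        if old_user != new_user:
            mismatches.append((id_str, old_user, new_user))
    return mismatches


# Values taken from this thread, reduced to the two fields being compared.
stream_lines = ['{"id_str": "1276362330155204608", "user": {"id_str": "769927759791394816"}}']
hydrated_lines = ['{"id_str": "1276362330155204608", "user": {"id_str": "1221505866203062272"}}']
print(find_author_mismatches(index_by_id(stream_lines), index_by_id(hydrated_lines)))
```

Keying on `id_str` rather than the numeric `id` keeps every ID as text from file to comparison.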

"RT @....: what the ..... strawberry [image link removed]"

And the created_at field says created_at = Fri Jun 26 03:51:07 +0000 2020

The web version of the tweet matches what we got out of Hydrator.

We can't figure out what is going on, and I hope I can get some assistance here. I thought the tweet id_str was supposed to be unique. So how did the user and the tweet text change within the same second? Even if a tweet is deleted, can Twitter assign the same id_str to another tweet in the same second??

Another interesting thing is that only a few tweets match the texts we have; for the majority, the tweet text and user changed after hydrating.
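One possibility that this thread does not confirm is that some IDs were altered before they reached the hydration step. Tweet IDs are 64-bit integers, and any tool that stores them as double-precision floats (a spreadsheet, or a loader that infers numeric column types) can round them to a neighbouring value, which may be the ID of a different tweet. The sketch below is an assumption-laden check, not a diagnosis: the function name is hypothetical, and the second ID is simply the thread's ID plus one, used only for illustration.

```python
def survives_float_round_trip(id_str: str) -> bool:
    """Return True if the ID is unchanged after passing through a 64-bit float."""
    return str(int(float(id_str))) == id_str


# The ID quoted in this thread, and a hypothetical neighbour (that ID plus one).
for candidate in ("1276362330155204608", "1276362330155204609"):
    print(candidate, survives_float_round_trip(candidate))
```

An ID that fails this check would be changed by a float conversion. An ID that passes proves nothing on its own, because an already-rounded ID passes too; the more useful test is to compare the IDs in the text files against the `id_str` values in the raw stream capture.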

I can send you an email if you need the tweet data. For privacy reasons I am only sharing the tweet id_str above. I thought DocNow might know something.

~Thanks

edsu commented 4 years ago

Hi @TestingGround00. To my knowledge user.id_str is unique and guaranteed to be stable for a given tweet, so there is probably some confusion about the data.

I just hydrated tweet id 1276362330155204608 using both twarc and Hydrator and they returned a tweet that was sent by user id 1221505866203062272.

This tweet was a retweet of the tweet with id 1275844665481277440, which was sent by user id 1209098377088110592.

Is it possible you have an error in how you were recording the user_id? I don't see any mention of 769927759791394816 in the JSON data for the original tweet or the retweet.
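A check like the one described above, looking for any mention of a user ID in a tweet's JSON, could be sketched as below. This is not necessarily how the check was done in this thread; `find_paths` is a hypothetical helper, and the sample record contains only the IDs quoted in this discussion.

```python
def find_paths(obj, needle, path=""):
    """Yield the dotted path of every value whose string form equals `needle`."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            child = f"{path}.{key}" if path else key
            yield from find_paths(value, needle, child)
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            yield from find_paths(value, needle, f"{path}[{index}]")
    elif str(obj) == needle:
        yield path


# Skeleton of the hydrated tweet, using only IDs quoted in this thread.
tweet = {
    "id_str": "1276362330155204608",
    "user": {"id_str": "1221505866203062272"},
    "retweeted_status": {
        "id_str": "1275844665481277440",
        "user": {"id_str": "1209098377088110592"},
    },
}
print(list(find_paths(tweet, "769927759791394816")))   # user ID from the stream data
print(list(find_paths(tweet, "1209098377088110592")))  # author of the retweeted tweet
```

Run against the full hydrated JSON, an empty result for the stream-side user ID would match the observation that it appears nowhere in the original tweet or the retweet.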

TestingGround00 commented 4 years ago

Thank you @edsu for your generous time. To be fair, my teammates and I have gone through all our scripts and spotted nothing. We may need a fresh set of eyes; we will let our professors know :) Appreciate the time.

edsu commented 4 years ago

I'm happy to take a look too if you need an extra set of eyes. It is curious!

TestingGround00 commented 4 years ago

Thank you for the gesture, sir, but I would rather not waste your time :)