Hi,
I'm trying to hydrate a few COVID-related tweets from ids provided by the dataset creators. When I hydrate the tweets and open the jsonl file, the tweet ids don't match the original ids I supplied for hydration, and I'm not sure why. To reproduce this, I created a reduced set with these 10 tweet ids and hydrated them:
1409530436481687559,1420581355176480770,1415615378546466819,1425871126014615558,1409480196944760833,1396357926990667784,1397088197054697473,1415037360706834438,1422531324997521408,1424781156554186757
The generated jsonl file has the following id values:
1415615378546466800,1397088197054697500,1409480196944760800,1415037360706834400,1425871126014615600,1424781156554186800,1409530436481687600,1420581355176480800
and the following id_str values, in the same order:
1415615378546466816,1397088197054697472,1409480196944760832,1415037360706834432,1425871126014615552,1424781156554186752,1409530436481687552,1420581355176480768
I have a couple of questions:
1. From what I understand, id_str is the string version of id, provided so that the long integer doesn't have to be parsed as a number. I didn't open the jsonl file in Excel or anything similar (no typecasting was done). Why don't id_str and id match?
2. I have to match each extracted tweet (from the jsonl) back to the original list of ids provided for hydration, and since the ids changed during hydration, I cannot map them back. To give you an idea, I have 880k tweet ids, of which only 13k were matched. Why are the tweet ids changing during the hydration process, and how can I avoid that?
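One observation that may help narrow this down: the id_str values listed above are equal to the original ids rounded to the nearest value a 64-bit float can hold. The Python sketch below checks this for three ids from the reduced set. Whether a float conversion actually happens somewhere in the hydration pipeline is only an assumption here, and the helper name round_trip_through_float is made up for illustration.

```python
import json

# Three of the original ids from the reduced set, paired with the
# id_str values that came back for them in the jsonl output.
original_ids = [
    1409530436481687559,
    1420581355176480770,
    1415615378546466819,
]
observed_id_str = [
    "1409530436481687552",
    "1420581355176480768",
    "1415615378546466816",
]


def round_trip_through_float(tweet_id: int) -> int:
    """Convert an id to a 64-bit float and back (a lossy step; hypothetical cause)."""
    return int(float(tweet_id))


# Ids of this size are about 1.4e18, where adjacent 64-bit floats are
# 256 apart, so the last few digits cannot survive the conversion.
rounded = [round_trip_through_float(i) for i in original_ids]
for orig, r, seen in zip(original_ids, rounded, observed_id_str):
    print(orig, "->", r, "| matches observed id_str:", str(r) == seen)

# For comparison: Python's json module parses large integers exactly,
# so reading the jsonl with json.loads does not by itself alter an id.
record = json.loads('{"id": 1409530436481687559, "id_str": "1409530436481687559"}')
ids_match = record["id"] == int(record["id_str"])
print("json.loads keeps id exact:", ids_match)
```

If the same check holds for the rest of the list, it would suggest looking for a step that stores or reads the ids as floating-point numbers (for example a spreadsheet export or a numeric column type), and keeping the ids as strings end to end.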
Any help is greatly appreciated.
Thanks