Closed machlovi closed 2 years ago
It does sound low, but I can certainly think of cases where only a small number of tweets can be hydrated.
Where did you get the tweet IDs from? And did they go through Excel at any point?
A common cause of this is Excel (or another program) breaking the tweet IDs by turning them into floating point numbers. If you open your file of tweet IDs and see that they all end in 000, that could be the issue.
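A quick way to check for this, as a minimal sketch: it assumes the spreadsheet keeps only 15 significant digits (as Excel does for numeric cells) and zeroes the rest. The function names and sample IDs are made up for illustration and are not part of Hydrator.

```python
import re


def zero_after_15(tweet_id: str) -> str:
    """Simulate a numeric cell that keeps only the first 15 digits of an ID."""
    return tweet_id[:15] + "0" * max(0, len(tweet_id) - 15)


def is_suspicious(tweet_id: str) -> bool:
    """Flag IDs that look like they passed through a numeric spreadsheet cell."""
    s = tweet_id.strip()
    if re.fullmatch(r"\d+", s) is None:
        # Scientific notation, decimals, or other non-digit content.
        return True
    # Longer than 15 digits, and every digit past the 15th is zero.
    return len(s) > 15 and set(s[15:]) == {"0"}


def share_suspicious(ids) -> float:
    """Fraction of non-empty IDs that look corrupted."""
    ids = [i for i in ids if i.strip()]
    if not ids:
        return 0.0
    return sum(is_suspicious(i) for i in ids) / len(ids)


# Hypothetical example IDs.
print(zero_after_15("123456789012345678"))
print(share_suspicious(["123456789012345000", "123456789012345678"]))
```

If nearly every ID in the file is flagged, the file was most likely saved with the IDs as numbers. A header row would also be flagged, so a share slightly above zero is not alarming on its own.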
Yes, you are right. I have an Excel file and the IDs have 00 at the end. Can you guide me on how to overcome this issue?
I have an Excel file containing two columns: tweet ID and user ID. I tried to delete the user ID column, but somehow that changes the tweet IDs (there is no formula linking them).
What format did the original data come in? If it's a public dataset, do you have a link?
Unfortunately, if the file was saved like this, the IDs are not recoverable unless you can get the uncorrupted file or data from the original source.
The trick with Excel is to import the file and specify the "text" data type for all ID columns when opening it. Or don't use Excel at all, and use Google Sheets, for example.
Yes, data is public http://dfreelon.org/2012/02/11/arab-spring-twitter-data-now-available-sort-of/
It comes in Excel form. I got it to work by changing the format of the table and then saving it as .txt. Now, for most of the IDs, it shows "not found". The tweet IDs are almost 10 years old.
Yes, it is public. I have shared the link in another comment.
@machlovi - can you provide a URL of where you downloaded the data from on the web? The link in your earlier comment is a link to a file on your local computer, we can't access it at all.
Sorry, my bad. Here is the link: http://dfreelon.org/2012/02/11/arab-spring-twitter-data-now-available-sort-of/
Thanks, I see the problem: the original file is a 2-column CSV.
After extracting the 1st column as a text file using the csvcut command from https://csvkit.readthedocs.io/en/latest/index.html, it worked for me:
csvcut --columns 1 libya_ids.csv > libya_tweets.csv
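If installing csvkit is not an option, roughly the same extraction can be sketched with Python's standard `csv` module. This is only an illustration: the function name is made up, the file is assumed to be UTF-8, and any header row is copied through as-is (as csvcut does), so it may need removing before hydrating.

```python
import csv


def extract_first_column(src_path: str, dst_path: str) -> int:
    """Copy the first column of a CSV to a text file, one value per line.

    Values are handled as strings throughout, so long IDs are never
    converted to numbers. Returns the number of lines written.
    """
    written = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for row in csv.reader(src):
            # Skip blank lines and rows with an empty first cell.
            if row and row[0].strip():
                dst.write(row[0].strip() + "\n")
                written += 1
    return written


# Usage, with the file name from the command above (output name is hypothetical):
# extract_first_column("libya_ids.csv", "libya_tweet_ids.txt")
```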
(Unfortunately this isn't very user friendly and requires the command line - maybe the hydrator app can complain about invalid formats or something?)
I ran this for a short while and in the first couple of thousand results roughly 50% of tweets were missing (either deleted or suspended or made private etc), which is very high but also somewhat expected as tweet results decay very quickly.
https://arxiv.org/abs/1209.3026 "Losing My Revolution: How Many Resources Shared on Social Media Have Been Lost?" paper is a good reference / read on this problem.
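To put a number on how much of a dataset is gone, the requested IDs can be compared against what came back. The sketch below assumes the hydrated output is line-delimited JSON with one tweet object per line carrying an `id_str` (or `id`) field; the function name and sample data are hypothetical.

```python
import json


def missing_share(requested_ids, hydrated_jsonl_lines) -> float:
    """Fraction of requested tweet IDs that are absent from the hydrated output."""
    requested = {i.strip() for i in requested_ids if i.strip()}
    returned = set()
    for line in hydrated_jsonl_lines:
        if not line.strip():
            continue
        tweet = json.loads(line)
        # Prefer the string form of the ID; fall back to the numeric field.
        tweet_id = tweet.get("id_str") or tweet.get("id")
        if tweet_id is not None:
            returned.add(str(tweet_id))
    if not requested:
        return 0.0
    return len(requested - returned) / len(requested)


# Made-up example: four IDs requested, two tweets returned.
ids = ["1", "2", "3", "4"]
lines = ['{"id_str": "1"}', '{"id_str": "3"}']
print(missing_share(ids, lines))
```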
@igorbrigadir I really appreciate your help. Someone had already warned me about this issue. I have no idea what to do next. I have contacted the researcher, but it's against Twitter policy to share text data.
Hi, I have a tweet_id folder of 86k IDs, but when I fed it to Hydrator, it only returned 5000 tweets. Is this normal, or am I the only one facing this issue?