afshinrahimi / geographconv

Semi-supervised User Geolocation via Graph Convolutional Networks

How do you get these user id's? #5

Open ChaitanyaBaweja opened 4 years ago

ChaitanyaBaweja commented 4 years ago

Your data uses these identifiers for users: USER_ee551c6c.

These don't correspond to the kind of IDs you get from Twitter. How do you convert Twitter IDs into this format? I am asking because I need to augment your dataset with my own data and would like to apply the same conversion to it.

afshinrahimi commented 4 years ago

Hi,

That conversion was done by the original publishers of the dataset for privacy reasons, so we can't recover the real handles from it.
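
If you only need identifiers that look like the ones in the dataset for your own users, you could generate pseudonyms yourself. The snippet below is a minimal sketch under assumptions: the publishers' actual scheme is unknown, and the keyed hash, the `pseudonymize` helper, and the `secret` value are all hypothetical choices made for illustration.

```python
import hashlib
import hmac


def pseudonymize(user_id, secret=b"replace-with-a-private-key"):
    """Map a Twitter user ID to a 'USER_xxxxxxxx' style pseudonym.

    Assumption: this only imitates the visual format of the dataset IDs.
    It is NOT the scheme used by the dataset publishers, which is unknown.
    """
    # Keyed hash so the mapping cannot be rebuilt without the secret.
    digest = hmac.new(secret, str(user_id).encode("utf-8"), hashlib.sha256).hexdigest()
    # Keep 8 hex characters to match the look of e.g. USER_ee551c6c.
    return "USER_" + digest[:8]


if __name__ == "__main__":
    print(pseudonymize(12345))
```

Eight hex characters give about 4 billion possible values, so collisions become likely once you have many tens of thousands of users; keeping more characters of the digest avoids that if the exact format does not matter.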

The datasets are quite old, so I'm not sure you'll get a lot from them. You can use https://github.com/afshinrahimi/twitter-fetcher to download data for the country you are interested in: set the search criteria in tweepy to the bounding box of that country so that you collect many geolocated tweets. Then, for each user in the downloaded tweets, download their timeline and use the location as the label. Finally, when you have enough users with locations, build your dataset.
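
The labelling step could look roughly like the sketch below. It assumes the tweets are already downloaded and flattened into dicts with `user`, `lon`, and `lat` keys (a hypothetical record shape, not the output format of twitter-fetcher), and it takes the median of a user's geotagged coordinates as the label, which is one possible choice rather than a fixed rule. The bounding box values are illustrative only.

```python
from collections import defaultdict
from statistics import median

# Hypothetical bounding box as (west, south, east, north) in degrees.
BBOX = (112.0, -44.0, 154.0, -10.0)


def in_bbox(lon, lat, bbox=BBOX):
    """Return True if the point lies inside the bounding box."""
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north


def label_users(tweets, bbox=BBOX, min_tweets=1):
    """Assign each user one (lon, lat) label from their geotagged tweets.

    Assumption: each tweet is a dict with 'user', 'lon', and 'lat' keys.
    Tweets without coordinates or outside the box are ignored.
    """
    coords = defaultdict(list)
    for tweet in tweets:
        lon, lat = tweet.get("lon"), tweet.get("lat")
        if lon is None or lat is None:
            continue
        if in_bbox(lon, lat, bbox):
            coords[tweet["user"]].append((lon, lat))

    labels = {}
    for user, points in coords.items():
        if len(points) >= min_tweets:
            # The median is less sensitive to a single trip than the mean.
            labels[user] = (
                median(p[0] for p in points),
                median(p[1] for p in points),
            )
    return labels


if __name__ == "__main__":
    sample = [
        {"user": "a", "lon": 150.0, "lat": -33.0},
        {"user": "a", "lon": 152.0, "lat": -35.0},
        {"user": "a", "lon": 151.0, "lat": -34.0},
        {"user": "b", "lon": 0.0, "lat": 0.0},  # outside the box
    ]
    print(label_users(sample))
```

Raising `min_tweets` drops users with too few geotagged tweets to give a trustworthy location.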