DisasterMasters / TweetAnalysis

Repository for storing the code used to analyse the tweets collected from the Twitter scraper

What % of users have geolocation; can we get what they follow? Can we geolocate via who they tweet at? #8

Open audrism opened 5 years ago

TheHashTableSlasher commented 5 years ago

Twitter's API doesn't let us see who a user is following, but it does let us see who is following a user (see here), and we can use that to categorize the geolocations of users. But we need a list of users that are

audrism commented 5 years ago

Continue importing users and figure out the social network import. Nick will write about all geoloc methods; Jullian will look at how to separate out local news/utilities/etc. that may help geoloc.

audrism commented 5 years ago

This is the update on MongoDB and the various methods for doing geolocation:

https://docs.google.com/document/d/1dasK5cKIfsbuNArM-GUu1dnqXLIxDWWJnUtMnJHqfMY/edit?usp=sharing

More on geolocation collections

TheHashTableSlasher commented 5 years ago

This is the result of running a quick script to count how many entries each geolocation collection and each method has:

Geolocations_Irma: 108431 entries (19.49% of Statuses_Irma_A)
Geolocations_Maria: 99473 entries (13.72% of Statuses_Maria_A)
Geolocations_Florence: 7910 entries (31.87% of Statuses_Florence_A)

status_coordinates: 456 entries (0.21% of total)
status_place: 0 entries (0.00% of total)
status_streetaddress_nlp: 727 entries (0.34% of total)
status_streetaddress_re: 2599 entries (1.20% of total)
status_streetaddress_statemap: 57505 entries (26.65% of total)
user_place: 84362 entries (39.09% of total)
user_streetaddress_nlp: 185 entries (0.09% of total)
user_streetaddress_re: 5709 entries (2.65% of total)
user_streetaddress_statemap: 64271 entries (29.78% of total)

I accidentally put user_place and status_place under the same tag, which is why status_place shows zero. Other than that, much of what I said before on Slack about geolocations in statuses also applies to user profiles, although the amount of information we got from the "*_place" field seems much higher. This is both good and bad: Twitter tries to find an exact city for this field if possible, but I'm pretty sure that users can technically put whatever they want here.
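
For reference, a per-method breakdown like the one above could be produced along these lines. This is only a minimal sketch, not the script that was actually run: it assumes each geolocation document is a dict carrying its method tag in a field called "method" (a hypothetical name), and it accepts any iterable of such dicts, so a MongoDB cursor would work the same way as the plain list used here.

```python
from collections import Counter


def summarize_methods(docs, method_field="method"):
    """Count documents per geolocation method and compute each share in percent.

    `method_field` is a hypothetical field name; adjust it to the real schema.
    Returns a dict mapping method name -> (count, percent_of_total),
    ordered alphabetically by method name.
    """
    counts = Counter(doc.get(method_field, "unknown") for doc in docs)
    total = sum(counts.values())
    return {
        name: (n, 100.0 * n / total if total else 0.0)
        for name, n in sorted(counts.items())
    }


def format_summary(summary):
    """Render the summary in the same 'name: N entries (P% of total)' style."""
    return [
        f"{name}: {n} entries ({pct:.2f}% of total)"
        for name, (n, pct) in summary.items()
    ]


if __name__ == "__main__":
    # Toy documents standing in for a geolocation collection.
    docs = [{"method": "user_place"}] * 3 + [{"method": "status_coordinates"}]
    for line in format_summary(summarize_methods(docs)):
        print(line)
```

Counting on the tag stored in each document also makes a mix-up like the user_place/status_place one easy to spot, since a method that never appears simply has no entry.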

TheHashTableSlasher commented 5 years ago

Also, once we add the following_ids and followers_ids to Users_Labeled, we can add a new method for that and re-run the script.
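
One possible shape for that new method, purely as a sketch: take the already-known locations of a user's followers and assign the most common one if it is common enough. The function name, the `known_locations` mapping, and both thresholds are assumptions made for illustration, not anything that exists in the repository.

```python
from collections import Counter


def infer_location_from_followers(follower_ids, known_locations,
                                  min_votes=3, min_share=0.5):
    """Guess a user's location from the known locations of their followers.

    follower_ids    -- iterable of follower user IDs (e.g. from followers_ids)
    known_locations -- dict mapping user ID -> location label (hypothetical)
    min_votes       -- minimum number of followers agreeing on the location
    min_share       -- minimum fraction of located followers that must agree

    Returns the winning location label, or None if the evidence is too weak.
    """
    votes = Counter(
        known_locations[uid] for uid in follower_ids if uid in known_locations
    )
    if not votes:
        return None
    location, n = votes.most_common(1)[0]
    total = sum(votes.values())
    if n < min_votes or n / total < min_share:
        return None
    return location


if __name__ == "__main__":
    known = {1: "FL", 2: "FL", 3: "FL", 4: "PR"}
    print(infer_location_from_followers([1, 2, 3, 4, 5], known))
```

The thresholds are there so that a single located follower cannot decide the result on its own; the right values would have to be checked against users whose location is already known.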

audrism commented 5 years ago

Describe how to add coordinates to an arbitrary collection and how to estimate how long it will take.
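
For the time estimate, one rough approach is to time the geolocation step on a small sample and extrapolate linearly to the full collection. The sketch below assumes a per-document function `process_one` (hypothetical; whatever adds coordinates to one document) and that the cost per document is roughly constant, which will not hold if an external geocoder is rate limited.

```python
import time


def estimate_total_seconds(process_one, sample, total_count,
                           clock=time.perf_counter):
    """Estimate how long processing `total_count` documents would take.

    process_one -- hypothetical callable that geolocates a single document
    sample      -- a small iterable of representative documents
    total_count -- number of documents in the full collection
    clock       -- time source, injectable so the function can be tested
    """
    sample = list(sample)
    if not sample:
        raise ValueError("sample must contain at least one document")
    start = clock()
    for doc in sample:
        process_one(doc)
    elapsed = clock() - start
    # Linear extrapolation: average seconds per document times collection size.
    return elapsed / len(sample) * total_count


if __name__ == "__main__":
    # Stand-in workload; replace with the real per-document geolocation call.
    seconds = estimate_total_seconds(lambda doc: sum(range(1000)),
                                     sample=range(200), total_count=108431)
    print(f"estimated total: {seconds:.1f} s")
```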