Open audrism opened 5 years ago
Continue importing users and figure out social network import. Nick will write about all geoloc methods; Jullian will look at how to separate local news/utilities/etc., which may help geoloc.
This is an update on the various geolocation methods in mongodb:
https://docs.google.com/document/d/1dasK5cKIfsbuNArM-GUu1dnqXLIxDWWJnUtMnJHqfMY/edit?usp=sharing
This is the result of running a quick script to count how many geolocation entries there are per collection and per method:
Geolocations_Irma: 108431 entries (19.49% of Statuses_Irma_A)
Geolocations_Maria: 99473 entries (13.72% of Statuses_Maria_A)
Geolocations_Florence: 7910 entries (31.87% of Statuses_Florence_A)
status_coordinates: 456 entries (0.21% of total)
status_place: 0 entries (0.00% of total)
status_streetaddress_nlp: 727 entries (0.34% of total)
status_streetaddress_re: 2599 entries (1.20% of total)
status_streetaddress_statemap: 57505 entries (26.65% of total)
user_place: 84362 entries (39.09% of total)
user_streetaddress_nlp: 185 entries (0.09% of total)
user_streetaddress_re: 5709 entries (2.65% of total)
user_streetaddress_statemap: 64271 entries (29.78% of total)
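For anyone who wants to re-check the per-method percentages, here is a minimal sketch of the arithmetic. It is not the script that produced the output above. The per-method counts are copied from that output; the `method` field name and both helper functions are hypothetical. It assumes "total" means the sum of the per-method counts, which matches the numbers: they add up to 215814, the same as the three Geolocations_* collections combined.

```python
from collections import Counter


def tally_methods(docs, field="method"):
    """Count entries per geolocation method.

    `field` is a hypothetical name for wherever the method tag is stored.
    Documents without that field are skipped.
    """
    return Counter(doc[field] for doc in docs if field in doc)


def method_shares(counts):
    """Return each method's share of all entries, as a percentage (2 decimals)."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


# Counts copied from the script output in this thread.
counts = {
    "status_coordinates": 456,
    "status_place": 0,
    "status_streetaddress_nlp": 727,
    "status_streetaddress_re": 2599,
    "status_streetaddress_statemap": 57505,
    "user_place": 84362,
    "user_streetaddress_nlp": 185,
    "user_streetaddress_re": 5709,
    "user_streetaddress_statemap": 64271,
}

shares = method_shares(counts)
for name, n in counts.items():
    print(f"{name}: {n} entries ({shares[name]:.2f}% of total)")
```

With a real collection, `tally_methods` could be fed a cursor of documents instead of the hard-coded dictionary, provided the method tag lives in a single field.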
I accidentally put user_place and status_place under the same tag, which is why status_place shows zero. Other than that, much of what I said before on Slack about geolocations in statuses also applies to user profiles, although the amount of information we got from the "*_place" field seems much higher. This is both good and bad: Twitter tries to find an exact city for this field if possible, but I'm pretty sure that users can technically put whatever they want here.
Also, once we add the following_ids and followers_ids to Users_Labeled, we can add a new method for that and re-run the script.
Describe how to add coordinates to an arbitrary collection and how to estimate how long it will take.
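As a starting point for that write-up, one possible approach is to time the geolocation step on a small sample and extrapolate linearly to the whole collection. The sketch below assumes documents are plain dictionaries and that some `geolocate(doc)` function returns coordinates or `None`. Both function names and the `coordinates` field are hypothetical, not taken from the existing scripts.

```python
import time


def add_coordinates(doc, geolocate, field="coordinates"):
    """Return a copy of `doc` with coordinates attached, if any were found.

    `geolocate` is a hypothetical callable: doc -> coordinates or None.
    """
    coords = geolocate(doc)
    if coords is None:
        return dict(doc)
    updated = dict(doc)
    updated[field] = coords
    return updated


def estimate_total_seconds(sample, geolocate, total_docs, clock=time.perf_counter):
    """Time `geolocate` on a sample and scale linearly to `total_docs`.

    Assumes the per-document cost in the sample is representative of the
    full collection, and ignores database read/write overhead.
    """
    start = clock()
    n = 0
    for doc in sample:
        add_coordinates(doc, geolocate)
        n += 1
    elapsed = clock() - start
    if n == 0:
        raise ValueError("sample is empty")
    return elapsed / n * total_docs
```

For example, if a sample of 1000 documents takes 20 seconds, a collection of 100000 documents would be estimated at about 2000 seconds. The estimate will be too low for any method that calls a rate-limited external service.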
Twitter's API doesn't let us see who a user is following, but it does let us know who is following a user (see here), and we can use that to categorize geolocations of users. But we need a list of users that are