Chenyu-Renee / UberAnalysis


Clean Locations #2

Open z357412526 opened 8 years ago

z357412526 commented 8 years ago

I just updated the code that cleans Location.json, the data file created by Siyao's fetch.py.

I tested it on a smaller dataset of 3,598 observations and found that:

  1. 705 of them had "State" information
  2. 430 of them had "City" information (these did not necessarily also have "State" information). Cities were hard to handle, so I hardcoded the 30 most populous US cities and also included some abbreviations as matching conditions, e.g. "sf" for "san francisco".
  3. 883 locations were within the USA, about 24.5% of the data. Not too bad, huh?
  4. I added the corresponding index to every observation, so you can trace each location back to the raw tweets it came from. I hope this is useful for your analysis.
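
The state/city matching and index bookkeeping described above could look roughly like the sketch below. This is not the code in the repo: the lookup tables are a tiny hypothetical subset (the real script reportedly hardcodes 30 cities), and whole-word regex matching on lowercased strings is an assumption about how the matching conditions work. `classify_locations` is a hypothetical name.

```python
import re

# Hypothetical, abbreviated lookup tables. The script described above
# hardcodes the 30 most populous US cities; only a few are shown here.
CITY_ALIASES = {
    "san francisco": ["san francisco", "sf"],
    "los angeles": ["los angeles"],
    "new york": ["new york", "nyc"],
    "chicago": ["chicago"],
}
STATE_NAMES = ["california", "new york", "illinois", "texas"]
STATE_ABBREVS = ["ca", "ny", "il", "tx"]


def _has_term(text, terms):
    """Return True if any term appears in text as a whole word or phrase."""
    return any(re.search(r"\b" + re.escape(t) + r"\b", text) for t in terms)


def classify_locations(locations):
    """Bucket raw location strings into state / city / USA matches.

    Each bucket holds indices into `locations`, so every match can be
    traced back to the raw tweet it came from.
    """
    city_terms = [a for aliases in CITY_ALIASES.values() for a in aliases]
    state_terms = STATE_NAMES + STATE_ABBREVS
    result = {"state": [], "city": [], "usa": []}
    for idx, raw in enumerate(locations):
        text = (raw or "").lower()  # treat missing locations as empty
        has_state = _has_term(text, state_terms)
        has_city = _has_term(text, city_terms)  # city may appear without a state
        if has_state:
            result["state"].append(idx)
        if has_city:
            result["city"].append(idx)
        if has_state or has_city:
            result["usa"].append(idx)
    return result
```

For example, `classify_locations(["San Francisco, CA", "sf", "Chicago", "London, UK", None, "Austin, Texas"])` puts indices 0 and 5 in `"state"`, 0, 1 and 2 in `"city"`, and 0, 1, 2 and 5 in `"usa"`. The US share is then `len(result["usa"]) / len(locations)`; with the counts reported above, 883 / 3598 ≈ 24.5%. Word boundaries keep short aliases such as "ca" from matching inside words like "chicago".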