Clean Locations - Githubissues

Just updated the code to clean Location.json, the data file created by Siyao's fetch.py.

I tried it on a smaller dataset with 3598 obs and found:

1. 705 of them had "State" information.
2. 430 of them had "City" information (these did not necessarily have "State" information as well). Cities were hard to deal with, so I found the 30 most populous cities in the US and hardcoded them. I also included some abbreviations, such as "san francisco=sf", as matching conditions.
3. 883 locations were within the USA, which is about 24.5% of the data. Not too bad, eh?
4. I added the corresponding indices for all the obs, so you can trace back to the raw tweets from a given location. Hope that's useful for your analysis.
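For anyone who wants the idea without reading the script, here is a minimal sketch of the matching approach described above. It is not the actual code in the repo: the function names (`classify`, `clean_locations`), the tiny lookup tables, and the output fields are all hypothetical, and the real script hardcodes 30 cities rather than the three shown here.

```python
import re

# Hypothetical, deliberately tiny lookup tables (assumption, not the real lists).
STATES = {"california": "CA", "texas": "TX", "illinois": "IL"}
STATE_ABBRS = set(STATES.values())
CITY_ALIASES = {
    "san francisco": "San Francisco",
    "sf": "San Francisco",  # abbreviation used as a matching condition
    "houston": "Houston",
    "chicago": "Chicago",
}


def classify(location):
    """Return (state, city) for one free-text location; either may be None."""
    # Normalize to lower-case words padded with spaces so that only
    # whole words or phrases match (e.g. "sf" does not match "transfer").
    words = re.findall(r"[a-z]+", location.lower())
    padded = " " + " ".join(words) + " "

    state = None
    for name, abbr in STATES.items():
        if f" {name} " in padded:
            state = abbr
            break
    if state is None:
        # Two-letter codes are only trusted when written in upper case.
        for token in re.findall(r"\b[A-Z]{2}\b", location):
            if token in STATE_ABBRS:
                state = token
                break

    city = None
    for alias, canonical in CITY_ALIASES.items():
        if f" {alias} " in padded:
            city = canonical
            break
    return state, city


def clean_locations(locations):
    """Keep only US matches, each with the index of its raw observation."""
    cleaned = []
    for index, raw in enumerate(locations):
        state, city = classify(raw)
        if state or city:
            cleaned.append(
                {"index": index, "raw": raw, "state": state, "city": city}
            )
    return cleaned


sample = ["SF", "Austin, TX", "London", "somewhere in California", "Houston"]
result = clean_locations(sample)
print([row["index"] for row in result])
```

The `index` field is what lets you go back from a cleaned location to the raw tweet at the same position in the original data.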

Chenyu-Renee / UberAnalysis

Clean Locations #2