Just updated the code for cleaning Location.json, the data file created by Siyao's fetch.py.
I tried it on a smaller dataset with 3598 obs, in which I found:
705 of them had "State" information
430 of them had "City" information (these did not necessarily have "State" information).
Cities were hard to deal with, so I took the 30 most populous cities in the US and hardcoded them. I also included some abbreviations, such as "san francisco=sf", as matching conditions.
There were 883 locations within the USA, which was about 24.5% of the data. Not too bad, huh?
I added the corresponding indices to all the obs, which means you can trace back to the raw tweets from a certain location. Hope it will be useful for your analysis.
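For anyone who wants to see the idea without opening the script, here is a rough sketch of the city matching plus the index tracking. It is not the actual cleaning code: the names `CITY_ALIASES`, `match_city`, and `index_by_city` are made up for illustration, the alias table only shows a few entries instead of the 30 hardcoded cities, and it assumes the locations have already been loaded as a plain list of strings (the real layout of Location.json may differ).

```python
import re

# Hypothetical alias table: alias -> canonical city name.
# The real script hardcodes 30 cities; only a few are shown here.
CITY_ALIASES = {
    "san francisco": "san francisco",
    "sf": "san francisco",
    "new york": "new york",
    "nyc": "new york",
    "chicago": "chicago",
}


def match_city(location):
    """Return the canonical city found in a free-text location, or None."""
    text = location.lower()
    for alias, city in CITY_ALIASES.items():
        # Word boundaries keep short aliases like "sf" from matching
        # inside longer words.
        if re.search(r"\b" + re.escape(alias) + r"\b", text):
            return city
    return None


def index_by_city(locations):
    """Map each matched city to the indices of the obs that mention it."""
    by_city = {}
    for i, loc in enumerate(locations):
        city = match_city(loc)
        if city is not None:
            by_city.setdefault(city, []).append(i)
    return by_city


if __name__ == "__main__":
    sample = ["SF, CA", "somewhere on earth", "New York City", "san francisco bay"]
    print(index_by_city(sample))
```

The indices in the result are positions in the input list, so they can be used to look up the raw tweets for a given city. Short abbreviations are the risky part: matching on whole words helps, but an alias that is also a state code or a common word would still need extra care.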