Isaac1o / CDFW

Repository for MSDS practicum for California Department of Fish and Wildlife

Query Data from 2010 to Present for Grids - Feb 28 2022 #12

Open danielleasavage opened 2 years ago

danielleasavage commented 2 years ago

@Isaac1o

Isaac1o commented 2 years ago

Using Tweepy's Paginator to pull tweets across multiple pages. However, the paginator does not return user or geo data, so I wrote a .py module modeled on Tweepy's Paginator source code and altered it to return user and geo data as well.
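A minimal sketch of the idea, not the actual module from this repo: follow `next_token` from page to page, keep each whole page rather than only the tweets, and join the `includes` back onto each tweet. It assumes the Twitter API v2 response shape (`data`, `includes.users`, `includes.places`, `meta.next_token`) as plain dicts. `fetch_page`, `iter_pages`, and `join_tweets` are hypothetical names; `fetch_page` stands in for whatever call actually hits the search endpoint.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Page = Dict[str, Any]


def iter_pages(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Page]:
    """Yield whole response pages until no next_token remains."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)  # hypothetical: one API request per call
        yield page
        token = page.get("meta", {}).get("next_token")
        if token is None:
            break


def join_tweets(page: Page) -> List[Dict[str, Any]]:
    """Attach the matching user and place objects to each tweet on a page."""
    includes = page.get("includes") or {}
    users = {u["id"]: u for u in includes.get("users", [])}
    places = {p["id"]: p for p in includes.get("places", [])}
    rows = []
    for tweet in page.get("data") or []:
        place_id = (tweet.get("geo") or {}).get("place_id")
        rows.append(
            {
                "tweet": tweet,
                "user": users.get(tweet.get("author_id")),
                "place": places.get(place_id),
            }
        )
    return rows
```

Usage would look like `rows = [r for page in iter_pages(fetch_page) for r in join_tweets(page)]`, with the user and place expansions requested inside `fetch_page`.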

Isaac1o commented 2 years ago

Querying data from 2010 to now is taking a long time. There are ~50 requests per grid, so 50 requests x 4,000 grids = 200,000 requests. With a limit of 300 requests per 15 minutes, it would take about 7 days to query all the data.
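The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch using the numbers quoted in this comment; `estimate_days` is a hypothetical helper, and it assumes every 15-minute window is used in full with no downtime.

```python
import math


def estimate_days(requests_per_grid: int, grids: int,
                  limit: int = 300, window_minutes: int = 15) -> float:
    """Days needed to send all requests under a fixed per-window rate limit."""
    total_requests = requests_per_grid * grids
    windows = math.ceil(total_requests / limit)  # number of 15-minute windows
    return windows * window_minutes / (60 * 24)


# 50 requests x 4,000 grids = 200,000 requests -> 667 windows -> just under 7 days
print(round(estimate_days(50, 4000), 2))
```

The same function gives a quick feel for the alternatives: a larger grid size lowers `grids`, and a shorter date range lowers `requests_per_grid`.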

I could run this query on an EC2 instance, increase the grid size, or reduce the date range searched. Please advise, @danielleasavage.

Isaac1o commented 2 years ago

Started an EC2 instance to pull data from Twitter's API. Using screen to keep the session running and piping the output to output.txt.
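For a job that runs unattended for days, one option is to pace the requests inside the script so it never exceeds 300 per 15 minutes. The sketch below is an assumption about how that could look, not the code running on the instance; `throttled` is a hypothetical helper, and the `clock` and `sleep` parameters exist only so the pacing can be tested without waiting.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def throttled(items: Iterable[T], max_calls: int = 300, window_s: float = 900.0,
              clock: Callable[[], float] = time.monotonic,
              sleep: Callable[[float], None] = time.sleep) -> Iterator[T]:
    """Yield items no faster than max_calls per window_s seconds."""
    min_gap = window_s / max_calls  # 3 seconds between requests by default
    last: Optional[float] = None
    for item in items:
        if last is not None:
            wait = min_gap - (clock() - last)
            if wait > 0:
                sleep(wait)
        last = clock()
        yield item
```

In the query loop this would wrap the list of grids, for example `for grid in throttled(grids): ...`, and printing progress with `print(..., flush=True)` keeps the piped output.txt current while the screen session is detached.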