JustAnotherArchivist / snscrape

A social networking service scraper in Python
GNU General Public License v3.0
4.4k stars, 703 forks

Timezones #92

Closed AnthonyFJGarner closed 3 years ago

AnthonyFJGarner commented 3 years ago

Unfortunately, tweets are simply given the timezone "UTC", which is insufficient if you are linking minute-by-minute tweets to US stock data (by way of example) to assess whether there is any impact on price. I realize that this is Twitter's fault and not that of snscrape; however, I understand one can retrieve information as to the timezone of the tweeter. Hence one could accurately convert all tweets to US EST.

Has anyone attempted this?

JustAnotherArchivist commented 3 years ago

There's nothing wrong/no fault here. Twitter doesn't provide a tweeter's local time zone (and possibly doesn't even know it), but a date and time in UTC unambiguously identifies any particular moment in time. The date field of Tweet objects is actually a full datetime.datetime object with a time zone, so you can convert that to whatever you desire using astimezone, pytz, or some other package; your favourite search engine will help you with that. If you're using the --jsonl output, you will first need to convert the date string back into such an object, of course.
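For readers landing here, a minimal sketch of the conversion described above, using only the standard library (zoneinfo needs Python 3.9+). The names to_eastern and parse_jsonl_date are illustrative helpers, not part of snscrape, and tweet_date is a hand-built stand-in for a Tweet object's date field. The sketch also assumes the date string in the --jsonl output is ISO 8601 with a UTC offset; check your own output before relying on that.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# IANA zone for US Eastern time; it switches between EST and EDT automatically.
EASTERN = ZoneInfo("America/New_York")


def to_eastern(dt: datetime) -> datetime:
    """Convert a timezone-aware datetime to US Eastern wall-clock time."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return dt.astimezone(EASTERN)


def parse_jsonl_date(value: str) -> datetime:
    """Parse a date string, assumed to look like '2021-03-15T13:25:00+00:00'."""
    return datetime.fromisoformat(value)


# Stand-in for tweet.date (an aware datetime in UTC).
tweet_date = datetime(2021, 3, 15, 13, 25, tzinfo=timezone.utc)

print(to_eastern(tweet_date))                                     # 2021-03-15 09:25:00-04:00
print(to_eastern(parse_jsonl_date("2021-03-15T13:25:00+00:00")))  # same moment, parsed from a string
```

The converted value still identifies the same moment in time; only the wall-clock representation changes.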

AnthonyFJGarner commented 3 years ago

Hmm, I take your point, but it just does not seem to compute that way. I have converted all the tweets to US EST, but some of them are very obviously wrong. By way of example, they talk of the pre-opening market (which ends at the 9:30 EST open), but the tweet is timed well after that, and some of the tweets quote prices which are simply out of sync. I will try to work out why that should be the case. But yes, of course, you are right: it should be universal, all timed in relation to UTC. Sorry to bother you with another dead end.

AnthonyFJGarner commented 3 years ago

I guess the answer is that people are just commenting late and quoting stale prices. Anyway, thank you for your cogent and clear explanation. I am very unfamiliar with all this and grateful for your time and patience.

JustAnotherArchivist commented 3 years ago

That is likely the case, yeah. Happy to help!

brussli1 commented 3 years ago

Hi, I just started fiddling with this repo today and wanted to implement this exact thing. The easiest way I found was importing timedelta like this: from datetime import datetime, timedelta
then simply adding tweet.date + timedelta(hours = -4) on the line that uses tweet.date, which fixes the time in my case. I hope this helps someone. It doesn't account for summer time changes and such, but it gave me what I wanted.
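To illustrate the summer time caveat, here is a rough sketch comparing the fixed -4 hour shift with a zone-aware conversion. The function names fixed_offset and dst_aware are made up for this example, and the two timestamps are hand-picked stand-ins for tweet.date (both correspond to 9:30 in New York), not real tweets.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")


def fixed_offset(dt: datetime) -> datetime:
    """Shift by a constant -4 hours; the result is still labelled as UTC."""
    return dt + timedelta(hours=-4)


def dst_aware(dt: datetime) -> datetime:
    """Convert to Eastern time, letting the zone rules pick -4 or -5 hours."""
    return dt.astimezone(EASTERN)


summer = datetime(2021, 7, 1, 13, 30, tzinfo=timezone.utc)  # EDT in effect
winter = datetime(2021, 1, 4, 14, 30, tzinfo=timezone.utc)  # EST in effect

for label, moment in (("summer", summer), ("winter", winter)):
    print(label, fixed_offset(moment).time(), dst_aware(moment).time())
# summer 09:30:00 09:30:00
# winter 10:30:00 09:30:00
```

The fixed shift agrees with the zone-aware result only while daylight saving time is in effect, and it leaves the datetime tagged as UTC, so any later conversion would shift it a second time.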