Jefferson-Henrique / GetOldTweets-python

A project written in Python to get old tweets; it bypasses some limitations of the official Twitter API.
MIT License
1.35k stars, 809 forks

Geolocation #48

Closed biancaglez closed 4 years ago

biancaglez commented 7 years ago

Hello,

Would you happen to know if your program returns the lat/long (geolocation) of tweets? So far I have not had any strings returned under geo, but perhaps that is due to what I am searching for.

SamOh commented 7 years ago

I am also wondering the same thing. The geo attribute of each tweet always comes out empty. Look at issue #45.

Update: The query in TweetManager.py for geo is incorrect: span.Tweet-geo does not appear anywhere in the HTML (checked tweet, tweetPQ, tweetHTML, and even the basic JSON file). Currently looking at how the JsonResponse is rendered to see if we can include information about location as well.

Further Update: Looks like there is no way to get lat/long data of tweets with this method (please correct me if I'm wrong). I just changed the code so that the geo parameter of Tweet returns the location of the user that tweeted the tweet from their profile (I find the user based on the tweet, then scrape the user's location posted on the profile). This isn't as accurate or as encompassing as getting lat/long info, but this is the best I could do and works for my purposes.
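For anyone wanting to try the same workaround, here is a minimal sketch of pulling a location string out of a profile page's HTML. It is not Sam's actual code. The class name `ProfileHeaderCard-locationText` and the sample markup are assumptions about the profile page layout of the time, and the page would still need to be fetched separately.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag; ignored when tracking nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class ProfileLocationParser(HTMLParser):
    """Collects the text inside the first element carrying a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0      # > 0 while inside the target element
        self.parts = []     # text fragments found inside the target
        self.done = False   # True once the first match has been closed

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def profile_location(profile_html, css_class="ProfileHeaderCard-locationText"):
    """Return the profile location text, or '' if the element is missing.

    The default class name is an assumption, not a documented selector.
    """
    parser = ProfileLocationParser(css_class)
    parser.feed(profile_html)
    return " ".join("".join(parser.parts).split())


# Hypothetical markup, for illustration only.
SAMPLE = """
<div class="ProfileHeaderCard">
  <span class="ProfileHeaderCard-locationText u-dir" dir="ltr">
    <a href="/search?q=place">Seattle, WA</a>
  </span>
</div>
"""

print(profile_location(SAMPLE))
```

Note that this yields free text typed by the user, not coordinates, so it may be missing, vague, or not a real place.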

fragrusti commented 7 years ago

Dear Sam,

Could you please send me your modified version? I would also need the user's sex; is it possible to add that? Unfortunately, I have no experience in scraping or in Python to do it on my own. :-(

Thank you in advance, Francesco

mgglenn commented 7 years ago

@SamOh Hey Sam! Wondering if you could provide the updated files to get around the geocode bug of the original version? Checked out your profile and had trouble finding the changes on there.

Thanks in advance! Grace

SamOh commented 7 years ago

Yes I can provide the info! Most of the changes I made are in the "getoldtweets" folder of my trend_map project.

However, I had put a pause on this project a couple of months ago because overnight all of the code in Jefferson-Henrique's unofficial API stopped working. My intuition, based on a few tests I ran, is that Twitter updated its software so it's no longer compatible with the unofficial API. I may be wrong though; if it works for you, please let me know!

Jefferson-Henrique commented 7 years ago

Hello guys, it seems Twitter has changed the info it provides; I no longer see the location info that I retrieved before with "span.Tweet-geo".

vinodhinir commented 7 years ago

Hi

I am not able to scrape historical tweets. The --since and --until arguments are not working. Jeff, can you please confirm?

arpan-ghosh commented 7 years ago

@SamOh I'm able to retrieve old tweets using Jefferson-Henrique's getoldtweets. Obviously, as we encountered earlier, the geolocation part doesn't work. But I was wondering if you got around to fixing the scraping of the user's location posted on their profile? I tried running what's in your trend_map project and I get some errors.

```
Arpans-MacBook-Pro:getoldtweets MacbookPro$ python3 Exporter.py --querysearch "trump" --since 2016-10-01 --until 2016-10-31
Searching...

Twitter weird response. Try to see on browser: https://twitter.com/search?q=%20since%3A2016-10-01%20until%3A2016-10-31%20trump&src=typd
Unexpected error: <class 'urllib.error.URLError'>
Done. Output file generated "output_got3.csv".
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 1318, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1285, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1234, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1026, in _send_output
    self.send(msg)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 964, in send
    self.connect()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1400, in connect
    server_hostname=server_hostname)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 401, in wrap_socket
    _context=self, _session=session)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 808, in __init__
    self.do_handshake()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 1061, in do_handshake
    self._sslobj.do_handshake()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 683, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:749)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/MacbookPro/PycharmProjects/cs446/trend_map-master/getoldtweets/got3/manager/TweetManager.py", line 146, in getJsonReponse
    response = opener.open(url)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 526, in open
    response = self._open(req, data)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 544, in _open
    '_open', req)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 1361, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 1320, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:749)>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "Exporter.py", line 76, in main
    got3.manager.TweetManager.getTweets(tweetCriteria, receiveBuffer)
  File "/Users/MacbookPro/PycharmProjects/cs446/trend_map-master/getoldtweets/got3/manager/TweetManager.py", line 34, in getTweets
    json = TweetManager.getJsonReponse(tweetCriteria, refreshCursor, cookieJar)
  File "/Users/MacbookPro/PycharmProjects/cs446/trend_map-master/getoldtweets/got3/manager/TweetManager.py", line 153, in getJsonReponse
    sys.exit()
SystemExit

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "Exporter.py", line 85, in <module>
    main(sys.argv[1:])
  File "Exporter.py", line 78, in main
    except arg:
TypeError: catching classes that do not inherit from BaseException is not allowed
```
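A note on the last error in that output: the final `TypeError` comes from the `except arg:` line itself, not from Twitter. The sketch below reproduces it under the assumption that `arg` is an ordinary string (such as a parsed option value) rather than an exception class. The function names are hypothetical stand-ins, not the project's code. The `CERTIFICATE_VERIFY_FAILED` error is a separate problem with the local certificate setup and is not addressed here.

```python
import sys


def fetch():
    """Hypothetical stand-in for a scraper call that gives up via sys.exit()."""
    sys.exit()


def broken_main():
    arg = "2016-10-31"  # assumption: a plain string, not an exception class
    try:
        fetch()
    except arg:  # Python 3 raises TypeError while matching the SystemExit
        print("Arguments parser error, try -h")


def fixed_main():
    try:
        fetch()
    except SystemExit:
        # The scraper asked to stop; report it instead of crashing.
        return "scraper exited early"
    except Exception as err:
        return "unexpected error: {}".format(err)
    return "finished"


def demo():
    """Return the message from the broken handler and the fixed result."""
    broken_message = None
    try:
        broken_main()
    except TypeError as err:
        broken_message = str(err)
    return broken_message, fixed_main()


if __name__ == "__main__":
    print(demo())
```

Naming real exception classes in the `except` clause avoids the `TypeError` and lets the underlying failure surface on its own.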

Jefferson-Henrique commented 7 years ago

Hello @arpan-ghosh, can you try with Python 2? The Python 3 version is kind of experimental.

labeebee commented 6 years ago

I used Python 2 and the geo attribute returns an empty string.

rax87 commented 6 years ago

I would also like the geo locations for tweets. Has anybody managed to get this working?

rahulha commented 6 years ago

Apparently there is no geo/location field returned by the method this project uses. The JSON returned by Twitter has limited info. Your best bet is to collect all the tweet IDs and query the Twitter API to get more details. I am working on the same thing; once successful I'll update the code or maybe just post it here.
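A rough sketch of the two offline pieces of that approach: splitting collected IDs into batches and reading coordinates out of a returned tweet object. It assumes the classic v1.1 tweet JSON, where `coordinates` is either null or a GeoJSON point in longitude, latitude order, and a limit of 100 IDs per lookup request. Both are assumptions to check against the current API, and the authenticated request itself is left out.

```python
def batches(tweet_ids, size=100):
    """Yield successive slices of tweet_ids, at most `size` IDs each."""
    for start in range(0, len(tweet_ids), size):
        yield tweet_ids[start:start + size]


def lat_long(tweet):
    """Return (latitude, longitude) for a tweet dict, or None if untagged.

    Assumes the v1.1 layout: {"coordinates": {"type": "Point",
    "coordinates": [longitude, latitude]}}.
    """
    point = tweet.get("coordinates")
    if not point or point.get("type") != "Point":
        return None
    longitude, latitude = point["coordinates"]
    return latitude, longitude


if __name__ == "__main__":
    # Hypothetical payload for illustration; not real API output.
    example = {"id_str": "1", "coordinates": {"type": "Point",
                                              "coordinates": [-122.33, 47.61]}}
    print(lat_long(example))
    print([len(chunk) for chunk in batches(list(range(250)))])
```

Expect `None` for most tweets, since exact coordinates are only present when the author chose to attach them.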

TheSaintIndiano commented 6 years ago

The repository https://github.com/taspinar/twitterscraper gives everything (geolocation included). Happy scraping. :)

rahulha commented 6 years ago

Hey @TheSaintIndiano, I reviewed your code and even tried it locally. I don't see geo for the tweets. There is geo for users, but that is not what we are looking for. Can you please explain where and how the geocode is being populated?

TheSaintIndiano commented 6 years ago

I meant that one can filter tweets based on location, e.g. `twitterscraper "Blockchain near:Seattle within:15mi" -o blockchain_tweets.json -l 1000`. Hope it helps.

rahulha commented 6 years ago

What we are looking for is the geolocation of the tweet itself. The program already has a mechanism to restrict tweets based on location, exactly the same way you have in your program.

ex. Exporter.py --querysearch "Blockchain" --near Seattle --within 15mi
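To make the distinction concrete, here is a sketch of how such location flags could end up as search operators in the query string. The helper names are hypothetical and the URL shape is only inferred from the search link printed in the error output earlier in this thread; it is not the project's actual code.

```python
from urllib.parse import quote


def build_search_query(text, since=None, until=None, near=None, within=None):
    """Combine free text with since:/until:/near:/within: search operators."""
    parts = [text]
    if since:
        parts.append("since:" + since)
    if until:
        parts.append("until:" + until)
    if near:
        parts.append("near:" + near)
        if within:  # a radius only makes sense together with a place
            parts.append("within:" + within)
    return " ".join(parts)


def search_url(query):
    """Percent-encode the query into a search URL (assumed URL shape)."""
    return "https://twitter.com/search?q=" + quote(query) + "&src=typd"


if __name__ == "__main__":
    query = build_search_query("Blockchain", near="Seattle", within="15mi")
    print(query)
    print(search_url(query))
```

These operators only narrow which tweets come back; nothing in the query asks for, or returns, per-tweet coordinates.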

Emekaborisama commented 4 years ago

Can I get the geolocation in separate columns?

tasvora commented 4 years ago

Hi, did anybody get the geo location of Tweet working?

iqraakhtar7 commented 4 years ago

Hi. Has anybody found a solution for geolocation? It is returning an empty string.

iqraakhtar7 commented 4 years ago

Hi Sam, did you get any solution for geo values?

YukunYangNPF commented 3 years ago

Following this thread. Has anyone had any luck getting the geospatial data of tweets?