Mottl / GetOldTweets3

A Python 3 library and a corresponding command line utility for accessing old tweets
MIT License
365 stars 127 forks

Download stops after a lot of tweets #3

Open JaimeBadiola opened 5 years ago

JaimeBadiola commented 5 years ago

I tried to download tweets with querysearch 'bitcoin' since 2018-02-18 until 2018-02-19. The issue is that the script stopped before reaching the end of the until parameter.

The log was too big to put it all, so I deleted the log of the first 31000 tweets.

You can find the log here

Can this be because twitter detects a bot downloading a lot of tweets?

Mottl commented 5 years ago

What dates did you expect to see in output_got.csv?

Keep in mind that tweets in output_got.csv are in reversed order (latest tweets are at the beginning)

JaimeBadiola commented 5 years ago

This is the date of the first tweet

19/02/2018 0:59

This is the date of the last tweet

18/02/2018 9:03

Normally it should have finished at

18/02/2018 1:00

Mottl commented 5 years ago

I haven't downloaded all the tweets since there are too many, but the first row is 2018-02-18 23:59:58 as expected (--until 2018-02-19)

JaimeBadiola commented 5 years ago

So do you have any ideas about why I have this issue?

Mottl commented 5 years ago

Are you using the generic version of GetOldTweets3, or have you changed some code? It seems like the timestamps of tweets in your CSV are not in UTC and thus they have the 2018-02-19 date instead of 2018-02-18.

I've checked with GetOldTweets3 --username barackobama --since X1 --until X2 and it works as expected. X1 is included, X2 is excluded (as in README.md).

JaimeBadiola commented 5 years ago

I have only modified the exporter to include lang parameter. Other than that the code is the same as yours.

About the UTC, you are right. I am not sure why, but the script saves timestamps in UTC+1; I thought it was normal since that is the timezone I am in.
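The timezone mismatch can be reproduced in a few lines (a quick illustration, not code from the thread): the row Mottl saw as 2018-02-18 23:59:58 UTC renders as 00:59 on 2018-02-19 in UTC+1, which is why the CSV appears to overrun the --until date.

```python
from datetime import datetime, timedelta, timezone

# Mottl's first row, in UTC
utc = datetime(2018, 2, 18, 23, 59, 58, tzinfo=timezone.utc)

# The same instant viewed from UTC+1 (Jaime's local timezone)
local = utc.astimezone(timezone(timedelta(hours=1)))
print(local.strftime("%Y-%m-%d %H:%M:%S"))  # 2018-02-19 00:59:58
```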

JaimeBadiola commented 5 years ago

I just tried again with python Exporter.py --lang en --querysearch "bitcoin" --since 2018-02-20 --until 2018-02-21 and the same thing happened.

The first tweet is

2018-02-20 23:59:53,jmauli,,0,0,"The Bitcoin is dropping going to enjoy short this down to XXXXX",,,,966100124899278848,https://twitter.com/jmauli/status/966100124899278848

the last tweet is

2018-02-20 07:11:38,LibertarianBee,,0,0,"@CoinWarz is not taking in consideration the TX fees that the miners are also receiving. #BCH has less TX than #BTC #Bitcoin.",,@CoinWarz,#BCH #BTC #Bitcoin,965846390088785921,https://twitter.com/LibertarianBee/status/965846390088785921

It downloaded 48757 tweets

Mottl commented 5 years ago

I've tried several times and reproduced your error. I will look deeper somewhat later. Keep in mind that Twitter has per IP limitations. You could be banned for several days if you invoke too many requests.

Mottl commented 5 years ago

Twitter gave me 49877 as the total number of tweets within this period.

JaimeBadiola commented 5 years ago

Ok Thanks a lot!

It is a pity that you cannot submit requests by hour, since that would solve the issue.

Maybe we can set in the script that after an x number of tweets, the script sleeps for 3 minutes so the requests look more natural.

Mottl commented 5 years ago

It is a pity that you cannot submit requests by hour, since that would solve the issue.

I tried it a couple of days ago. It seems they removed the HH:MM:SS specification from since: and until:. You can try it yourself; maybe they have reverted that change.

JaimeBadiola commented 5 years ago

Yeah, I tried yesterday as well, and it doesn't work.

Mottl commented 5 years ago

Btw, I've added --lang parameter: https://github.com/Mottl/GetOldTweets3/commit/dd0924b03991571ab25a66deaba7e89545ea61dc

JaimeBadiola commented 5 years ago

Thanks a lot that is pretty good!

Send me a message if you find the solution to the issue. I will try to test some options as well.

JaimeBadiola commented 5 years ago

Hello,

I am leaving you a list of queries whose downloads consistently stop at around the same point.

python Exporter.py --lang en --querysearch "bitcoin" --since 2018-02-18 --until 2018-02-19
python Exporter.py --lang en --querysearch "bitcoin" --since 2018-02-19 --until 2018-02-20
python Exporter.py --lang en --querysearch "bitcoin" --since 2018-02-24 --until 2018-02-25
python Exporter.py --lang en --querysearch "bitcoin" --since 2018-02-25 --until 2018-02-26

It is very weird because there are some other days, with more tweets, that have no problem.

Some of these queries stop after two or four thousand tweets (not a big number).

Just in case this helps see and solve the issue.

JaimeBadiola commented 5 years ago

Hello!

To add to the list of queries with issues,

python Exporter.py --lang en --querysearch "bitcoin" --since 2018-03-02 --until 2018-03-03 | Number of tweets downloaded: 1186
python Exporter.py --lang en --querysearch "bitcoin" --since 2018-03-03 --until 2018-03-04 | Number of tweets downloaded: 3747
python Exporter.py --lang en --querysearch "bitcoin" --since 2018-03-04 --until 2018-03-05 | Number of tweets downloaded: 977
python Exporter.py --lang en --querysearch "bitcoin" --since 2018-03-06 --until 2018-03-07 | Number of tweets downloaded: 24879
python Exporter.py --lang en --querysearch "bitcoin" --since 2018-03-07 --until 2018-03-08 | Number of tweets downloaded: 19826
python Exporter.py --lang en --querysearch "bitcoin" --since 2018-03-20 --until 2018-03-21 | Number of tweets downloaded: 29030
python Exporter.py --lang en --querysearch "bitcoin" --since 2018-03-26 --until 2018-03-27 | Number of tweets downloaded: 1595
python Exporter.py --lang en --querysearch "bitcoin" --since 2018-03-29 --until 2018-03-30 | Number of tweets downloaded: 20469

The first query is relatively small, so I ran it with the debug option; here is the log: debug.log

I think this has something to do with the content of a tweet, or something else the script downloads, that makes it stop. The download fails consistently at the same point with these queries, while other queries for full days (55000 messages) have no issue. So I do not think Twitter is blocking the requests; I think the program reads or scrapes something that makes it think it is finished with the query.

Mottl commented 5 years ago

Ok, thanks

The problem is this: each response has a min_position value (an id) that should be used to access further tweets via the max_position GET parameter. The problem occurs when, for some reason, the min_position returned by Twitter is invalid (in your log file the invalid min_position is cm+55m-JFJbaXvEDXJsaJEXEa-JFJbEvDDvsbIbXIabF). To fix this issue you can do the following:

  1. Check whether the query has returned no tweets when you expected some.
  2. Re-run the previous query and get the min_position value from the JSON response again. Look for duplicates of tweets already saved and save the new tweets, if any.
  3. If the new min_position is the same as the one from step 1, break the loop, since everything is OK. Otherwise, continue the loop with the new min_position value from step 2.

Feel free to make a pull request if you fix this issue. Thanks!
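The three steps above amount to a cursor-recovery loop. A minimal sketch, where fetch_page(cursor) is a hypothetical stand-in for one Twitter search request returning (tweet_ids, next_cursor); this is not GetOldTweets3's actual code:

```python
def drain(fetch_page):
    """Drain a cursor-paginated feed, retrying past corrupted cursors.

    fetch_page(cursor) -> (tweet_ids, next_cursor) is a hypothetical
    stand-in for one search request.
    """
    seen, out = set(), []
    prev_cursor, cursor = None, None
    while True:
        ids, nxt = fetch_page(cursor)
        fresh = [i for i in ids if i not in seen]
        out.extend(fresh)
        seen.update(fresh)
        if not ids:  # step 1: an empty page that may mean a corrupted cursor
            if prev_cursor is None:
                break
            # step 2: replay the previous query and dedupe against saved tweets
            ids2, nxt2 = fetch_page(prev_cursor)
            fresh2 = [i for i in ids2 if i not in seen]
            out.extend(fresh2)
            seen.update(fresh2)
            if nxt2 == cursor:  # step 3: same cursor again, so we are really done
                break
            cursor = nxt2  # otherwise continue with the fresh cursor
            continue
        prev_cursor, cursor = cursor, nxt
    return out
```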

JaimeBadiola commented 5 years ago

I am going to try! However, it seems complicated (I only started learning Python 4 months ago...)

rahulha commented 5 years ago

Hi Jaime.

1) Twitter does not have any limitations on how much you can scrape, and Twitter does not track based on IP. On Orgneat I was downloading millions of tweets every day and Twitter never blocked me. Based on Twitter's policy, you can always scrape the search query results without any issue. This is stated in https://twitter.com/robots.txt

2) The issue here is the min_position and has_more_items flags. Twitter's legacy timeline caching system, Haplo, has its limitations: when you start downloading millions of tweets, it runs out of memory and sometimes returns has_more_items as false. You can read about how Twitter's cache works here:

https://blog.twitter.com/engineering/en_us/topics/infrastructure/2017/the-infrastructure-behind-twitter-scale.html

3) Recently Twitter changed the min_position attribute from an exact position to a hash-based position that starts with cm+. Previously it came as TWEET-number-number. If you look closely, the structure of min_position is simple: it is the first and last tweet IDs concatenated with the word TWEET. If you receive a min_position starting with cm+, just create your own min_position; there is no need to go back and get the last successful one.

4) Finally, based on my experience of over 10 billion tweets downloaded, I can say that Twitter is as imperfect as any other software system. It will crash and won't respond sometimes. So there is one more thing you need to do: change the until date and restart scraping.

If I summarize the logic, this is what it would look like:

i) If has_more_items is false, check whether the date of the last tweet received is the same as the since date.
ii) If it is the same, your query is complete. If it is not, assume Twitter did not respond; retry the same query 5 or so times. Twitter sometimes does respond.
iii) If Twitter does not respond after 5 tries, set the query's until date to the date of the last tweet and request again.
iv) If min_position starts with cm+, set min_position = "TWEET-" + tweet ID of the last tweet in the result + "-" + tweet ID of the first tweet in the result.
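Step iv can be sketched as a one-liner (the helper name is hypothetical; the IDs come from the CSV rows quoted earlier in this thread):

```python
def rebuild_min_position(first_id, last_id):
    # Reconstruct Twitter's legacy cursor format from the first and
    # last tweet IDs of the current result page (step iv above).
    return "TWEET-{}-{}".format(last_id, first_id)

# IDs taken from the CSV rows quoted earlier in this thread
print(rebuild_min_position("966100124899278848", "965846390088785921"))
# TWEET-965846390088785921-966100124899278848
```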

@JaimeBadiola you already know my .NET code, which has all of these condition checks. My old Python version is attached here. I hope this helps.

TC-ProdVersion-Full.txt

aduriseti commented 5 years ago

Are you guys sure Twitter doesn't block IPs? On my remote machine, after a certain point (maybe 1 million tweets over 1000 requests) all my responses come back zero length (empty).

But I can still query from my laptop.

I'll be honest: I didn't fully understand the discussion on min_position, but I don't see how this could be the source of my problem.

rahulha commented 5 years ago

Please help me understand: what do you mean by 1 million tweets in 1000 requests? Are you talking about the Twitter API? Because there is no concept of requests in scraping. Also, while scraping, every Twitter URL call for JSON download returns 20 tweets max. If you are using the APIs then min_position does not apply to you.

About Twitter policies, check this out


https://twitter.com/en/tos

There is a gray area when it comes to distinguishing between scraping and crawling; although both might look the same, they are different. But it depends on how Twitter defines it. On the TOS page there is nothing related to blocking of IP addresses. Second, blocking an IP means detecting the IP address, which is against Twitter's privacy policy. Third, IP blocking will not work if you are behind DNS where the IP is refreshed periodically, or on a public network, so IP blocking is not a good solution, and companies know it.

When I was downloading tweets as part of my free assistance on orgneat.com, I never had an issue with my scraper being blocked. To understand this, we first need to understand how a Twitter scraper program works. The program mimics a browser, simply scrolling down the web page in order to get the statuses (tweets). If Twitter blocked the program, it would have to block all requests coming from your network/system, which basically means that if you opened Twitter.com you would not be able to see anything.

While I was working on scraping requests from all around the world, a little research on how this works helped me a lot. It makes sense that if you start downloading millions of tweets, then depending on various factors (internet connection, glitches, Twitter's handling of requests, etc.) there might be some issues. Please note that a scraper is an extremely fast human scrolling down a page constantly, possibly every second. I do face the same issues now and then, so I have made conditional checks, following the logic I explained above. I did this only because I was trying to assist many people with a free service and wanted to provide a seamless request/delivery experience with the scraper running day and night unattended.

kho7 commented 5 years ago

Thanks for addressing the questions. I found that a --username query can get full data (e.g. --since 2018-01-01 --until 2018-12-22), but --querysearch (keywords: China tariff) only got as far back as 9/13/2018. In that request, I downloaded 142,396 tweets. I tried multiple combinations but still was unable to reach beyond 9/13/2018. Is that memory related? Or IP address related? Manually scrolling the page does allow me to reach further. Any suggestions will be greatly appreciated!!

Mottl commented 5 years ago

@kho7, the issue is with min_position as stated by Rahul: https://github.com/Mottl/GetOldTweets3/issues/3#issuecomment-439092370

kho7 commented 5 years ago

Thanks for your reply; I learned a great deal. I use the command line method:

GetOldTweets3 --querysearch "China tariff" --since 2018-01-01 --until 2018-9-13 --output "tradewar02g.csv"

Shall I modify TweetManager.py to change min_position? Thanks again, big time.

Mottl commented 5 years ago

Are you sure you understand what Rahul has written about min_position?

JaimeBadiola commented 5 years ago

I am testing this query that only recovers 24 tweets.

"python Exporter.py --lang en --querysearch "bitcoin" --since 2017-08-13 --until 2017-08-14"

And the issue seems to be that the while loop stops here (line 67 of TweetManager.py):

if len(json['items_html'].strip()) == 0: break

I tried getting the JSON response 10 times before breaking the while loop, but Twitter doesn't answer accordingly.

Any ideas?

kho7 commented 5 years ago

Are you sure you understand what Rahul has written about min_position?

One part I am trying to understand and apply is

iv) if min_position starts with cm+ set min_position = "TWEET-" + Tweet ID of Last tweet in result + "-" + tweet ID of first tweet in result.

Thanks again.

giulionf commented 5 years ago

I'm having the same issue, but for some reason the result length is 0 all the time! When I'm using another Python script, the same search works without problems...

JaimeBadiola commented 5 years ago

What is the other Python script, @giulionf?

giulionf commented 5 years ago

Basically, a test script to check whether it was working or not... I just set manually the parameters my other script was fetching. On my remote, it's working as well! Really strange...

import GetOldTweets3 as got3

def test():
    criteria = got3.manager.TweetCriteria()\
        .setUsername("@desusnIce")\
        .setMaxTweets(10)\
        .setUntil('2014-01-03')\
        .setSince('2013-12-31')\
        .setWithin('')\
        .setQuerySearch('Desus OR Nice OR Follow OR desusnlce OR wonder OR if OR jay OR ever OR sneaks OR off OR at OR night OR to OR sell OR crack OR just OR to OR see OR if OR he OR still OR got OR it OR PM OR Jan OR Retweets OR Likes')
    tweets = got3.manager.TweetManager.getTweets(criteria)
    for tweet in tweets:
        print(tweet.text)

if __name__ == '__main__':
    test()
JohnDickson5 commented 5 years ago

I seemingly triggered this error querying one tweet at a time, <100 times a day, over the course of 2 weeks. From what I gather, that's much less volume, and more spread out, than what others have reported here.

I'm not grasping most of what is posted here. Will conducting test queries exacerbate the problem? Would switching networks or using a VPN help?

gghidiu commented 4 years ago

I have a similar issue. For multiple queries the download stops at a certain number without any errors. Sometimes the number varies, but it always stops before reaching the until date. @JaimeBadiola, have you managed to correct this bug? I am new to Python, so it would be very helpful if you could post the modified code here.

Thanks in advance!

JaimeBadiola commented 4 years ago

No, I wasn't able to correct the bug.


gghidiu commented 4 years ago

@JaimeBadiola , have you found a working alternative then?

JaimeBadiola commented 4 years ago

What I did was download all the data day by day, and if one day there was an error I would mark that day as missing data. In total I downloaded about 500 days, and a bit more than 20 were missing.


aduriseti commented 4 years ago

Hey, I was able to resolve this problem when writing a crawler (in Scala) for work, so I can't just put it on GitHub. The gist of the problem is that sometimes the scroll cursor used to compute the index into the stream of items returned by a query becomes corrupted. I don't think this is volume dependent (i.e. this is not some kind of rate-limiting mechanism). I was able to resolve it by saving the previous search cursor after each query and going back to it if I suspect the cursor I'm currently using is corrupted. Functionally I implemented this with a pseudo-BFS where I kept the previous cursor in the explore queue until its child cursor executes a search with no errors. I've been planning to make a PR to this repo porting my solution but have just been busy. Let me know if you guys want it and I'll make it a priority.

gghidiu commented 4 years ago

It is a pity that you cannot submit requests by hour, since that would solve the issue.

I tried it a couple of days ago. It seems they removed the HH:MM:SS specification from since: and until:. You can try it yourself; maybe they have reverted that change.

There is a workaround to download the tweets from a specific time. It partially solves the problem, since you can just continue downloading from where the program stopped.

The idea is to convert the --until time to a tweet id and insert that tweet id as max_id into the query. The formula for this is: (millisecond_epoch - 1288834974657) << 22 = tweet id.

For example, if the download stopped at 2016-08-24 19:38:13,

In my case the final query will look something like this:

GetOldTweets3 --querysearch 'bitcoin max_id:768532884616118272' --lang de --since 2016-08-24 --maxtweets 300000 --output 'bitcoin_24_08_2016.csv'

Since we have the max_id parameter, --until becomes redundant.
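The conversion can be checked against the example above: 2016-08-24 19:38:13 UTC maps exactly to the id used in the query. A small sketch (the epoch constant 1288834974657 is Twitter's snowflake epoch; the helper name is ours):

```python
from datetime import datetime, timezone

TWITTER_EPOCH_MS = 1288834974657  # Twitter's snowflake epoch, in ms

def timestamp_to_max_id(dt):
    # (millisecond_epoch - 1288834974657) << 22 = tweet id
    ms = int(dt.timestamp() * 1000)
    return (ms - TWITTER_EPOCH_MS) << 22

stop = datetime(2016, 8, 24, 19, 38, 13, tzinfo=timezone.utc)
print(timestamp_to_max_id(stop))  # 768532884616118272
```

Note that this gives the smallest id a tweet created at that instant could have, so it works as an upper bound for max_id.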

gghidiu commented 4 years ago

@aduriseti, it would be great if we could get the working version.

aduriseti commented 4 years ago
Wow, I had no idea we could calculate tweet IDs like this; it would have vastly simplified my project had I known. Thanks for the tip.

modatamoproblems commented 4 years ago

Did you try running the same script multiple times in hopes of getting more tweets? I noticed I was getting fewer and fewer tweets when I did this... I checked my task manager and CPU usage was through the roof. There were also 20+ Python processes listed... I restarted, broke the date ranges into smaller ranges, created a data frame, and then saved them as CSVs. That seems to have done the trick (aside from the pulls still taking a very, very long time). I still have to reboot between pulls.

If anyone has Alteryx, there is a Twitter app you can use to pull the data.

rodrigoborgesmachado commented 4 years ago

I'm trying to make this search: tweetCriteria = got.manager.TweetCriteria().setQuerySearch('CVE').setSince("2015-01-01").setUntil("2019-11-15")

But the result does not contain all the tweets. I'm getting just some of them, and it stops at some date around 2018-12-20. Can somebody help me?

JaimeBadiola commented 4 years ago

My workaround was to skip the days that I had issues with. So skip 2018-12-20 and keep downloading after that.


rodrigoborgesmachado commented 4 years ago

That was my first idea, so I will try it that way...

chtryanil commented 4 years ago

Let me know if you guys want it and I'll make it a priority.

Could you please do this?

kho7 commented 4 years ago

I have not used Scala but I will be happy to try it. Thanks.


joshkwannacode commented 4 years ago

Hi guys, I've tried this and I only get one username/tweet back. How do I fix this? Do I have to add the max id?

import GetOldTweets3 as got

tweetCriteria = got.manager.TweetCriteria().setQuerySearch('#detroitrapper')\
                                                           .setUntil("2020-05-01")\
                                                           .setNear('Detroit,Michigan')\
                                                           .setSince("2020-04-03")\
                                                           .setMaxTweets(100)
tweet = got.manager.TweetManager.getTweets(tweetCriteria)[0]
print(tweet.username)
modatamoproblems commented 4 years ago

Hi, you set tweet equal to the first element when you selected index 0 (the part that looks like [0]). Delete the [0] and you should be fine.


joshkwannacode commented 4 years ago


That does not work; I get an error saying 'list' object has no attribute 'username'.

vonadz commented 4 years ago

That does not work; I get an error saying 'list' object has no attribute 'username'.

That's because you're trying to access a nonexistent attribute of a list of tweets. You should probably learn what a list is. https://www.w3schools.com/python/python_lists.asp

joshkwannacode commented 4 years ago


That's because you're trying to access a nonexistent attribute of a list of tweets. You should probably learn what a list is. https://www.w3schools.com/python/python_lists.asp

Deleted my other post. Thanks man, I guess I didn't understand lists, lol.

Edit: made a loop and it works.