Today, and right at the moment, it's working here. Had you already downloaded a bit when you got the error? Then it would be a real "limit exceeded". Or your current IP may be blocked for some reason. Or they are rolling out a new change which you already see and other regions will see soon.
The settings affect the crawlers for both services. The default settings are, in absolute terms, a bit too high for Twitter's API limits, but they work for normal crawling/downloading because of the time spent between requests. Obviously there are no separate settings per service yet; maybe they will be needed in the future.
There is room for improvement. Contributions are welcome.
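For illustration, such a shared per-window limit could look like the following minimal sketch. This is not TumblThree's actual implementation, just an example of an "N requests per time window" gate that both crawlers would pass through:

```csharp
// Minimal sketch (not TumblThree's actual code) of a shared
// "N requests per time window" gate used by both crawlers.
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class WindowedRateLimiter
{
    private readonly SemaphoreSlim _slots;
    private readonly TimeSpan _window;

    public WindowedRateLimiter(int requestsPerWindow, TimeSpan window)
    {
        _slots = new SemaphoreSlim(requestsPerWindow, requestsPerWindow);
        _window = window;
    }

    // Take one slot before issuing a request; the slot is given back after
    // the window elapses, so at most requestsPerWindow requests can start
    // within any window.
    public async Task WaitAsync(CancellationToken ct = default)
    {
        await _slots.WaitAsync(ct);
        _ = Task.Delay(_window).ContinueWith(_ => _slots.Release());
    }
}
```

With the defaults discussed here, a single gate like `new WindowedRateLimiter(30, TimeSpan.FromSeconds(60))` shared by both crawlers also illustrates why nominally too-high values can still work in practice: the time spent downloading between requests keeps the real rate well below the configured ceiling.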
I have actually been trying to update one user who was already downloaded once, two weeks ago. The error happens within the first minute of running, during `Evaluated N tumblr posts out of N total posts`. It doesn't download anything new, and then I get `Error 1: Limit exceeded: username You should lower the connections to tumblr API in the Settings -> Connection pane.`
Then I get the message `waiting until date/time`, but at that time it only pushes the date/time forward and makes no progress, even after 1 hour. So there appears to be no way to make a complete update of already downloaded users (as of today). My `Last Complete Crawl` will stay stuck at 2022-01-20.
Please open this blog in the browser and tell me when the first two posts were posted. Do you have "force rescan" enabled in this blog's settings? What is the value of `LastId` in this blog's index file?
`Last Complete Crawl` still shows 2022-01-20, and the user has a total of 8,796 Tweets. `force rescan` is not enabled. However, I still think the software acts as if this setting were enabled: it's always `Evaluated 3500 tumblr posts out of 8,796 total posts` when `Limit exceeded` appears.

At the moment I don't have a clue why it's crawling that much on this blog. Do you have a value inside the blog's "download pages" setting?
No, I have almost everything on default settings. The only things I have changed in the software are:

General:
- Active portable mode: Enabled

Connection:
- Concurrent connections: 1
- Concurrent video connections: 1
- Limit Tumblr API connections: Number of connections: 30
- Limit Tumblr SVC connections: Number of connections: 30

Blog:
- Download reblogged posts: Disabled
- Image size (category): Large
- Video size (category): Large
It seems some error occurs during the crawl process that keeps it from updating `LastId` to the newest post. You could have a look into the `TumblThree.log` file to see whether there is a hint/error in it.
This is the error in `TumblThree.log`:

```
You should lower the connections to the tumblr api in the Settings->Connection pane., System.Net.WebException: The remote server returned an error: (429) Too Many Requests.
   at System.Net.HttpWebRequest.EndGetResponse(IAsyncResult asyncResult)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at TumblThree.Applications.Extensions.TaskTimeoutExtension.d__0`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at TumblThree.Applications.Services.WebRequestFactory.d__12.MoveNext() in C:\projects\Tumblthree\src\TumblThree\TumblThree.Applications\Services\WebRequestFactory.cs:line 129
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at TumblThree.Applications.Crawler.TwitterCrawler.d__25.MoveNext() in C:\projects\Tumblthree\src\TumblThree\TumblThree.Applications\Crawler\TwitterCrawler.cs:line 257
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at TumblThree.Applications.Crawler.TwitterCrawler.d__24.MoveNext() in C:\projects\Tumblthree\src\TumblThree\TumblThree.Applications\Crawler\TwitterCrawler.cs:line 236
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at TumblThree.Applications.Crawler.TwitterCrawler.d__28.MoveNext() in C:\projects\Tumblthree\src\TumblThree\TumblThree.Applications\Crawler\TwitterCrawler.cs:line 339
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at TumblThree.Applications.Crawler.TwitterCrawler.d__30.MoveNext() in C:\projects\Tumblthree\src\TumblThree\TumblThree.Applications\Crawler\TwitterCrawler.cs:line 364
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at TumblThree.Applications.Crawler.TwitterCrawler.d__33.MoveNext() in C:\projects\Tumblthree\src\TumblThree\TumblThree.Applications\Crawler\TwitterCrawler.cs:line 456
```
This blog downloads without problems here. Even if I try to emulate your situation by adapting the settings and blog file accordingly, it downloads the posts up to the one from last time and then stops. I don't know what the difference to your system could be.
You could back up the blog's download folder and its two blog files. Then you can add the blog again and see whether it works again and downloads the missing new posts. Later you can close the app and merge the backed-up files back in: copy the already downloaded entries in `<blog>_files.twitter` from the backup into the current one (just all entries; a few duplicates are ok).
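If you prefer to script that merge, here is a rough sketch. I'm assuming the `<blog>_files.twitter` file is JSON with an array of already-downloaded entries under a property I'm calling `Links` here; check your actual file first, since the schema may differ:

```csharp
// Illustrative only: merge the already-downloaded entries from a backed-up
// <blog>_files.twitter into the current one. Assumes the file is JSON with
// an array property named "Links" (verify against your own file first!).
using System;
using System.IO;
using System.Linq;
using System.Text.Json;
using System.Text.Json.Nodes;

class MergeIndex
{
    static void Main(string[] args)
    {
        string backupPath = args[0];   // backed-up <blog>_files.twitter
        string currentPath = args[1];  // current <blog>_files.twitter

        var backup = JsonNode.Parse(File.ReadAllText(backupPath))!;
        var current = JsonNode.Parse(File.ReadAllText(currentPath))!;

        var merged = current["Links"]!.AsArray()
            .Concat(backup["Links"]!.AsArray())
            .Select(n => n!.GetValue<string>())
            .Distinct()   // duplicates would be tolerated, but drop them anyway
            .ToArray();

        current["Links"] = new JsonArray(merged.Select(s => (JsonNode)s).ToArray());
        File.WriteAllText(currentPath, current.ToJsonString(
            new JsonSerializerOptions { WriteIndented = true }));
    }
}
```

As noted above, a few duplicate entries would be tolerated anyway; the `Distinct()` just keeps the file tidy.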
Report from start to end:

1. Downloaded `TumblThree-v2.5.1-x64-Application.zip` and started `TumblThree.exe`.
2. Pressed `Crawl`; 3,944 videos/images plus `texts.txt` were downloaded.
3. Then: `Error 1: Limit exceeded: username You should lower the connections to tumblr API in the Settings -> Connection pane.` The user has 50,864 posts, so this is nowhere near completion, and there are still 3 other users to go.
4. The status showed `waiting until date/time` again and again.
5. Pressed `Stop`.
6. After `Calculating unique downloads, removing duplicates ...`, pressed `Crawl` again.
7. Lots of `File Already downloaded.... Skipping` messages; the count reached 3,964 videos/images plus `texts.txt`, so only 20 new videos/images were downloaded before the error occurred again: `Error 1: Limit exceeded: username You should lower the connections to tumblr API in the Settings -> Connection pane.`
Conclusions:
- The twitter part of the software works up to a certain limit, but it will take forever to get any files beyond that limit. With only 20 new files the second time around, it will take days to complete the first user, if it ever reaches the finish line.
- All skipped files seem to be counted as requests that add to the limit counter.
Log:
There is no `TumblThree.log` to be found in the TumblThree-v2.5.1-x64-Application folder.
Ok, but now we are talking about a different thing, aren't we? It's no longer about downloading a few dozen recent posts, but about downloading historic posts (resp. complete blogs). Twitter doesn't want more posts than a certain limit to be downloaded. Obviously they changed something. We have to see whether we can find a solution or not.
The download of the "post lists" counts towards the limit, whether a post's media is downloaded or skipped.
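Rough numbers make this concrete. Assuming, purely for illustration, a page size of 100 posts per "post list" request (an assumption, not a confirmed API value), the 8,796-tweet blog from above costs about 88 list requests per crawl even when every file is skipped:

```csharp
// Back-of-the-envelope only; the page size is an assumption for illustration.
using System;

class PagingCost
{
    static void Main()
    {
        int totalPosts = 8796;      // the example blog discussed above
        int postsPerPage = 100;     // assumed posts per "post list" request

        // Ceiling division: list requests needed to walk the whole timeline.
        int listRequests = (totalPosts + postsPerPage - 1) / postsPerPage;

        // These requests count against the rate limit even if every media
        // file they reference is already downloaded and gets skipped.
        Console.WriteLine(listRequests);  // 88
    }
}
```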
To my understanding:
I see no difference between updating an already downloaded blog and a completely new download. Both have the same `Number of posts` in the active user's download queue. In other words, you will never be able to update/download the second blog in the download queue if the first user has a large `Number of posts`. The problem is that the software makes a request for each and every post the user has, no matter whether you do an update or download a new user. So you don't only get the recent 100 posts you haven't downloaded yet; you get the full blog in the queue no matter what.
The problem with updating a blog would not be a problem if you only got the recent posts between now and `Last Complete Crawl` in the queue.
With the full `Number of posts` in the queue, the user will never complete, and because of that a new date in `Last Complete Crawl` will never be set. Problem summary: the `Last Complete Crawl` date will never update if the crawl never completes, and you then wouldn't see whether it was updated.

First, you experience resp. describe something that I don't see here. Looks like most other users can update their existing blogs too.
> The problem with updating a blog would not be a problem if you only got the recent posts between Now and Last Complete Crawl in the queue.

That's exactly what we do, using precisely `LastId` (set after a successful complete crawl).
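As a simplified illustration of that idea (not the actual crawler code): the crawler walks the timeline newest-first, stops as soon as it reaches the remembered `LastId`, and only persists a new `LastId` after a crawl that finished without errors:

```csharp
// Simplified illustration of the LastId idea, not TumblThree's actual code.
using System;
using System.Collections.Generic;
using System.Linq;

class LastIdSketch
{
    static void Main()
    {
        // Pretend timeline, newest first: ids 12..1, with LastId = 8
        // remembered from the previous successful complete crawl.
        var timelineNewestFirst = Enumerable.Range(1, 12).Reverse().Select(i => (ulong)i);
        ulong lastId = 8;

        var newPosts = new List<ulong>();
        ulong newestSeen = lastId;

        foreach (var id in timelineNewestFirst)
        {
            if (id <= lastId) break;     // everything older is already downloaded
            newestSeen = Math.Max(newestSeen, id);
            newPosts.Add(id);            // would be handed to the downloader
        }

        // Persist the new LastId only after the crawl finished without errors;
        // if an error (like the 429 above) aborts it, keep the old value so
        // the missed posts are retried next time.
        bool crawlCompletedSuccessfully = true;
        if (crawlCompletedSuccessfully) lastId = newestSeen;

        Console.WriteLine($"new posts: {string.Join(",", newPosts)}; LastId now {lastId}");
    }
}
```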
> In other words, you will never be able to update/download the second blog in the download queue

Not automatically and unattended, yes. You can, for example, remove this blog from the download queue, which stops its crawler and continues with the next one.
Let me summarize what I get (and probably others too):

- Small blogs can be downloaded and updated without problems.
- Any reasonably up-to-date blog can be updated without problems.
- Only big blogs can no longer be downloaded completely and thus updated later. Experienced users could at least update them with a little tweaking (`LastId`).

The last point needs to be fixed, so that all posts up to the limit are downloaded and then the blog is marked as completely downloaded. This limit exists... [#161] That a workaround will not work forever should be clear and understandable.

> Obviously they changed something. We have to see whether we can find a [workaround] solution or not.

If you know how to fix it, you are welcome to do so (or share it).
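One possible shape of that fix, purely as a sketch of the suggestion above (none of this is existing TumblThree code): treat hitting the hard cap as the end of the reachable timeline, so the blog is still marked complete and `LastId` gets set:

```csharp
// Sketch of the suggested fix only, not existing TumblThree code: when the
// hard cap ends pagination, treat the crawl as "complete up to the limit"
// instead of failing, so Last Complete Crawl / LastId still get updated.
using System;
using System.Net;

class MarkCompleteAtLimit
{
    // Stand-in for the real timeline request; pretend the cap hits at page 35.
    static string FetchPage(int pageIndex)
    {
        if (pageIndex >= 35)
            throw new WebException("The remote server returned an error: (429) Too Many Requests.");
        return $"page {pageIndex}";
    }

    static void Main()
    {
        int fetched = 0;
        bool markComplete = false;
        try
        {
            while (true) { FetchPage(fetched); fetched++; }
        }
        catch (WebException)
        {
            // A real fix would have to distinguish a transient rate limit
            // (retry later) from the hard cap (mark complete); this sketch
            // simply assumes the cap was reached.
            markComplete = true;
        }
        Console.WriteLine($"fetched {fetched} pages, markComplete={markComplete}");
    }
}
```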
@Hrxn @desbest @cr1zydog I hope you don't mind. Can you still update your existing twitter blogs?
I've never used Twitter with this App before, so my own experience here is a little limited. That said, what you state here is obviously true:

> - Small blogs can be downloaded and updated without problems.
> - Any reasonably up-to-date blog can be updated without problems.
> - Only big blogs can no longer be downloaded completely and thus updated later. Experienced users could at least update them with a little tweaking (`LastId`).

The third point is the real issue, as I understand it, and yes, this is a limitation due to how Twitter works.
I can't download any blogs, new or old, with few posts or many.
I had this problem several months ago, but it's not bothered me since, and I didn't change anything other than the routine TumblThree updates. I catch up with all my Tumblr blogs once a month and add any newly discovered ones. I'm now following 257 Tumblr blogs (I know, I'm hooked!), and the last catch-up on the first of the month was 147 GB and 404,000 files. It took almost 24 hours to harvest everything, but ran perfectly.
I'm using all default settings.
**Describe the bug**
Still being `Rate Limited` on the twitter API, with a suggestion to lower the connections in Settings. This, however, makes no difference at all. I tried as low as 10 `Numbers of connections in 60s` with only 1 `Concurrent connection`. To my understanding of the Twitter API (https://developer.twitter.com/en/docs/twitter-api/rate-limits), this shouldn't be an issue.

This also raises the question whether these settings only affect the Tumblr API. Should both Tumblr and Twitter really be treated under the same settings and name?
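For a rough sanity check of those numbers (Twitter groups its limits into 15-minute windows per the page above; the per-endpoint allowance varies, so the value below is only a placeholder):

```csharp
// Back-of-the-envelope check; allowedPer15Min is a placeholder, since the
// real allowance differs per Twitter endpoint and auth type.
using System;

class RateCheck
{
    static void Main()
    {
        int connectionsPerWindow = 10;  // the "Numbers of connections in 60s" setting
        int windowSeconds = 60;
        int allowedPer15Min = 900;      // placeholder; varies per endpoint

        double requestsPer15Min = connectionsPerWindow * (15.0 * 60 / windowSeconds);
        Console.WriteLine($"{requestsPer15Min} requests/15 min vs. an allowance of {allowedPer15Min}");
    }
}
```

150 requests per 15 minutes would sit comfortably under such an allowance, which is why lowering the setting not helping at all is suspicious.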
And shouldn't there also be a way to authenticate a Twitter account? This would allow you to crawl users who only allow followers to see their posts.
**Desktop (please complete the following information):**