Open TimBMK opened 2 years ago
I've done some additional testing / troubleshooting. The issue may be connected to the Rate Limits being handled incorrectly by get_user_edges(). I put in an additional 30 minute rest timer every 15 users. Note how the function starts failing (only one datapoint returned for a number of accounts) after the sleep timers and bounces back after the additional 30min rest:
========== Get Followers ==========
Processing 827090742162100224
Total data points: 1000
Total data points: 2000
Total data points: 3000
Total data points: 4000
Total data points: 5000
Total data points: 5223
This is the last page for 827090742162100224 : finishing collection.
Processing 1683845126
Total data points: 6206
This is the last page for 1683845126 : finishing collection.
Processing 1377117206
Total data points: 6307
This is the last page for 1377117206 : finishing collection.
Processing 1073605033
Total data points: 6463
This is the last page for 1073605033 : finishing collection.
Processing 551802475
Total data points: 6464
This is the last page for 551802475 : finishing collection.
Processing 262730721
Total data points: 6512
This is the last page for 262730721 : finishing collection.
Processing 69814084
Total data points: 6812
This is the last page for 69814084 : finishing collection.
Processing 53617577
Total data points: 6969
This is the last page for 53617577 : finishing collection.
Processing 47151012
Total data points: 7044
This is the last page for 47151012 : finishing collection.
Processing 38272947
Total data points: 7328
This is the last page for 38272947 : finishing collection.
Processing 24674518
Rate limit reached. Rate limit will reset at 2022-05-25 12:47:56
Sleeping for 859 seconds.
================================================================================
Total data points: 8328
Total data points: 9010
This is the last page for 24674518 : finishing collection.
Processing 19924848
Total data points: 9011
This is the last page for 19924848 : finishing collection.
Processing 910913583168573440
Total data points: 9012
This is the last page for 910913583168573440 : finishing collection.
Processing 347792540
Total data points: 9404
This is the last page for 347792540 : finishing collection.
Processing 308415277
Total data points: 9591
This is the last page for 308415277 : finishing collection.
==== Rest 30 Minutes ====
Processing 127620350
Total data points: 1000
Total data points: 2000
Total data points: 3000
Total data points: 4000
Total data points: 5000
Total data points: 6000
Total data points: 7000
Total data points: 8000
Total data points: 9000
Total data points: 10000
Total data points: 11000
Total data points: 12000
Total data points: 13000
Total data points: 14000
Total data points: 14176
This is the last page for 127620350 : finishing collection.
Processing 46085533
Rate limit reached. Rate limit will reset at 2022-05-25 13:33:18
Sleeping for 850 seconds.
================================================================================
Total data points: 15176
Total data points: 15683
This is the last page for 46085533 : finishing collection.
Processing 18933321
Total data points: 16400
This is the last page for 18933321 : finishing collection.
Processing 1340469592666324992
Total data points: 16401
This is the last page for 1340469592666324992 : finishing collection.
Processing 1235523759597068288
Total data points: 16402
This is the last page for 1235523759597068288 : finishing collection.
Processing 1201838123841404929
Total data points: 16403
This is the last page for 1201838123841404929 : finishing collection.
Processing 1187686813168820224
Total data points: 16404
This is the last page for 1187686813168820224 : finishing collection.
Processing 1181598578508214272
Total data points: 16405
This is the last page for 1181598578508214272 : finishing collection.
Processing 1159457070124605442
Total data points: 16406
This is the last page for 1159457070124605442 : finishing collection.
Processing 1159072277746593792
Total data points: 16407
This is the last page for 1159072277746593792 : finishing collection.
Processing 1148916458983964673
Total data points: 16408
This is the last page for 1148916458983964673 : finishing collection.
Processing 1135510177292197888
Total data points: 16409
This is the last page for 1135510177292197888 : finishing collection.
Processing 1132260527378489345
Total data points: 16410
This is the last page for 1132260527378489345 : finishing collection.
Processing 1127961248493129728
Total data points: 16411
This is the last page for 1127961248493129728 : finishing collection.
Processing 1125751445205262336
Total data points: 16412
This is the last page for 1125751445205262336 : finishing collection.
==== Rest 30 Minutes ====
Processing 1113004743733993472
Total data points: 476
This is the last page for 1113004743733993472 : finishing collection.
Processing 1096063189249331201
Total data points: 1476
Total data points: 2476
Total data points: 2561
This is the last page for 1096063189249331201 : finishing collection.
Processing 1095023450790547457
Total data points: 2677
This is the last page for 1095023450790547457 : finishing collection.
Processing 1085599614122762242
Total data points: 2678
This is the last page for 1085599614122762242 : finishing collection.
Processing 1085494381405237251
Total data points: 2822
This is the last page for 1085494381405237251 : finishing collection.
Processing 1083371289786634240
Total data points: 2823
This is the last page for 1083371289786634240 : finishing collection.
Processing 1082925763316367360
Total data points: 2831
This is the last page for 1082925763316367360 : finishing collection.
Processing 1071665423140249600
Total data points: 2869
This is the last page for 1071665423140249600 : finishing collection.
Processing 1070312229139025920
Total data points: 3004
This is the last page for 1070312229139025920 : finishing collection.
Processing 1061873979760263168
Total data points: 3365
This is the last page for 1061873979760263168 : finishing collection.
Processing 1052524172717375488
Total data points: 3554
This is the last page for 1052524172717375488 : finishing collection.
Processing 1042109887604563968
Total data points: 3641
This is the last page for 1042109887604563968 : finishing collection.
Processing 1040160799208161280
Total data points: 4641
Rate limit reached. Rate limit will reset at 2022-05-25 14:18:59
Sleeping for 859 seconds.
I've run some more checks. For one, the rate limit seems to be hit after 15 pages, not 15 users. This is rather unclear in both the Twitter API and package docs. However, the actual problem starts once the rate limit has been hit and collection resumes after the sleep timer. Strangely, though, it does not start immediately. In my tests, after hitting the rate limit and sleeping, the current ID lookup concludes, the next lookup works fine, and only then does the function start silently returning empty data.
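The page-based reading of the limit can be checked against the first log with a bit of arithmetic. The following is a minimal sketch, assuming each request returns at most 1000 edges (inferred from the 1000-step increments in the logs) and a window of 15 requests. The follower counts are the differences between successive "Total data points" lines for the first ten accounts.

```r
# Follower counts per account, read off the first log
# (differences between successive "Total data points" lines).
followers <- c(5223, 983, 101, 156, 1, 48, 300, 157, 75, 284)

page_size    <- 1000  # assumption: one request returns up to 1000 edges
window_limit <- 15    # 15 requests per 15-minute window

# Every account costs at least one request, even with zero followers.
pages <- pmax(1, ceiling(followers / page_size))
requests_used <- cumsum(pages)

print(data.frame(followers, pages, requests_used))

# Number of accounts that fit completely into one window
accounts_in_window <- sum(requests_used <= window_limit)
```

Under these assumptions the first ten accounts use exactly 15 requests, which would be consistent with the eleventh account (24674518) being the one that triggers the sleep.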
See this log where I added an extra 15 minute grace period after every 5 accounts:
==== Rest 15 Minutes ====
Processing 127620350
Total data points: 1000
Total data points: 2000
Total data points: 3000
Total data points: 4000
Total data points: 5000
Total data points: 6000
Total data points: 7000
Total data points: 8000
Total data points: 9000
Total data points: 10000
Total data points: 11000
Total data points: 12000
Total data points: 13000
Total data points: 14000
Total data points: 14167
This is the last page for 127620350 : finishing collection.
Processing 46085533
Rate limit reached. Rate limit will reset at 2022-06-03 12:21:38
Sleeping for 849 seconds.
================================================================================
Total data points: 15167
Total data points: 15652
This is the last page for 46085533 : finishing collection.
Processing 18933321
Total data points: 16358
This is the last page for 18933321 : finishing collection.
Processing 1340469592666324992
Total data points: 16359
This is the last page for 1340469592666324992 : finishing collection.
Processing 1235523759597068288
Total data points: 16360
This is the last page for 1235523759597068288 : finishing collection.
I am, however, at quite a loss as to what may cause this behaviour. Maybe the function does not track the right rate limit? That is, does .check_reset() need to distinguish between different rate limits, or does it always fetch the correct limit to pass on to .trigger_sleep()?
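For illustration only, here is a sketch of what deriving the sleep time from the response of the endpoint actually being called could look like. It says nothing about how .check_reset() or .trigger_sleep() are implemented. It assumes the response headers are available as a named list and that they carry `x-rate-limit-remaining` and `x-rate-limit-reset` (the latter as Unix epoch seconds). `seconds_to_sleep()` and its `buffer` argument are hypothetical names.

```r
# Hypothetical helper, not part of academictwitteR: decide how long to sleep
# from the rate-limit headers of a single response.
# Assumes `headers` is a named list with "x-rate-limit-remaining" and
# "x-rate-limit-reset" (Unix epoch seconds) for the endpoint just called.
seconds_to_sleep <- function(headers, now = Sys.time(), buffer = 1) {
  remaining <- as.numeric(headers[["x-rate-limit-remaining"]])
  reset_at  <- as.numeric(headers[["x-rate-limit-reset"]])
  if (length(remaining) == 0 || length(reset_at) == 0) {
    stop("rate-limit headers missing from response")
  }
  if (remaining > 0) {
    return(0)  # requests left in this window, no need to sleep
  }
  # Wait until the reset time, plus a small safety buffer.
  max(0, ceiling(reset_at - as.numeric(now))) + buffer
}
```

Because the values come from the response of the endpoint itself, a limit belonging to a different endpoint cannot be picked up by mistake.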
After additional tests, I can confirm that results remain unreliable even when data is returned. In many cases, you do not get all followers of a given user ID. I'm using a workaround now: comparing the number of followers pulled with the number stated in get_user_profile and re-running if necessary. Doing this, I noticed that the data returned seems relatively reliable if calls are wrapped in a for-loop, i.e. when checking every ID with a single call rather than in batches. Using a for-loop to loop through the vector of user IDs might therefore be a valid solution for this issue.
Until it is resolved, however, I would recommend putting these functions on hiatus or at least putting a warning label on them, as results can be grossly misleading.
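A minimal sketch of the workaround described above, not the exact code used: each ID is fetched with its own call, the row count is compared with the expected follower count, and the call is repeated a limited number of times on mismatch. `collect_verified()`, `max_tries`, and `tolerance` are hypothetical names. The two fetch functions are passed in as arguments so that the sketch runs without API access.

```r
# Hypothetical helper: collect followers one ID at a time and re-run any ID
# whose row count does not match the expected follower count.
# `fetch_followers(id)` must return a data frame of edges for one ID;
# `expected_count(id)` must return the follower count from the profile.
collect_verified <- function(ids, fetch_followers, expected_count,
                             max_tries = 3, tolerance = 0) {
  results <- list()
  unverified <- character(0)
  for (id in ids) {
    expected <- expected_count(id)
    ok <- FALSE
    for (attempt in seq_len(max_tries)) {
      edges <- fetch_followers(id)  # single call per ID, not a batch
      if (abs(nrow(edges) - expected) <= tolerance) {
        ok <- TRUE
        break
      }
    }
    results[[id]] <- edges  # keep the last attempt either way
    if (!ok) unverified <- c(unverified, id)
  }
  list(edges = results, unverified = unverified)
}

# Usage with academictwitteR (not run here; requires a bearer token).
# The `public_metrics$followers_count` column is an assumption about the
# shape of the get_user_profile() result.
# out <- collect_verified(
#   ids,
#   fetch_followers = function(id) get_user_followers(id),
#   expected_count  = function(id)
#     get_user_profile(id)$public_metrics$followers_count
# )
```

The `tolerance` argument is an addition for illustration: follower counts can change between the profile lookup and the follower lookup, so an exact match may be too strict. IDs that never match are returned in `unverified` rather than dropped, so they can be inspected by hand.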
Thank you for this, @TimBMK and apologies for slow comms. I've been away on holiday. I'll add this to TODOs for next release
Describe the bug
Both functions get_user_following() and get_user_followers() at some point start returning fewer datapoints (i.e. followers/followees) for users where there should be considerably more. Eventually, this comes down to 1 datapoint. This, however, is not an actual datapoint: it is a row with all NULL/NA values except the from_id. That is, eventually no actual data is returned from the endpoint, but the function reports it as a returned datapoint (which makes the bug hard to notice). This seems to start occurring only after the rate limit has been hit more than once. This may suggest a problem on the API side. However, I did not find any statements on rate limits other than the 15 lookups per 15 minutes stated here.
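Since the placeholder rows are counted like real datapoints, one way to spot them is to flag every row in which all columns other than from_id are NA. The following is a minimal sketch on made-up example data. `is_placeholder_row()` is a hypothetical helper and the `id` and `username` columns are illustrative. It only checks atomic columns reliably, so NULL entries inside list-columns would need separate handling.

```r
# Hypothetical helper: flag rows that contain nothing but `from_id`.
is_placeholder_row <- function(edges) {
  other <- edges[, setdiff(names(edges), "from_id"), drop = FALSE]
  if (ncol(other) == 0) {
    # Only from_id is present, so no row carries any follower data.
    return(rep(TRUE, nrow(edges)))
  }
  # A row is a placeholder if none of its other columns has a value.
  rowSums(!is.na(other)) == 0
}

# Made-up example data: one real follower row and one placeholder row.
edges <- data.frame(
  from_id  = c("18933321", "1340469592666324992"),
  id       = c("12345", NA),
  username = c("example_user", NA),
  stringsAsFactors = FALSE
)

flagged <- is_placeholder_row(edges)
real_edges <- edges[!flagged, , drop = FALSE]
```

Counting `nrow(real_edges)` instead of `nrow(edges)` avoids mistaking an empty response for a user with one follower.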
Expected Behavior
When looking up larger numbers of users' followers/followees through the respective functions, I would expect them to consistently return the correct data. If additional rate limits need to be adhered to, I would expect the function to do this in line with the "sleep" behaviour already implemented. If this is not possible, I would expect the function to throw an error, rather than silently returning no data.
Steps To Reproduce
For me, the problem started occurring about halfway through the second chunk (after the first rest). Notice how user 18933321 returns only 111 instead of its actual 1,076 users. Afterwards, we get only one (empty) datapoint per user. This behaviour might vary per use, but I can trace the exact same pattern in another log. Here's the log for the above example:
The exact same issue occurs with get_user_followers():
Environment
Anything else?
This seems to be the same issue as in #187. However, pinning the problem down to the second half of the second batch might be helpful in tracking it down. Let me know if there's anything I can do to help solve the issue / stress test more. Slightly lost as to what may cause that problem atm, especially since the function fails completely after reaching the first rate limit (rather than only failing for a number of requests until the rate limit recovers). Furthermore, the fact that it starts returning less data before failing completely suggests there may be an additional rate limit at play, limiting the returns rather than only the requests?