HoloArchivists / twspace-dl

A python module to download twitter spaces.
GNU General Public License v2.0

Rate limit on `guest_token` API #82

Closed. mikelei8291 closed this issue 1 year ago

mikelei8291 commented 1 year ago

Describe the bug: Since the shutdown of the old API and the switch to the new API in #80, the program easily hits the rate limit of the API that retrieves the guest_token, especially in long-running monitor sessions.

It seems that when you visit the site without logging in, the guest_token is stored as a cookie and reused in later requests to other APIs, so the https://api.twitter.com/1.1/guest/activate.json API is only requested once, at the very beginning.

There are two possible solutions:

  1. Save the token somewhere after running the program for the first time and reuse it on later runs.
  2. Ask the user to provide cookies, so that a different API, available to logged-in users, can be used to retrieve the user ID from the user's screen name.

For 1, we need to figure out where to store the token. For 2, cookies from a logged-in user would become a requirement to run the program, which restricts who can use it.
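To make option 1 more concrete, here is a minimal sketch of reusing a saved token. Everything in it is an assumption for illustration: `cached_guest_token`, the cache location, and the one-hour lifetime are hypothetical and not part of twspace-dl, and `fetch` stands in for whatever function actually requests a new token.

```python
import json
import time
from pathlib import Path
from typing import Callable

# Hypothetical values: neither the location nor the lifetime comes from twspace-dl.
CACHE_FILE = Path.home() / ".cache" / "twspace_dl_guest_token.json"
MAX_AGE_SECONDS = 3600  # assumed lifetime; the real token lifetime is not known here


def cached_guest_token(
    fetch: Callable[[], str],
    cache_file: Path = CACHE_FILE,
    max_age: float = MAX_AGE_SECONDS,
) -> str:
    """Return a cached token if it is fresh enough, otherwise fetch and store one."""
    try:
        data = json.loads(cache_file.read_text())
        if time.time() - data["saved_at"] < max_age:
            return data["token"]
    except (OSError, ValueError, KeyError, TypeError):
        pass  # missing, unreadable, or malformed cache: fall through and refetch

    token = fetch()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps({"token": token, "saved_at": time.time()}))
    return token
```

With something like this, repeated runs within the assumed lifetime would call the activate API only once. A stale token would still need to be detected and refreshed by the caller.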

To Reproduce: Run the program multiple times in a short timespan.

Expected behavior: The program exits normally.

Output

2023-01-25 14:55:16,235 [DEBUG] Starting new HTTPS connection (1): api.twitter.com:443
2023-01-25 14:55:16,414 [DEBUG] https://api.twitter.com:443 "POST /1.1/guest/activate.json HTTP/1.1" 429 69
Traceback (most recent call last):
  File "/usr/local/bin/twspace_dl", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/twspace_dl/__main__.py", line 232, in main
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/twspace_dl/__main__.py", line 86, in space
    twspace = Twspace.from_user_avatar(args.user_url, auth_token)
  File "/usr/local/lib/python3.10/dist-packages/twspace_dl/twspace.py", line 242, in from_user_avatar
    user_id = twitter.user_id(user_url)
  File "/usr/local/lib/python3.10/dist-packages/twspace_dl/twitter.py", line 49, in user_id
    headers=AUTH_HEADER | {"x-guest-token": guest_token()},
  File "/usr/local/lib/python3.10/dist-packages/twspace_dl/twitter.py", line 21, in guest_token
    token = response["guest_token"]
KeyError: 'guest_token'
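
The KeyError above hides the real cause: the request returned HTTP 429, so the JSON body had no guest_token key. A minimal sketch of reporting this more clearly follows. `GuestTokenError` and `extract_guest_token` are hypothetical names for illustration, not existing twspace-dl code.

```python
class GuestTokenError(RuntimeError):
    """Raised when a guest token cannot be obtained (hypothetical helper)."""


def extract_guest_token(status_code: int, payload: dict) -> str:
    """Return the token from a parsed response, or raise a descriptive error."""
    if status_code == 429:
        raise GuestTokenError("rate limited (HTTP 429) while requesting a guest token")
    token = payload.get("guest_token")
    if not token:
        raise GuestTokenError(
            f"no guest_token in response (HTTP {status_code}): {payload}"
        )
    return token
```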


Ryu1845 commented 1 year ago

If your use case is monitoring one or multiple accounts, I'd like to point you to https://github.com/HitomaruKonpaku/twspace-crawler, which does that job a lot more elegantly. twspace-dl was created before it, which is why it has preliminary support for monitoring.

That said, I'd be glad to accept a pull request to better support either 1 or 2.

mikelei8291 commented 1 year ago

Thanks for the suggestion. I've checked out this project before, but I didn't want to use Node.js on my server, so I settled on twspace-dl and a custom script I wrote to do the job.

That being said, I think I could technically implement both. The question is where we should store this guest_token so that it isn't an annoyance to the user but is still kept somewhere on the system.

Ryu1845 commented 1 year ago

At the root of the module, in a file called maybe .guest_token
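
A rough sketch of what that location could look like in code, assuming the file sits next to the module that requests the token. The helper names `token_path`, `load_token`, and `save_token` are hypothetical. One caveat: the module directory may not be writable (the traceback above shows a system-wide install under /usr/local/lib), so the sketch treats a failed write as non-fatal.

```python
from pathlib import Path
from typing import Optional


def token_path(module_file: str) -> Path:
    """Path of a .guest_token file next to the given module file."""
    return Path(module_file).resolve().parent / ".guest_token"


def load_token(path: Path) -> Optional[str]:
    """Return the stored token, or None if the file is missing or empty."""
    try:
        return path.read_text().strip() or None
    except OSError:
        return None


def save_token(path: Path, token: str) -> bool:
    """Try to store the token; return False if the location is not writable."""
    try:
        path.write_text(token)
        return True
    except OSError:
        return False
```

Inside the module this would be called as `token_path(__file__)`.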