Joystream / youtube-synch

YouTube Synchronization
Automatic Opt-out of Channels due to Refresh Token Error #337

Open zeeshanakram3 opened 1 month ago

zeeshanakram3 commented 1 month ago

Context

Previously, we used yt-dlp (https://github.com/yt-dlp/yt-dlp) to download channel info, video lists, video metadata, and the videos themselves from all channels for syncing purposes. A few months ago, however, YouTube became more restrictive: it introduced rate limiting and eventually blocked the IP address of the machine because of excessive usage.

Changes Made

Current Issue

After the switch to the authenticated YouTube Data API, the following check in YoutubePollingService.ts automatically opted out more than 10,000 channels: Code Reference

During the investigation, we found that Google returns the following error when fetching channel info:

{ error: 'invalid_grant', error_description: 'Bad Request' }

The Google OAuth documentation indicates that a refresh token may stop working if it has not been used for more than 6 months (Reference). This could explain the issue: we switched to the YouTube API only recently, after relying on yt-dlp for over a year, so the stored refresh tokens went unused during that time.
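
To make the distinction concrete, here is a minimal sketch of how the error body quoted above could be classified before deciding what to do with a channel. It is not the logic in YoutubePollingService.ts; the names `classifyTokenError` and `decideAction`, and the policy of flagging a channel for re-authorization instead of opting it out, are assumptions made for illustration.

```typescript
// Shape of the OAuth error body quoted in this issue.
interface OAuthErrorBody {
  error?: string
  error_description?: string
}

type TokenErrorKind = 'refresh_token_invalid' | 'other'

// 'invalid_grant' is the standard OAuth 2.0 error code (RFC 6749, section 5.2)
// for an invalid, expired, or revoked grant, which includes refresh tokens.
function classifyTokenError(body: OAuthErrorBody): TokenErrorKind {
  return body.error === 'invalid_grant' ? 'refresh_token_invalid' : 'other'
}

// Hypothetical policy (an assumption, not the project's behavior):
// a dead refresh token only flags the channel for re-authorization,
// and any other error is treated as transient and retried later.
type Action = 'flag_for_reauthorization' | 'retry_later'

function decideAction(body: OAuthErrorBody): Action {
  return classifyTokenError(body) === 'refresh_token_invalid'
    ? 'flag_for_reauthorization'
    : 'retry_later'
}
```

Under this sketch, the error from this issue maps to `flag_for_reauthorization`, so a token that expired through inactivity would not by itself change a channel's status.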

Impact

Temporary Fix

For now, the following code has been commented out to prevent further automatic opt-outs: Code Reference

Potential Solutions

  1. Revert the Channel Status:

    • If we continue using the YouTube API for channel info, users will need to re-authorize the Gleev app, which may not be feasible at scale.
    • Alternatively, revert the yppStatus field of all the affected 10,000+ channels using the information stored in HubSpot. This would involve:
      • Writing a script to fetch the previous state of all affected channels from HubSpot.
      • Updating the state in DynamoDB accordingly.
  2. Return to yt-dlp for Data Retrieval:

    • Switch back to yt-dlp for fetching channel info, video lists, etc. This requires solving the IP blocking problem first.
    • Potential solutions to IP blockage include rotating proxies or using a pool of IP addresses.
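
For the HubSpot-based revert in option 1, the planning step could look roughly like the sketch below. It only computes which channels to update and leaves the actual HubSpot fetch and DynamoDB write out. The record shapes, the `previousYppStatus` field, and the `optedOutStatus` parameter are hypothetical, since the real schemas are not shown in this issue; only the `yppStatus` field name comes from the text above.

```typescript
// Hypothetical record shapes; the real HubSpot and DynamoDB schemas may differ.
interface HubSpotChannelRecord {
  channelId: string
  previousYppStatus: string // status recorded in HubSpot before the opt-out (assumed field)
}

interface DynamoChannelRecord {
  channelId: string
  yppStatus: string
}

interface StatusUpdate {
  channelId: string
  from: string
  to: string
}

// Build the list of yppStatus changes without touching any database.
// Only channels that are currently opted out and have a different
// previous status in HubSpot are included.
function planStatusRevert(
  current: DynamoChannelRecord[],
  hubspot: HubSpotChannelRecord[],
  optedOutStatus: string
): StatusUpdate[] {
  const previousById = new Map<string, string>()
  for (const record of hubspot) {
    previousById.set(record.channelId, record.previousYppStatus)
  }

  const updates: StatusUpdate[] = []
  for (const channel of current) {
    if (channel.yppStatus !== optedOutStatus) continue // not affected
    const previous = previousById.get(channel.channelId)
    if (previous === undefined || previous === optedOutStatus) continue // nothing to restore
    updates.push({ channelId: channel.channelId, from: channel.yppStatus, to: previous })
  }
  return updates
}
```

Keeping the plan separate from the write makes a dry run possible: the list of updates can be reviewed before anything is changed in DynamoDB. Note that this sketch cannot tell an automatic opt-out from one a user requested, so the real script would need an extra signal for that.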

Next Steps

References

bedeho commented 1 month ago

Excellent breakdown, thank you @zeeshanakram3