Context
Previously, we used yt-dlp (https://github.com/yt-dlp/yt-dlp) to download channel info, video lists, video metadata, and the videos themselves from all channels for syncing. A few months ago, however, YouTube became more restrictive: it introduced rate limiting and eventually blocked the machine's IP address for excessive usage.
Changes Made
Switched to using the authenticated YouTube Data API (OAuth API) for fetching channel info and video metadata.
Reduced download concurrency so that downloads run nearly sequentially.
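A minimal sketch of how a concurrency cap of this kind can be implemented. It is not the service's actual code: `runWithLimit` and its parameters are hypothetical names, and a limit of 1 gives strictly sequential downloads.

```typescript
// Hypothetical helper: run async tasks with at most `limit` in flight.
// Results are returned in the same order as the input tasks.
async function runWithLimit<T>(
  tasks: Array<() => Promise<T>>,
  limit: number
): Promise<T[]> {
  if (limit < 1) throw new Error("limit must be at least 1");
  const results: T[] = new Array(tasks.length);
  let next = 0;

  // Each worker claims the next unstarted task until none remain.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workers: Promise<void>[] = [];
  for (let i = 0; i < Math.min(limit, tasks.length); i++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results;
}
```

If any task rejects, the returned promise rejects as well, so a real caller would likely catch errors per task.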
Current Issue
After the switch to the authenticated YouTube Data API, the following check in YoutubePollingService.ts automatically opted out more than 10,000 channels:
Code Reference
During the investigation, Google returned the following error when fetching channel info:
Google OAuth documentation indicates that refresh tokens may become invalid if they go unused for more than 6 months (Reference). This could explain the issue: we switched to the YouTube API only recently, after relying on yt-dlp for over a year, so many stored refresh tokens had not been used for longer than that.
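If expired refresh tokens are the cause, one possible guard is to treat a token failure as "needs re-authorization" rather than as grounds for opting a channel out. The sketch below assumes the failure surfaces as the standard OAuth 2.0 `invalid_grant` error code (RFC 6749, section 5.2); the error actually observed is not reproduced here, and every name in the block is hypothetical.

```typescript
// Assumed shape of an OAuth token-endpoint error body (RFC 6749, section 5.2).
interface OAuthErrorBody {
  error?: string;
  error_description?: string;
}

type FetchFailureOutcome = "needs_reauthorization" | "retry_later";

// Classify a failed channel-info fetch. Neither outcome changes yppStatus:
// an invalid refresh token says nothing about whether the channel opted out.
function classifyFetchFailure(body: OAuthErrorBody | undefined): FetchFailureOutcome {
  if (body !== undefined && body.error === "invalid_grant") {
    return "needs_reauthorization";
  }
  // Anything else (network error, quota, server error) is treated as transient.
  return "retry_later";
}
```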
Impact
The above-mentioned check automatically opted out over 10,000 channels.
Many refresh tokens may have become invalid due to prolonged inactivity.
DynamoDB does not keep the previous state (the yppStatus field in the channels table), but the required data can be recovered from HubSpot, where each field is versioned.
Temporary Fix
For now, the following code has been commented out to prevent further automatic opt-outs:
Code Reference
Potential Solutions
Revert the Channel Status:
If we continue using the YouTube API for channel info, users will need to re-authorize the gleev app, which may not be feasible at scale.
Alternatively, revert the yppStatus field of all the affected 10,000+ channels using the information stored in HubSpot. This would involve:
Writing a script to fetch the previous state of all affected channels from HubSpot.
Updating the state in DynamoDB accordingly.
Return to yt-dlp for Data Retrieval:
Switch back to yt-dlp for fetching channel info, video lists, etc., and address the IP blocking issue.
Possible mitigations for IP blocking include rotating proxies or using a pool of IP addresses.
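A rough sketch of the rotating-proxy idea, assuming a list of proxy URLs is available. `ProxyPool` and `ytDlpArgs` are hypothetical names; `--proxy` and `--dump-single-json` are existing yt-dlp options. The block only builds the argument list and does not run yt-dlp.

```typescript
// Hypothetical round-robin pool that skips proxies marked as blocked.
class ProxyPool {
  private index = 0;
  private readonly blocked = new Set<string>();

  constructor(private readonly proxies: string[]) {
    if (proxies.length === 0) throw new Error("at least one proxy is required");
  }

  // Return the next usable proxy, or undefined if every proxy is blocked.
  next(): string | undefined {
    for (let i = 0; i < this.proxies.length; i++) {
      const candidate = this.proxies[this.index];
      this.index = (this.index + 1) % this.proxies.length;
      if (!this.blocked.has(candidate)) return candidate;
    }
    return undefined;
  }

  // Call this when YouTube starts rejecting requests from a proxy.
  markBlocked(proxy: string): void {
    this.blocked.add(proxy);
  }
}

// Build yt-dlp arguments that fetch metadata as JSON through the given proxy.
function ytDlpArgs(url: string, proxy: string): string[] {
  return ["--proxy", proxy, "--dump-single-json", url];
}
```

Round-robin is the simplest policy; weighting proxies by recent failure rate would be a natural refinement if some addresses are blocked more often than others.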
Next Steps
Decide on the approach: continue with the YouTube API and address re-authorization, or switch back to yt-dlp.
Write a script to fetch previous channel states from HubSpot if we opt to revert statuses.
Plan how to mitigate IP blocking if we return to yt-dlp.
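For the revert script, the core step is picking each channel's last yppStatus from before the mass opt-out. The sketch below covers only that selection, assuming the HubSpot history for a field has already been fetched as a list of timestamped values. The `PropertyVersion` shape, the function names, and the cutoff date are assumptions, and fetching from HubSpot and writing to DynamoDB are left out.

```typescript
// Assumed shape of one versioned value of a HubSpot field.
interface PropertyVersion {
  value: string;
  timestamp: string; // ISO 8601
}

interface RevertPlanEntry {
  channelId: string;
  yppStatus: string;
}

// Return the most recent value recorded strictly before `cutoff`.
function statusBefore(history: PropertyVersion[], cutoff: Date): string | undefined {
  let best: PropertyVersion | undefined;
  for (const version of history) {
    const at = Date.parse(version.timestamp);
    if (isNaN(at) || at >= cutoff.getTime()) continue;
    if (best === undefined || at > Date.parse(best.timestamp)) best = version;
  }
  return best === undefined ? undefined : best.value;
}

// Build the list of DynamoDB updates; channels with no earlier value are skipped.
function buildRevertPlan(
  histories: Record<string, PropertyVersion[]>,
  cutoff: Date
): RevertPlanEntry[] {
  const plan: RevertPlanEntry[] = [];
  for (const channelId of Object.keys(histories)) {
    const previous = statusBefore(histories[channelId], cutoff);
    if (previous !== undefined) plan.push({ channelId, yppStatus: previous });
  }
  return plan;
}
```

Producing a plan first, and applying it to DynamoDB in a separate step, lets the list of 10,000+ changes be reviewed before anything is written.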
References
/home/ubuntu/youtube-synch/local/logs/youtube-sync-2024-09-17.log