HOIHOIHOII / EpsilonYouTubeScanner

Captures and stores metadata and timeseries for YouTube videos

Channels with very, very high video upload rates cause hang on attempt to add to database #1

Open HOIHOIHOII opened 6 years ago

HOIHOIHOII commented 6 years ago

Currently, video ids are harvested from a channel by first partitioning that channel's upload history into time intervals, each small enough that a YouTube API query over it returns fewer than 50 videos (i.e. 1 page of results). This criterion was chosen in order to avoid handling pagination of results.

However, these time intervals have a minimum size of 1 second (YouTube API restriction). There is at least one YouTube channel whose upload history includes videos published at a rate greater than 50 per second, or at least, their publishedAt timestamps say so. For such a channel no interval is ever small enough, so the partitioning never finishes.
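
To make the failure mode concrete, here is a minimal sketch of interval bisection with a guard at the 1-second floor. This is not the scanner's actual code: `partition`, `count_videos`, `PAGE_LIMIT` and `MIN_INTERVAL` are hypothetical names, and `count_videos(lo, hi)` stands in for whatever API query reports how many uploads fall in `[lo, hi)`.

```python
from typing import Callable, List, Tuple

PAGE_LIMIT = 50    # results per page, as described above
MIN_INTERVAL = 1   # seconds; smallest interval the API accepts

Interval = Tuple[int, int]  # (start, end) as Unix timestamps, end exclusive


def partition(start: int, end: int,
              count_videos: Callable[[int, int], int]
              ) -> Tuple[List[Interval], List[Interval]]:
    """Split [start, end) until each piece holds fewer than PAGE_LIMIT videos.

    Returns (single_page, needs_pagination). An interval that is already
    MIN_INTERVAL wide but still too full goes in the second list instead
    of being split again; without this check the loop never terminates.
    """
    single_page: List[Interval] = []
    needs_pagination: List[Interval] = []
    stack = [(start, end)]
    while stack:
        lo, hi = stack.pop()
        if count_videos(lo, hi) < PAGE_LIMIT:
            single_page.append((lo, hi))
        elif hi - lo <= MIN_INTERVAL:
            # Cannot shrink further: a burst of 50+ uploads in one second.
            needs_pagination.append((lo, hi))
        else:
            mid = (lo + hi) // 2
            stack.append((mid, hi))
            stack.append((lo, mid))
    return single_page, needs_pagination
```

Intervals returned in the second list are the ones that would need a paginated query rather than further splitting.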

I suggest adding pagination handling when processing API responses, to capture these exceptions.

HOIHOIHOII commented 6 years ago

Here are some YouTube channelIds that warrant further investigation in relation to this. I'm fairly confident ~P9A runs afoul of the bug above.

exceptions = [{"id": "UCQv3dpUXUWvDFQarHrS5P9A", "note": "inf loop"},
              {"id": "UCcmfO29cb4k6oVm5YIe19Rw", "note": "burst error?"},
              {"id": "UCYgL81lc7DOLNhnel1_J6Vg", "note": "burst error?"}]