breezykermo opened 4 years ago
Note: the selector documentation should also state what the quota limit is and how the selector consumes quota. Ash's message from Discord:
"I tracked the number of pages I was pulling from, the number of videos I had fetched, and the quota cost I incurred. I had 5 search queries, range was across the past 3 years, set daily to false, set the limit on the number of pages to 50 (though most of the time, there were max 15 pages worth of data) and ended up with about 500-600 videos for each query. The quota usage (queries per day) this incurred was 7446. This is larger than the number of videos and much larger than the number of pages but it didn't hit the 10k limit so that's good. Also, the other change I made was adding a sleep counter between successive queries so as to not breach the other quota limit: queries per 100 seconds per user. Perhaps a future version of mtriage could also have some sort of throttling functionality so users don't hit this limit."
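Ash's workaround of sleeping between queries could be generalized into a throttle. Below is a minimal sketch, assuming the limit is expressed as a maximum number of calls in a rolling time window (like "queries per 100 seconds per user"). The `Throttle` class is hypothetical and not part of mtriage. The idea is to call `wait()` before each API request. The clock and sleep functions are injectable so the logic can be tested without real waiting.

```python
import time
from collections import deque


class Throttle:
    """Hypothetical helper: allow at most `max_calls` calls in any `period`-second window."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait(self):
        """Block until another call fits in the window, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have fallen out of the rolling window.
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Sleep until the oldest recorded call leaves the window, then re-check.
            self.sleep(self.period - (now - self.calls[0]))
```

`max_calls` should come from the per-100-second limit shown on the project's Google Cloud quota page. That number is not given here, so any value plugged in is a placeholder. A sliding window is used instead of a fixed sleep so that bursts below the limit still run at full speed.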
We should also note that results are always subject to YouTube's search algorithm, which decides what does and does not appear. Exhaustive searches run through mtriage at scale could be flagged, and that could in turn change the results.
It is currently unclear exactly how the YouTube selector exhausts Google Cloud quotas. It was assumed that using the [YouTube Data API v3]() (which the YouTube selector uses under the hood) would incur a quota cost of 1 unit per search.
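Google's published quota costs for the YouTube Data API v3 list `search.list` at 100 units per request, not 1. Each additional page fetched with a page token is also a separate request. Whether this applies to the selector is an assumption worth confirming against the current quota calculator. If it does, the arithmetic looks like the sketch below. `estimate_units` and `max_pages_per_day` are hypothetical helpers, and 10,000 units is the default daily quota.

```python
# Assumed costs: check Google's current quota table for the YouTube Data API v3.
SEARCH_LIST_COST = 100  # units per search.list request, i.e. per page of results
DAILY_QUOTA = 10_000    # default daily quota per Google Cloud project


def estimate_units(pages_per_query, cost_per_page=SEARCH_LIST_COST):
    """Each page of each query is a separate search.list request."""
    return sum(pages_per_query) * cost_per_page


def max_pages_per_day(cost_per_page=SEARCH_LIST_COST, quota=DAILY_QUOTA):
    """How many result pages fit into one day's quota."""
    return quota // cost_per_page


# Ash's run: 5 queries, each with at most about 15 pages of results.
print(estimate_units([15] * 5))                   # 7500
print(estimate_units([15] * 5, cost_per_page=1))  # 75 under the 1-unit assumption
print(max_pages_per_day())                        # 100
```

Under these assumptions, Ash's 5 queries at up to 15 pages each would cost at most 7,500 units. That is in the same range as the 7,446 observed, whereas the 1-unit assumption predicts only 75. It would also mean the daily quota covers only about 100 result pages in total.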
When the `daily` parameter is set to false, this was expected to use only 1 unit of quota per search. However, the selector often hits the limit (10k per day) within just a few minutes, even with a sleep in the code to space out requests. This could be due to a number of things. Certain queries require paging via page tokens, or more metadata may be returned from each search than just the video ID. This needs further investigation and a fix.
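One way to test the paging hypothesis is to count quota per page request instead of per query. In the hypothetical sketch below, `fetch_page` stands in for whatever request the selector makes. It is assumed to return a dict shaped like a YouTube search response, with `items` and an optional `nextPageToken`. The 100-unit default cost per page is an assumption based on Google's published cost for `search.list`. The loop stops before a request would push usage over the given budget.

```python
def paged_search(fetch_page, max_pages, budget, cost_per_page=100):
    """Hypothetical sketch: follow nextPageToken, charging quota per page request."""
    items = []
    used = 0
    token = None  # the first request carries no page token
    for _ in range(max_pages):
        if used + cost_per_page > budget:
            break  # the next request would exceed the remaining budget
        response = fetch_page(token)
        used += cost_per_page
        items.extend(response.get("items", []))
        token = response.get("nextPageToken")
        if not token:
            break  # no further pages
    return items, used
```

Logging `used` next to the number of pages per query would show directly whether quota grows with pages rather than with searches. `max_pages` plays the role of the page limit Ash set to 50.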