how stable do you think the internal API will be?

Karel-Kroeze commented 3 years ago

First off, congratulations on what looks like a promising project. I'm currently working on creating a pipeline for scraping YouTube data for research projects, and since the YouTube API quotas are laughably tiny in that context, this is quickly shaping up to be one of my favourite packages.

I do have some questions...

You describe using an internal API, and I've noticed that you're using a key that presumably was dug up with some clever YouTube archaeology. Considering this is far from an officially endorsed product, how stable do you expect the API to be?

Have you experienced any rate-limiting or IP bans in development?
From your experience with scrape-yt, how aggressive do you expect YouTube will be in trying to block this tool?
How successful do you expect to be in keeping the package working?

And since the success of this project could play a large role in the creation of a data pipeline for our research projects:

Is there anything I can do to help with development, testing or documentation?

SuspiciousLookingOwl commented 3 years ago

Have you experienced any rate-limiting or IP bans in development?

I have never been IP banned. I'm not sure about rate limiting, because the responses are a bit inconsistent: for example, out of 100 requests, about 1 may fail and need to be resent. I haven't dug into what causes this yet; it might be a rate limiter, or just my library being buggy (e.g. not parsing the correct thing). I have never tried sending a massive number of requests, though (in parallel, for example). The most I've ever done is about 150 requests per minute continuously for 15-20 minutes with scrape-yt, and I never received any bans from YouTube. I haven't tried the same with youtubei, but I expect the same result.
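A minimal sketch of how a caller could cope with the occasional failed request and keep to a pace like the one mentioned above. It does not use any youtubei function: `task` is a hypothetical stand-in for a single request, and `withRetry`, `minIntervalMs`, the attempt count and the delays are all assumptions made for illustration.

```typescript
// Sketch only: `task` stands in for any single request; no real library call is made here.

// Delay between consecutive requests needed to stay at or below a given rate.
function minIntervalMs(requestsPerMinute: number): number {
  return Math.ceil(60000 / requestsPerMinute);
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Run `task`, retrying on failure with exponential backoff.
// Rethrows the last error once `maxAttempts` is exhausted.
async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Wait baseDelayMs, then 2x, 4x, ... before the next attempt.
        await sleep(baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  throw lastError;
}
```

With these assumptions, `minIntervalMs(150)` works out to 400 ms between requests, and wrapping each request in `withRetry` resends the roughly 1-in-100 failures instead of aborting the whole run.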

From your experience with scrape-yt, how aggressive do you expect YouTube will be in trying to block this tool?

I honestly think YouTube doesn't care at all lol. ytsr has been around for almost 3 years and is quite popular as an API-key-less library for getting / scraping data from YouTube, yet it's still doing fine.

How successful do you expect to be in keeping the package working?

It depends on YouTube. If they keep making big breaking changes every few weeks or days, I will probably drop support for this library. But since last year (when I first made scrape-yt), I haven't noticed any changes, and I don't think YouTube will make any big changes anytime soon.

Is there anything I can do to help with development, testing or documentation?

The most helpful thing is probably extending the tests or reporting bugs. There are many unexpected / less common cases that I haven't considered while developing this library. For example, some Covid-related videos have an extra banner above the title. Another example shows up on YouTube Premium videos. Things like this can screw up the parser and make it throw an error.
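To illustrate why an extra banner can break a parser, here is a small sketch. The `Item` shape and the sample data are hypothetical and heavily simplified; they are not YouTube's real response schema or youtubei's actual parser. The point is only the difference between looking a field up by position and looking it up by shape.

```typescript
// Hypothetical, simplified response shape: NOT YouTube's real schema.
type Item = { banner?: { text: string }; title?: { text: string } };

// Brittle: assumes the title is always the first item.
function titleByPosition(items: Item[]): string | undefined {
  return items[0]?.title?.text;
}

// More tolerant: take the first item that actually carries a title,
// wherever it appears in the list.
function titleByShape(items: Item[]): string | undefined {
  return items.find((item) => item.title !== undefined)?.title?.text;
}

// Made-up sample data for the two cases.
const plain: Item[] = [{ title: { text: "Some video" } }];
const withBanner: Item[] = [
  { banner: { text: "Info banner" } },
  { title: { text: "Some video" } },
];
```

On `plain` both functions return the title, but on `withBanner` the positional lookup returns `undefined` while the shape-based lookup still finds it. Cases like `withBanner` are the kind of fixture that would be useful to add to the tests.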

Karel-Kroeze commented 3 years ago

Thank you for the information!

I had actually noticed some mixed results surrounding Covid-related topics, which are among the topics we're particularly interested in. Over the next few weeks, I'll try to get a better idea of the internal workings of the project, and keep track of any unexpected results and parsing anomalies I encounter.

SuspiciousLookingOwl commented 3 years ago

Sounds good! I will close this issue for now. If you have any other questions, feel free to open a new issue or re-open this one.

SuspiciousLookingOwl / youtubei
