Benjamin-Loison / YouTube-operational-API

YouTube operational API works when YouTube Data API v3 fails.

Optimize performance #69

Open Benjamin-Loison opened 1 year ago

Benjamin-Loison commented 1 year ago

See YouTube Data API v3 optimizing performance documentation.

Does adding compression make sense? Isn't it already included in Apache and curl by default?

We could think about a compressed parameter to decrease server workload, but I don't think that it is worth it.

Related to #27 and #35.

Benjamin-Loison commented 1 year ago

First, I need to know how to proceed without removing -H 'Accept-Encoding: gzip, deflate, br' from a cURL request, and why gunzip sometimes doesn't work (when?). If only gzip is provided as Accept-Encoding, the API always correctly returns data compatible with gunzip. It's a bit weird (maybe due to their relative load) that the YouTube API does not have a preferred compression method.

Accept-Encoding documentation.
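One plausible explanation for gunzip failing, not confirmed in this thread, is that the server sometimes answers with br or deflate when several encodings are offered, and gunzip only understands gzip. A minimal sketch of choosing the decoder from the Content-Encoding response header follows; decode_body is a hypothetical helper name, and Brotli is left unsupported because the Python standard library has no decoder for it.

```python
import gzip
import zlib


def decode_body(body: bytes, content_encoding: str) -> bytes:
    """Decode an HTTP response body according to its Content-Encoding value."""
    encoding = content_encoding.strip().lower()
    if encoding in ("", "identity"):
        return body
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        # HTTP "deflate" is the zlib format, not a gzip container,
        # which is why gunzip cannot read it.
        return zlib.decompress(body)
    # "br" (Brotli) would need a third-party decoder.
    raise ValueError(f"unsupported Content-Encoding: {content_encoding!r}")


# Illustrative payload compressed locally, no network involved.
payload = b'{"items": []}' * 100
print(len(decode_body(gzip.compress(payload), "gzip")))
print(len(decode_body(zlib.compress(payload), "deflate")))
```

Checking the Content-Encoding header in the curl -v output would tell which of these cases actually occurs.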

curl -v 'https://www.googleapis.com/youtube/v3/playlistItems?part=snippet,contentDetails,status&playlistId=UUAcAnMF0OrCtUep3Y4M-ZPw&maxResults=50&key=AIzaSy...'

The response content is 166,527 bytes according to ls -l.

When appending -H 'Accept-Encoding: gzip, deflate, br' > a && gunzip -c a:

Total packets Length according to Wireshark for the Google API instance IP: 46640

Otherwise, when appending > a && cat a:

Total packets Length according to Wireshark for the Google API instance IP: 187614

Total packets Length according to Wireshark for the Google API instance IP: 187713

Executed twice to verify the order of magnitude.
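To put the Wireshark figures above side by side, here is a small calculation of the compression gain. It only uses the three totals quoted in this comment; averaging the two uncompressed runs is my own choice.

```python
# Total packet lengths reported by Wireshark in this comment.
compressed = 46_640
uncompressed_runs = [187_614, 187_713]

# Average the two uncompressed measurements.
uncompressed = sum(uncompressed_runs) / len(uncompressed_runs)

ratio = compressed / uncompressed
savings = 1 - ratio

print(f"compressed / uncompressed: {ratio:.1%}")
print(f"bandwidth saved: {savings:.1%}")
```

So the compressed transfer is roughly a quarter of the uncompressed one on the wire, packet overhead included.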

The question is: do the API's file_get_contents calls use compression? What is the CPU load of my instances, to verify that compression wouldn't result in an unacceptable CPU overload?

I added to each crontab of official instances:

* * * * * (date && cat /proc/loadavg && cat /proc/meminfo | head -n 3) > health.txt
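A minimal sketch of reading back the health.txt file written by this cron job. It assumes the layout the command produces: one date line, the /proc/loadavg line, then the first three /proc/meminfo lines. The sample values and the parse_health name are made up for illustration.

```python
def parse_health(text: str) -> dict:
    """Parse the output of: date && cat /proc/loadavg && head -n 3 /proc/meminfo."""
    lines = text.strip().splitlines()
    # /proc/loadavg starts with the 1, 5 and 15 minute load averages.
    load1, load5, load15 = (float(value) for value in lines[1].split()[:3])
    memory_kb = {}
    for line in lines[2:5]:
        name, value = line.split(":", 1)
        # Values look like "2030816 kB"; keep the number only.
        memory_kb[name.strip()] = int(value.split()[0])
    return {"date": lines[0], "load": (load1, load5, load15), "memory_kb": memory_kb}


# Made-up sample, not taken from an actual instance.
SAMPLE = """Mon Jan  1 00:00:01 UTC 2024
0.20 0.18 0.12 1/80 11206
MemTotal:        2030816 kB
MemFree:          123456 kB
MemAvailable:    1234567 kB
"""

health = parse_health(SAMPLE)
print(health["load"], health["memory_kb"]["MemAvailable"])
```

Since the cron job overwrites health.txt every minute, this only gives the latest snapshot and not a history.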

Benjamin-Loison commented 1 year ago

Could also investigate the HTTP Range header.
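For reference, a minimal sketch of what a Range request and its Content-Range answer look like. Whether the YouTube endpoints honor Range at all is not established here; the URL is a placeholder and both helper names are hypothetical. Nothing is sent over the network.

```python
import re
import urllib.request


def build_range_request(url: str, first_byte: int, last_byte: int) -> urllib.request.Request:
    """Build (without sending) a request for an inclusive byte range."""
    return urllib.request.Request(url, headers={"Range": f"bytes={first_byte}-{last_byte}"})


def parse_content_range(value: str):
    """Parse 'bytes first-last/total' into (first, last, total); total is None for '*'."""
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value.strip())
    if match is None:
        raise ValueError(f"unexpected Content-Range: {value!r}")
    first, last, total = match.groups()
    return int(first), int(last), None if total == "*" else int(total)


request = build_range_request("https://example.invalid/page", 0, 1023)
print(request.get_header("Range"))
print(parse_content_range("bytes 0-1023/166527"))
```

A server that ignores the header answers 200 with the full body instead of 206 Partial Content, so the status code would be the first thing to check.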

Benjamin-Loison commented 1 year ago

Could also propose maxResults and fields parameters, as requested on Discord (maxResults and fields). Here is another Discord user expecting maxResults to work.

Benjamin-Loison commented 1 year ago

Note that channels?part=community sometimes returns empty pages when using nextPageToken, as the YouTube UI does. However, according to amatis on Matrix, this may happen when there is no more data afterwards, so we could try to find an optimization to avoid making a few empty requests at the end.
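One possible heuristic for the empty pages described above, as a sketch only: keep following nextPageToken, tolerate isolated empty pages, and stop after a fixed number of consecutive empty ones. fetch_page, max_empty_pages and the fake pages are all hypothetical; this is not how the API currently behaves, and a real run of empty pages followed by more data would be cut short.

```python
from typing import Callable, Iterator, Optional, Tuple

Page = Tuple[list, Optional[str]]  # (items, next page token or None)


def iter_items(fetch_page: Callable[[Optional[str]], Page], max_empty_pages: int = 2) -> Iterator:
    """Yield items page by page, giving up after too many consecutive empty pages."""
    token = None
    empty_streak = 0
    while True:
        items, token = fetch_page(token)
        if items:
            empty_streak = 0
            yield from items
        else:
            empty_streak += 1
            if empty_streak >= max_empty_pages:
                return
        if token is None:
            return


# Fake paginated source: one empty page in the middle, three at the end.
FAKE_PAGES = {
    None: (["a"], "t1"),
    "t1": ([], "t2"),
    "t2": (["b"], "t3"),
    "t3": ([], "t4"),
    "t4": ([], "t5"),
    "t5": ([], None),
}
requested = []


def fake_fetch(token):
    requested.append(token)
    return FAKE_PAGES[token]


collected = list(iter_items(fake_fetch))
print(collected, len(requested))
```

With the fake data the last empty page is never requested, at the cost of trusting that two empty pages in a row mean the end.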

Benjamin-Loison commented 10 months ago

Increasing priority following this Discord message. Depending on the endpoint you are using, there may be alternative webpages that consume less bandwidth to retrieve.

By the way, regarding membership: true, I adapted it to my site and it was much faster than yours. If you want it to be faster, instead of this URL:

if ($options['membership']) {
    $result = getJSONFromHTML("https://www.youtube.com/channel/$id");

use this URL:

if ($options['membership']) {
    $result = getJSONFromHTML("https://www.youtube.com/channel/$id/search");

Because your URL: 880kb. My URL: 400kb. It can pull and query faster. 40% speed difference.

Source: private Discord message from (788496476187263026)

Benjamin-Loison commented 10 months ago

Being able to reverse-engineer the continuation token would avoid the first request that is only made to obtain it, and so improve performance, cf. #190. This would possibly leave a single case instead of a first HTML web-scraping step followed by JSON continuation requests.

Benjamin-Loison commented 10 months ago

Currently tests are parsed during production delivery...

Benjamin-Loison commented 6 months ago

As in #258, we can use the YouTube UI browse endpoint to retrieve only JSON and not HTML containing JSON.
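To illustrate what "HTML containing JSON" costs, here is a sketch of pulling an embedded JSON object out of a page: the whole HTML has to be downloaded and scanned before any JSON is parsed. The marker string and the sample page are assumptions for illustration, and the project's getJSONFromHTML may work differently.

```python
import json

# Assumed marker preceding the embedded JSON; not taken from this thread.
MARKER = "var ytInitialData = "


def extract_embedded_json(html: str, marker: str = MARKER) -> dict:
    """Return the JSON object that starts right after the marker in the HTML."""
    start = html.index(marker) + len(marker)
    # raw_decode parses one JSON value and ignores whatever follows it.
    data, _end = json.JSONDecoder().raw_decode(html, start)
    return data


# Made-up page, far smaller than a real one.
SAMPLE_HTML = (
    "<html><head><script>"
    'var ytInitialData = {"contents": {"title": "demo"}};'
    "</script></head><body>...</body></html>"
)

embedded = extract_embedded_json(SAMPLE_HTML)
print(embedded["contents"]["title"])
```

An endpoint that returns JSON directly removes both the surrounding markup from the transfer and this extraction step.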

Benjamin-Loison commented 5 months ago

Someone on Discord told me that it is possible to receive only JSON thanks to the YouTube UI browse endpoint; this seems related to #252.