Benjamin-Loison / YouTube-operational-API

YouTube operational API works when YouTube Data API v3 fails.

YouTube UI or YouTube Data API v3 sometimes seems to return empty responses #261

Open: Benjamin-Loison opened this issue 8 months ago

Benjamin-Loison commented 8 months ago

Does the home webpage of YouTube operational API instances sometimes return an empty response too?

```diff
diff --git a/videos.php b/videos.php
index d0ce18f..5123ebb 100644
--- a/videos.php
+++ b/videos.php
@@ -423,7 +423,15 @@
         }

         if ($options['snippet']) {
-            $json = getJSONFromHTMLForcingLanguage("https://www.youtube.com/watch?v=$id");
+            $opts = [
+                'http' => [
+                'header' => ['Accept-Language: en']
+                ]
+            ];
+            $html = getRemote("https://www.youtube.com/watch?v=$id", $opts);
+            $jsonStr = getJSONStringFromHTML($html);
+            $json = json_decode($jsonStr, true);
+            //$json = getJSONFromHTMLForcingLanguage("https://www.youtube.com/watch?v=$id");
             $contents = $json['contents']['twoColumnWatchNextResults']['results']['results']['contents'];
             // Note that `publishedAt` has a day only precision.
             $publishedAt = strtotime($contents[0]['videoPrimaryInfoRenderer']['dateText']['simpleText']);
@@ -432,6 +440,10 @@
                 'publishedAt' => $publishedAt,
                 'description' => $description
             ];
+            if (isset($_GET['monitoring'])) {
+                $item['monitoring'] = $html;
+            }
+
             $item['snippet'] = $snippet;
         }
```
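For reproducing the monitoring idea outside the PHP code base, here is a minimal Python sketch of what the patch does: request the watch page with `Accept-Language: en`, pull out the embedded JSON, read the snippet fields defensively, and attach the raw HTML when monitoring is requested. It assumes the page embeds `var ytInitialData = {...};` and that the description lives at the `videoSecondaryInfoRenderer/attributedDescription/content` path reported by `getJSONPathFromKey` later in this issue. `fetch_watch_page`, `extract_json_var`, `dig` and `build_snippet` are hypothetical names, not functions of YouTube operational API.

```python
import json
import urllib.request


def fetch_watch_page(video_id):
    """Download a watch page, asking for English like the patch does."""
    request = urllib.request.Request(
        f"https://www.youtube.com/watch?v={video_id}",
        headers={"Accept-Language": "en"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def extract_json_var(html, name):
    """Return the JSON value assigned by `var <name> = ...;`, or None if absent.

    Assumption: YouTube embeds e.g. `var ytInitialData = {...};` in the page.
    """
    marker = f"var {name} = "
    start = html.find(marker)
    if start == -1:
        return None
    # raw_decode stops at the end of the JSON value, ignoring the trailing `;`.
    value, _ = json.JSONDecoder().raw_decode(html[start + len(marker):])
    return value


def dig(obj, *path):
    """Follow keys/indices through nested dicts and lists; None if a step is missing."""
    for step in path:
        try:
            obj = obj[step]
        except (KeyError, IndexError, TypeError):
            return None
    return obj


def build_snippet(html, monitoring=False):
    data = extract_json_var(html, "ytInitialData")
    contents = dig(data, "contents", "twoColumnWatchNextResults",
                   "results", "results", "contents")
    snippet = {
        # Day-precision date string, left unparsed in this sketch.
        "publishedAt": dig(contents, 0, "videoPrimaryInfoRenderer",
                           "dateText", "simpleText"),
        # Path assumed from the getJSONPathFromKey output in this issue.
        "description": dig(contents, 1, "videoSecondaryInfoRenderer",
                           "attributedDescription", "content"),
    }
    item = {"snippet": snippet}
    if monitoring:
        # Keep the raw page so that an empty response can be inspected later.
        item["monitoring"] = html
    return item
```

Usage would be along the lines of `build_snippet(fetch_watch_page("mWdFMNQBcjs"), monitoring=True)`. Returning `None` for a missing field, instead of failing on an undefined index, makes the empty-response case visible next to the HTML that caused it.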

Related to #11.

Benjamin-Loison commented 8 months ago

log.json

load_log_string.py:

```python
import json

with open('log.json') as f:
    print(json.load(f))
```

```shell
python3 load_log_string.py > log_string.txt
getJSONPathFromKey log_string.txt '' ytInitialPlayerResponse | grep -- '--string'
```

```
 30 /videoDetails/shortDescription --string
 61 /microformat/playerMicroformatRenderer/description/simpleText --string
```
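`getJSONPathFromKey` is a separate tool by the author and its exact behaviour is not described here. As a rough, hypothetical stand-in for the `getJSONPathFromKey ... | grep -- '--string'` pipeline, the Python sketch below walks an already decoded JSON value and yields a slash-separated path for every string leaf that contains a needle. It does not reproduce the tool's command-line interface or the leading numbers in its output.

```python
def string_paths(value, needle, path=""):
    """Yield (path, text) for every string in `value` that contains `needle`.

    Paths use the `/key/index/key` form seen in the getJSONPathFromKey output.
    """
    if isinstance(value, dict):
        for key, child in value.items():
            yield from string_paths(child, needle, f"{path}/{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from string_paths(child, needle, f"{path}/{index}")
    elif isinstance(value, str) and needle in value:
        yield path, value
```

For example, applied to an extracted `ytInitialPlayerResponse` object, `[p for p, _ in string_paths(player_response, "--string")]` should list paths such as `/videoDetails/shortDescription`.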

videos.html.zip

```shell
curl 'https://www.youtube.com/watch?v=mWdFMNQBcjs' > video.html
getJSONPathFromKey video.html | grep -- 'string'
```

```
119 /contents/twoColumnWatchNextResults/results/results/contents/1/videoSecondaryInfoRenderer/attributedDescription/content --string
184 /engagementPanels/2/engagementPanelSectionListRenderer/content/structuredDescriptionContentRenderer/items/1/expandableVideoDescriptionBodyRenderer/attributedDescriptionBodyText/content --string
```

There are also other occurrences, including some in HTML `<meta>` tags:

```shell
grep -o -- '.\{0,50\}--string.\{0,50\}' log.json
```

```
blic video\"><meta name=\"description\" content=\"--string0:00 Beginning0:06 Middle0:12 End\"><meta name=\"k
720\"><meta property=\"og:description\" content=\"--string0:00 Beginning0:06 Middle0:12 End\"><meta property
eo\"><meta name=\"twitter:description\" content=\"--string0:00 Beginning0:06 Middle0:12 End\"><meta name=\"t
 video\"><meta itemprop=\"description\" content=\"--string0:00 Beginning0:06 Middle0:12 End\"><meta itemprop
",\"isOwnerViewing\":false,\"shortDescription\":\"--string\\n\\n0:00 Beginning\\n0:06 Middle\\n0:12 End\",\"
public video\"},\"description\":{\"simpleText\":\"--string\\n\\n0:00 Beginning\\n0:06 Middle\\n0:12 End\"},\
```
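The escaped quotes (`\"`) in this output come from `log.json` storing the page as JSON, which `load_log_string.py` undoes by decoding first. The Python sketch below is a hypothetical equivalent of `grep -o -- '.\{0,50\}--string.\{0,50\}'` that decodes the log before searching, so its hits should look like the unescaped `videos.html` output. That `log.json` holds a single JSON string is an assumption inferred from the script's name; `load_log_text` falls back to the raw text otherwise.

```python
import json
import re


def grep_context(text, needle, width=50):
    """Rough equivalent of grep -o: each needle occurrence plus up to `width` chars on each side."""
    pattern = re.compile(r".{0,%d}%s.{0,%d}" % (width, re.escape(needle), width))
    return pattern.findall(text)


def load_log_text(raw):
    """Decode `raw` if it is a JSON string literal (assumed log.json format), else return it unchanged."""
    try:
        decoded = json.loads(raw)
    except json.JSONDecodeError:
        return raw
    return decoded if isinstance(decoded, str) else raw
```

With the log, this would be used as `grep_context(load_log_text(open("log.json", encoding="utf-8").read()), "--string")`.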
Benjamin-Loison commented 8 months ago

We should check whether all the other fields retrievable with YouTube operational API also appear in HTML tags. Note that:

```shell
grep -o -- '.\{0,50\}--string.\{0,50\}' videos.html
```

```
"720"><meta property="og:description" content="--string0:00 Beginning0:06 Middle0:12 End"><meta property=
 video"><meta name="twitter:description" content="--string0:00 Beginning0:06 Middle0:12 End"><meta name="twi
blic video"><meta itemprop="description" content="--string0:00 Beginning0:06 Middle0:12 End"><meta itemprop=
eB3zQ","isOwnerViewing":false,"shortDescription":"--string\n\n0:00 Beginning\n0:06 Middle\n0:12 End","isCraw
t":"A public video"},"description":{"simpleText":"--string\n\n0:00 Beginning\n0:06 Middle\n0:12 End"},"lengt
ITY_HIDDEN"}},"attributedDescription":{"content":"--string\n\n0:00 Beginning\n0:06 Middle\n0:12 End","comman
ins"},"attributedDescriptionBodyText":{"content":"--string\n\n0:00 Beginning\n0:06 Middle\n0:12 End","comman
```
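To check which fields are also exposed through HTML tags, one option is to read the `<meta>` tags directly. The Python sketch below uses the standard library's `html.parser` to collect `content` values keyed by their `name`, `property` or `itemprop` attribute; `MetaCollector` and `meta_tags` are hypothetical helpers, not part of YouTube operational API. As the grep output shows, the meta-tag copies lose the line breaks (`--string0:00 Beginning0:06 Middle...`), so at best they would be a degraded fallback for the description.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta> content values keyed by their name, property or itemprop attribute."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = (attributes.get("name")
               or attributes.get("property")
               or attributes.get("itemprop"))
        if key is not None and "content" in attributes:
            # Keep the first occurrence when several tags share a key
            # (e.g. name="description" and itemprop="description").
            self.meta.setdefault(key, attributes["content"])


def meta_tags(html):
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    return parser.meta
```

Comparing `meta_tags(html)` for a failing response with the JSON fields that went missing would show which fields could still be recovered from the page.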