Benjamin-Loison / YouTube-operational-API

YouTube operational API works when YouTube Data API v3 fails.
394 stars 48 forks

Remove `keys.php` to use a tooltip? #18

Open Benjamin-Loison opened 2 years ago

Benjamin-Loison commented 2 years ago

As I may add a form in the future to enable people to share their YouTube Data API v3 developer keys, this webpage could be used for that purpose, even if a short advertisement for it could be added to index.php. Should proceed with #17 before proceeding with this issue, as adding keys may not be necessary with current quota usage.

Benjamin-Loison commented 2 years ago

As for the last two days the no-key service has been using more than the quota of all keys, this issue is prioritized.

Giving tools and tips for searching keys on the web may be useful. All YouTube Data API v3 keys start with AIzaSyA, AIzaSyB, AIzaSyC or AIzaSyD, more details here. For instance one can search "AIzaSyA" YouTube on Google (I did "AIzaSy" YouTube, "AIzaSyA" YouTube, "AIzaSyB" YouTube, "AIzaSyC" YouTube and "AIzaSyD" YouTube). On GitHub I searched AIzaSy and AIzaSyB (haven't done Code for both) and still have to do Issues for AIzaSyD (searching AIzaSyA gives unrelated results). Could also use other search engines, Stack Overflow (the keys I encountered on it were also treated; I also treated AIzaSyA, AIzaSyB, AIzaSyC and AIzaSyD, and I used an algorithm, but I don't know whether it handles edits on posts, as most keys are removed through edits), GitLab (doesn't seem to find anything with AIzaSy*)... https://archive.ph/www.googleapis.com was also exploited; specifying /youtube/v3/ seems not to work. What about the Web Archive? https://web.archive.org/web/*/https://www.googleapis.com/youtube/v3/* https://archive.org/developers/
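The key pattern above can be checked mechanically; a minimal sketch using the same regular expression as the searchKeysInLogs.py script further down in this thread (AIzaSy, then A-D, then 32 URL-safe characters):

```python
import re

# Pattern from searchKeysInLogs.py below: AIzaSy, one of A-D, then 32
# URL-safe characters, for a total length of 39 characters.
KEY_PATTERN = re.compile(r'AIzaSy[A-D][a-zA-Z0-9_-]{32}')

def findCandidateKeys(text):
    # Return the unique YouTube Data API v3 key candidates found in `text`.
    return set(KEY_PATTERN.findall(text))

fakeKey = 'AIzaSyA' + 'a' * 32
print(findCandidateKeys(f'key={fakeKey}&part=snippet'))
```

Candidates found this way still have to be tested against the API, as the pattern says nothing about whether a key is valid or has quota left.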

Could show how to contribute with a short video with a Google account.

YouTube_Data_API_v3_key_web_scraper

Benjamin-Loison commented 2 years ago

Such a tool would also be useful for the instance host, as it would allow him to cleanly add a YouTube Data API v3 key; currently I have to do it by hand and pay attention not to screw up keys.txt.

Benjamin-Loison commented 2 years ago

Once done, change this Stack Overflow answer to propose my no-key service; but as it is currently running out of quota, I am not advertising it. Done.

Benjamin-Loison commented 2 years ago

Note that a fresh instance will display for the no-key service: Currently this service is powered by 1 keys. and potentially a PHP warning #23 (screenshot). Could display a custom error message if one tries to use the no-key service while it isn't powered by any YouTube Data API v3 key.

Related to #19.

Benjamin-Loison commented 2 years ago

Could make metrics, such as checkQuotaLogs.txt and checkUnusualLogs.txt, public. Adding metrics for how much quota we consume per day would be interesting too.

Have added https://yt.lemnoslife.com/metrics/ for the moment. Note that as the no-key service requires collecting multiple YouTube Data API v3 keys, I assume that sharing some WIP details on it isn't a high priority.

#19 is somewhat blocking this.

Benjamin-Loison commented 2 years ago

Can test many keys with this Python script:

`test_youtube_data_api_v3_keys.py`:

```python
import requests
import json
from tqdm import tqdm

# Assume keys to be unique.
with open('keys.txt') as f:
    keys = f.read().splitlines()

URL = 'https://www.googleapis.com/youtube/v3/channels'
params = {
    'forHandle': '@MrBeast',
    'fields': 'items/id',
}

workingKeys = []
for key in tqdm(keys):
    params['key'] = key
    data = requests.get(URL, params).json()
    try:
        if data['items'][0]['id'] == 'UCX6OQ3DkcsbYNE6H8uQQuVA':
            workingKeys += [key]
    except KeyError:
        pass

'''
quotaExceeded = 'quota' in content
if not 'error' in data or quotaExceeded:
    print(key, quotaExceeded)
'''

print(workingKeys)
```
The equivalent parallel algorithm, to be faster:

```python
import requests
import json
from tqdm import tqdm
from multiprocessing import Pool

# Assume keys to be unique.
with open('keys.txt') as f:
    keys = f.read().splitlines()

URL = 'https://www.googleapis.com/youtube/v3/channels'

def testYouTubeDataApiV3Key(youTubeDataApiV3Key):
    params = {
        'forHandle': '@MrBeast',
        'fields': 'items/id',
        'key': youTubeDataApiV3Key,
    }
    data = requests.get(URL, params).json()
    try:
        if data['items'][0]['id'] == 'UCX6OQ3DkcsbYNE6H8uQQuVA':
            return youTubeDataApiV3Key
    except KeyError:
        return

if __name__ == '__main__':
    with Pool(10) as p:
        workingKeys = set(tqdm(p.imap(testYouTubeDataApiV3Key, keys), total = len(keys)))
    # discard, not remove: None is absent from the set if every key works.
    workingKeys.discard(None)
    print(workingKeys)
```
Benjamin-Loison commented 2 years ago

I set up a test at 9:01 AM UTC+2 (as at 9:00 AM we aren't running out of quota anymore) to test all YouTube Data API v3 keys that have currently exceeded their quota. If all keys pass this test, then maybe keys having exceeded their quota could be allowed to be added. However, someone could submit keys with a manually set quota limit of 0...

The tests at 9:01 AM UTC+2 only returned exceeded quota. Will give it a try at 10:01 AM UTC+2; otherwise should try every minute, and if a key never passes the test once, then it is definitely useless. Started the every-minute test for all keys at Sat Oct 22 17:34:23 CEST 2022. None of the keys were useful for a single request during 24 hours.

Benjamin-Loison commented 2 years ago

Could advertise the possibility to share a YouTube Data API v3 key when the no-key service is running out of quota. This should be done at this line of code.

Benjamin-Loison commented 2 years ago

Setting up a notification system for myself in case one or multiple check failures happen may make sense. Could add to the metrics the delta of logs since the last retrieval (requiring authentication). Could make the error more precise on False in order not to be notified every time it happens. Or couldn't I just download the last part of the file?

I added a notification system for each failure for the moment. However, if for some reason, such as not enough disk space, the system becomes unable to write any more logs, my check doesn't take such an absence of additional logs into account.

Benjamin-Loison commented 1 year ago

Check Apache 2 logs to see if some people shared their API keys by mistake. Note that gunzip doesn't output anything to stdout and instead decompresses and deletes the .gz compressed file; if you want the output on stdout without decompressing, use -c.

```sh
find -name 'yt.lemnoslife.com-ssl--access.log*'
(gunzip -c yt.lemnoslife.com-ssl--access.log.*.gz && cat yt.lemnoslife.com-ssl--access.log{,.1}) | grep AIzaSy | grep -v addKey
```

It is safe to add non-existing files to the command above, as there is only a warning on stderr which isn't grepped, so we just get cat: FILE: No such file or directory. As I execute the above command every time I archive the logs, at least filtering out the keys already used for the no-key service would make this process faster. This is the aim of the following algorithm:

searchKeysInLogs.py:

```python
#!/usr/bin/python3

import os
import subprocess
import re

def execute(cmd):
    return subprocess.check_output(cmd, shell=True).decode('utf-8')

with open('/var/www/ytPrivate/keys.txt') as f:
    keys = set(f.read().splitlines())

# Just for making Python interpreter happy.
path = '/var/log/apache2/'
os.chdir(path)
PREFIX = 'yt.lemnoslife.com-ssl--access.log'
cmd = f'(cat {PREFIX}.*.gz | gunzip && cat {PREFIX}.1 && cat {PREFIX}) | grep AIzaSy | grep -v addKey'
result = execute(cmd)
matches = re.findall(r'AIzaSy[A-D][a-zA-Z0-9-_]{32}', result)
uniqueMatches = set(matches)
keysToAdd = uniqueMatches - keys
print(keysToAdd)
```

Found this way 22 keys with quota (no others) by checking the latest website logs, and checked the same way my old VAIO laptop, my ASUS, my computer (including my 2, 3 and 6 TB hard disks), OC3K and the VPS itself. Maybe I haven't checked yt.lemnoslife.com-ssl--access.log everywhere, but hey, I searched enough.

Benjamin-Loison commented 1 year ago

When adding a new key, make sure to make a backup, as if there isn't any space left on the device, we lose them all. It just happened... Adding a tool to monitor disk space usage would make sense.
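Such a disk space monitor can be sketched with the standard library; the 90 % threshold is an arbitrary assumption:

```python
import shutil

def diskUsageRatio(path='/'):
    # Fraction of the filesystem containing `path` that is already used.
    usage = shutil.disk_usage(path)
    return usage.used / usage.total

# Hypothetical threshold: warn when the disk is more than 90 % full, leaving
# time to back up keys.txt before writes start failing.
if diskUsageRatio() > 0.9:
    print('Warning: low disk space!')
```

Hooking this into the existing check-failure notification system would cover the scenario above.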

https://yt.lemnoslife.com/noKey/videos?part=snippet&id=B-gHb2gPGIs returns for instance:

The request is missing a valid API key.:

```json
{
  "error": {
    "code": 403,
    "message": "The request is missing a valid API key.",
    "errors": [
      {
        "message": "The request is missing a valid API key.",
        "domain": "global",
        "reason": "forbidden"
      }
    ],
    "status": "PERMISSION_DENIED"
  }
}
```
Benjamin-Loison commented 1 year ago

Incident temporarily resolved, as I brought back a set of keys, but I haven't restored all keys yet. As I found on my 6 TB hard disk my IP making 60 calls to addKey.php between 20/Oct/2022:23:52:51 +0200 and 21/Oct/2022:00:09:34 +0200, I guess I found the set of keys that was deleted, as I claimed on Discord to have added 29 keys on 21 Oct at 00:50 AM. Note that the last time I modified this post to add information about progress was on Oct 21, 2022, 12:49 AM GMT+2. In addition, after running the following algorithm for these calls to addKey.php, I added 21 keys (+ 3 manually added due to quota consumption).

```python
import requests

def getURLContent(url):
    return requests.get(url).text

# `keys` holds the keys extracted from the logged addKey.php calls.
for key in keys:
    print(key)
    url = f'https://yt.lemnoslife.com/addKey.php?key={key}'
    result = getURLContent(url)
    print(result)
```
Benjamin-Loison commented 1 year ago

Isn't there a way in PHP to keep a variable around across user HTTPS requests? That way we wouldn't read and write a file every time we switch from one key to another, and so we wouldn't have faced this problem.

Benjamin-Loison commented 1 year ago

Note that the disk space seems mostly used by errors in yt.lemnoslife.com-ssl--error.log, which weighs more than 8 times as much as yt.lemnoslife.com-ssl--access.log, related to #23.

Example of filled logs (file size decreasing order):

| File | Size (MB) | Lines |
|------|-----------|-------|
| yt.lemnoslife.com-ssl--error.log.1 | 1,500 | 8,319,186 |
| yt.lemnoslife.com-ssl--access.log.1 | 131.8 | 509,956 |
| yt.lemnoslife.com-ssl--error.log.2.gz | 86.9 | 6,513,427 |
| yt.lemnoslife.com-ssl--access.log.2.gz | 12.1 | 398,512 |

yt.lemnoslife.com-ssl--*.1 were filled from 09/Nov/2022:00:01:05 +0100 to 10/Nov/2022:00:44:31 +0100 (~24 hours). yt.lemnoslife.com-ssl--*.log.2.gz were filled from 08/Nov/2022:00:38:57 +0100 to 09/Nov/2022:00:01:02 +0100 (~24 hours).

Moved from `LogLevel debug` to `LogLevel info ssl:warn` in /etc/apache2/sites-available/ssl.yt.lemnoslife.com.conf. See the LogLevel documentation. After a `service apache2 restart`, it seems that nothing is written to yt.lemnoslife.com-ssl--error.log anymore. I guess that means there isn't any error with the many requests that I still see in yt.lemnoslife.com-ssl--access.log.

Have to wait for the logs to be rotated, then download and use fresh empty files, to see if my modification was a good change.

Benjamin-Loison commented 1 year ago

From Google account credentials, can one generate a YouTube Data API v3 key from a random project just by using curl? I think that due to 2FA (enabled by default with Google) etc., it isn't worth it.

Benjamin-Loison commented 1 year ago

May think about recoding some YouTube Data API v3 features by reverse-engineering the YouTube UI, if we aren't able to cope with the many requests using quota on the no-key service.

Benjamin-Loison commented 1 year ago

Could add an email linked to the added key, in case we need to contact the key holder about a future modification of the policy.

Benjamin-Loison commented 1 year ago

Could use a supervariable kept from one HTTPS request to the other, or something like that, to avoid reading a file on each request (for counting no-key service keys or the git commit version used, for instance), or could at least simplify the file content down to what we really need, like:

```php
$keysCountFile = '/var/www/ytPrivate/keysCount.txt';
$keysCount = file_get_contents($keysCountFile);
```
Benjamin-Loison commented 1 year ago

As described in #48, proceeded at 11:40 PM UTC+1 to `logrotate --force /etc/logrotate.d/apache2`.

Benjamin-Loison commented 1 year ago

Next time we are really running out of quota, advertise with an @everyone on both Matrix and Discord to empower the no-key service.

Benjamin-Loison commented 1 year ago

Should add a mechanism to addKey.php to add the keys on all controlled instances. Maybe just calling addKey.php on the other controlled instances from the one that the end-user is interacting with would do the job.
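A minimal sketch of such a propagation, assuming a hypothetical list of controlled instance base URLs (only yt.lemnoslife.com is named in this thread, the rest are placeholders):

```python
import requests

# Hypothetical list of controlled instances to forward keys to.
INSTANCES = [
    'https://yt.lemnoslife.com',
    # 'https://other-controlled-instance.example',
]

def buildAddKeyUrl(instance, key):
    # addKey.php is the endpoint mentioned in this issue.
    return f'{instance}/addKey.php?key={key}'

def propagateKey(key):
    # Forward the provided key to addKey.php on every controlled instance.
    for instance in INSTANCES:
        response = requests.get(buildAddKeyUrl(instance, key))
        print(instance, response.status_code)
```

The instance receiving the key would call `propagateKey` after its own validation, so that end-users only interact with one addKey.php.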

Benjamin-Loison commented 1 year ago

At 20:43 I got:

The YouTube operational API no-key service is detected as not working!

Just following this event, I tested the no-key endpoint on the three instances and everything was working fine. Logging what is wrong could be interesting in case it happens again.

Benjamin-Loison commented 1 year ago

Once I have the moderator tools privilege on Stack Overflow, I could run the above algorithms again to search for additional leaked YouTube Data API v3 keys.

Benjamin-Loison commented 1 year ago

Could also make the web server log search for YouTube Data API v3 keys be executed on private instances, as not all of their users seem to be comfortable with this subject.

Benjamin-Loison commented 8 months ago

Should clean up the inter-instance key synchronization and other instance synchronization; otherwise, disabling the ability for anyone to provide a key seems to make sense.

Benjamin-Loison commented 2 months ago

> Projects that enable the YouTube Data API have a default quota allocation of 1 million units per day

> Note that projects that had enabled the YouTube Data API before April 20, 2016, have a different default quota for that API.

https://web.archive.org/web/20160828004328/https://developers.google.com/youtube/v3/getting-started

https://web.archive.org/web/20160404033352/https://developers.google.com/youtube/v3/getting-started is the most recent snapshot prior to April 20, 2016, but does not mention how much quota is provided by default.

Benjamin-Loison commented 4 days ago

Does the API Explorer provide unlimited quota?

```sh
curl -s "https://content-youtube.googleapis.com/youtube/v3/search?part=snippet&q=test&key=AIzaSyBXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
```
Output:

```json
{
  "error": {
    "code": 403,
    "message": "Requests from referer \u003cempty\u003e are blocked.",
    "errors": [
      {
        "message": "Requests from referer \u003cempty\u003e are blocked.",
        "domain": "global",
        "reason": "forbidden"
      }
    ],
    "status": "PERMISSION_DENIED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.ErrorInfo",
        "reason": "API_KEY_HTTP_REFERRER_BLOCKED",
        "domain": "googleapis.com",
        "metadata": {
          "consumer": "projects/292824132082",
          "service": "youtube.googleapis.com"
        }
      }
    ]
  }
}
```
```sh
minimizeCURL curl.sh 'youtube#searchResult'
```

Output:

```
Initial command length: 1,158.
Removing headers
Command with length 1,069 is still fine.
Command with length 1,052 is still fine.
Command with length 1,015 is still fine.
Command with length 969 is still fine.
Command with length 795 is still fine.
Command with length 757 is still fine.
Command with length 681 is still fine.
Command with length 632 is still fine.
Command with length 582 is still fine.
Command with length 570 is still fine.
Command with length 554 is still fine.
Command with length 526 is still fine.
Command with length 292 is still fine.
Command with length 265 is still fine.
Command with length 239 is still fine.
Command with length 206 is still fine.
Command with length 188 is still fine.
Removing URL parameters
Command with length 175 is still fine.
Command with length 168 is still fine.
Removing cookies
Removing raw data
curl 'https://content-youtube.googleapis.com/youtube/v3/search?key=AIzaSyBXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX' -H 'X-Origin: https://explorer.apis.google.com'
```

https://console.cloud.google.com/apis/api/youtube.googleapis.com/quotas?project=my-project-XXXXXXXXXXXXX is not up to date in real time, so let us make as many requests as possible and count them.

Benjamin-Loison commented 4 days ago

Maybe it expires quickly, but thanks to web scraping one can easily recreate one.

Benjamin-Loison commented 4 days ago
```sh
counter=0
while [ 1 ]
do
    echo "counter: $counter"
    curl -s "https://content-youtube.googleapis.com/youtube/v3/search?part=snippet&q=$counter&key=AIzaSyBXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" -H 'X-Origin: https://explorer.apis.google.com' | jq '.items | length'
    ((counter++))
    #break
done
```

leads to `counter` reaching many hundreds while the returned length is still the default 5.

Same with https://www.googleapis.com/youtube/v3/search.

If necessary, could also investigate OAuth, and maybe use an account for each of these 4 cases (OAuth/key and both URLs) because of the quota display delay.