Closed Guillaume-oso closed 3 years ago
I have a really disgusting fix you can try. I don't want to make a PR for it because it's just a dumb fix. I personally use the sync.py version of the tool, and after debugging for about 1.5 hours I figured out what was causing the error, but not why.
It has to do with parsing the scripts: the tool tries to load a newly added cookielaw script on a different domain and errors out (even though the resource is available when you open the URL manually). Since it's the first script found in the DOM, I simply pop it from the script_urls array.
This is what my get_credentials() in sync.py looks like:
def get_credentials(self):
    url = random.choice(util.SCRAPE_URLS)
    page_text = get_page(url)
    script_urls = util.find_script_urls(page_text)
    script_urls.pop(0)  # to remove cookielaw.org .js from the list (first .js in DOM)
    for script in script_urls:
        if not self.client_id:
            # skip non-string or empty entries (the original `not ""` was always True)
            if isinstance(script, str) and script:
                js_text = f'{get_page(script)}'
                self.client_id = util.find_client_id(js_text)
I'm going to assume this also works with the async.py, but I haven't tested it.
@DJoepie Thank you so much! It worked for me.
The first example (from soundcloud-lib) fails:
from sclib import SoundcloudAPI, Track, Playlist

api = SoundcloudAPI()  # never pass a Soundcloud client ID that did not come from this library
track = api.resolve('https://soundcloud.com/itsmeneedle/sunday-morning')
assert type(track) is Track
filename = f'./{track.artist} - {track.title}.mp3'
with open(filename, 'wb+') as fp:
    track.write_mp3_to(fp)
Even after applying the patch suggested by @DJoepie, I get the same error as the OP:
  File "soundcloud_dl.py", line 7, in <module>
    track = api.resolve(target_url)
  File "/Users/dth/.local/share/virtualenvs/python_experiments-VCQIX0Is/lib/python3.6/site-packages/sclib/sync.py", line 59, in resolve
    self.get_credentials()
  File "/Users/dth/.local/share/virtualenvs/python_experiments-VCQIX0Is/lib/python3.6/site-packages/sclib/sync.py", line 54, in get_credentials
    js_text = f'{get_page(script)}'
  File "/Users/dth/.local/share/virtualenvs/python_experiments-VCQIX0Is/lib/python3.6/site-packages/sclib/sync.py", line 14, in get_page
    return get_url(url).decode('utf-8')
  File "/Users/dth/.local/share/virtualenvs/python_experiments-VCQIX0Is/lib/python3.6/site-packages/sclib/sync.py", line 11, in get_url
    return urlopen(url).read()
  File "/Users/dth/.pyenv/versions/3.6.9/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/Users/dth/.pyenv/versions/3.6.9/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/Users/dth/.pyenv/versions/3.6.9/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Users/dth/.pyenv/versions/3.6.9/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/Users/dth/.pyenv/versions/3.6.9/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/Users/dth/.pyenv/versions/3.6.9/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
Hey @DannyDannyDanny, you're right, it appears to be broken again. This time the 'https://cdn.cookielaw.org' script is no longer the first in the DOM, so something new needs to be written to remove it from script_urls (rather than just popping the first element of the array). I can't imagine it's a hard function to write, but I don't have time for it until later today.
I'll keep you all posted.
Alright, now I've got a more permanent fix for the problem: I filter out 'cookielaw.org' in util.py. I'll make a pull request for this small fix tomorrow or so.
Revert get_credentials(self) to its original state (remove script_urls.pop(0)), then open util.py and look for the function find_script_urls(html_text).
All I did was check for the string cookielaw.org in each script URL. Make the function look like the following:
def find_script_urls(html_text):
    dom = BeautifulSoup(html_text, 'html.parser')
    scripts = dom.findAll('script', attrs={'src': True})
    scripts_list = []
    for script in scripts:
        src = script['src']
        if 'cookielaw.org' not in src:  # filter out cookielaw.org
            scripts_list.append(src)
    return scripts_list
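A side note on the substring check above: 'cookielaw.org' in src also matches URLs that merely mention the string somewhere in their path or query. If that ever becomes a problem, a stricter variant could compare the host name instead. This is only a rough sketch using the standard library; is_blocked, filter_script_urls and BLOCKED_DOMAINS are made-up names for illustration and are not part of sclib.

```python
from urllib.parse import urlparse

# Hypothetical blocklist; only cookielaw.org comes from this thread.
BLOCKED_DOMAINS = ('cookielaw.org',)


def is_blocked(src, blocked_domains=BLOCKED_DOMAINS):
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    # hostname is None for relative URLs such as '/assets/app.js'
    host = urlparse(src).hostname or ''
    return any(host == d or host.endswith('.' + d) for d in blocked_domains)


def filter_script_urls(srcs, blocked_domains=BLOCKED_DOMAINS):
    """Keep only the script URLs whose host is not blocked."""
    return [src for src in srcs if not is_blocked(src, blocked_domains)]
```

Inside find_script_urls, the condition would then presumably become `if not is_blocked(src):` in place of the substring test.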
This code:
throws:
Traceback (most recent call last):
  File "test.py", line 5, in <module>
    playlist = api.resolve("https://soundcloud.com/demangio/sets/tekno")
  File "/home/guillaume/.local/lib/python3.8/site-packages/sclib/sync.py", line 59, in resolve
    self.get_credentials()
  File "/home/guillaume/.local/lib/python3.8/site-packages/sclib/sync.py", line 54, in get_credentials
    js_text = f'{get_page(script)}'
  File "/home/guillaume/.local/lib/python3.8/site-packages/sclib/sync.py", line 14, in get_page
    return get_url(url).decode('utf-8')
  File "/home/guillaume/.local/lib/python3.8/site-packages/sclib/sync.py", line 11, in get_url
    return urlopen(url).read()
  File "/usr/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/usr/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
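The traceback shows the HTTPError escaping from get_page(script) inside get_credentials, so a single script that refuses to load aborts the whole lookup. One possible way around that, independent of which domain is misbehaving, is to skip any script that fails to download and move on to the next. The following is a minimal sketch under that assumption, not the library's actual code: first_client_id is a made-up helper, and fetch and find_client_id are passed in as stand-ins for get_page in sync.py and util.find_client_id.

```python
from urllib.error import URLError


def first_client_id(script_urls, fetch, find_client_id):
    """Return the first client ID found, skipping scripts that fail to load.

    fetch and find_client_id are injected stand-ins (hypothetical wiring)
    for get_page and util.find_client_id.
    """
    for script in script_urls:
        if not isinstance(script, str) or not script:
            continue  # ignore non-string or empty entries
        try:
            js_text = fetch(script)
        except URLError:
            # HTTPError is a subclass of URLError, so a 403 lands here too
            continue
        client_id = find_client_id(js_text)
        if client_id:
            return client_id
    return None
```

Passing the two functions in keeps the sketch testable without network access. The trade-off is that real failures on the script that actually carries the client ID are silently skipped too, so the caller should treat a None result as an error.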