Closed flomet closed 4 years ago
Just tried it again, and it's not working anymore. It seems SoundCloud puts the client_id in a .js file whose name isn't consistent enough to search for and single out. I changed find_script_url to collect all the .js files and return them as a list. get_credentials then iterates through this list, stores the content of all the .js files in one variable, and passes it to util.find_client_id(). It's working like this, at least for the moment.
    # Method on the scraper class
    def get_credentials(self):
        url = random.choice(util.SCRAPE_URLS)
        page_text = get_page(url)
        script_urls = util.find_script_url(page_text)
        script_text_all = ""
        for script in script_urls:
            # Skip anything that is not a non-empty string
            if isinstance(script, str) and script:
                script_text_all += str(get_page(script))
        self.client_id = util.find_client_id(script_text_all)

    # util.py
    import re
    from bs4 import BeautifulSoup

    def find_script_url(html_text):
        # Collect the src of every <script> tag that has one
        dom = BeautifulSoup(html_text, 'html.parser')
        scripts = dom.find_all('script', attrs={'src': True})
        script_list = []
        for script in scripts:
            script_list.append(script['src'])
        return script_list

    def find_client_id(script_text):
        # Raises IndexError if no client_id is present in the text
        return re.findall(r'client_id=([a-zA-Z0-9]+)', script_text)[0]
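A possible variation on the approach above, as a rough sketch only: instead of concatenating every script into one string, fetch the scripts one at a time and stop at the first one that contains a client_id. The function name first_client_id and the fetch parameter are hypothetical (fetch stands in for something like get_page), and resolving relative src values with urljoin is an assumption about what the page might contain.

```python
import re
from urllib.parse import urljoin


def first_client_id(page_url, script_srcs, fetch):
    """Return the first client_id found in any script, or None.

    fetch is a hypothetical callable that takes a URL and returns its text.
    """
    pattern = re.compile(r'client_id=([a-zA-Z0-9]+)')
    for src in script_srcs:
        # Skip anything that is not a non-empty string
        if not isinstance(src, str) or not src:
            continue
        # Resolve relative src values against the page URL
        match = pattern.search(fetch(urljoin(page_url, src)))
        if match:
            return match.group(1)
    return None
```

This avoids downloading the remaining scripts once a match turns up, and it returns None rather than raising when nothing matches.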
Thanks for looking into this. Can you submit a pull request with your changes?
I gave it a shot. Hope it works. It's my first pull request ;)
Just tried your script and it wasn't working. I made two small changes to get it running.
In util.py, I changed line 18 from
    if '48-' in src.split('/')[-1]:
to
    if '2-5' in src.split('/')[-1]:
and line 23 from
    return re.findall(r'client_id:"([a-zA-Z0-9]+)"', script_text)[0]
to
    return re.findall(r'client_id=([a-zA-Z0-9]+)', script_text)[0]
It's working for me now. It seems they changed the name of the .js file that contains the client_id, and changed the ":" after client_id to "=".
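Since the separator has already changed once, one option is a pattern that accepts both forms. This is only a sketch: the pattern below is an assumption based on the two formats mentioned in this thread (client_id:"..." and client_id=...), not something SoundCloud documents, and it returns None instead of raising when nothing matches.

```python
import re

# Hypothetical pattern: accepts client_id:"abc", client_id="abc" and client_id=abc
CLIENT_ID_RE = re.compile(r'client_id\s*[:=]\s*"?([a-zA-Z0-9]+)')


def find_client_id(script_text):
    """Return the first client_id found in script_text, or None."""
    match = CLIENT_ID_RE.search(script_text)
    return match.group(1) if match else None
```

The caller would then need to handle the None case, for example by trying another scrape URL.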