This one is a bit trickier... There's an API for JW Broadcasting, and there's an API for downloading publications, but I haven't seen any API for articles and pages on the website, and I doubt there is one, because that would be overkill.
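For reference, the publications API I mentioned can be queried directly if you already know a publication symbol. Here is a minimal sketch, assuming the same GETPUBMEDIALINKS endpoint used below also accepts a pub parameter; the symbol 'sjjm' is just an illustration, substitute whatever publication you're after:

#!/usr/bin/env python3
# Sketch: list MP3 links for a known publication symbol.
# The 'pub' parameter and the symbol below are assumptions for illustration.
import json, urllib.request

lang = 'E'
pub = 'sjjm'  # hypothetical publication symbol
url = ('https://apps.jw.org/GETPUBMEDIALINKS?output=json&alllangs=0'
       '&fileformat=MP3&langwritten=' + lang + '&txtCMSLang=' + lang
       + '&pub=' + pub)
tree = json.loads(urllib.request.urlopen(url).read().decode('utf-8'))
for item in tree['files'][lang]['MP3']:
    print(item['title'], item['file']['url'])

That only helps for whole publications, though, not for individual web articles.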
That would mean we need a web page scraper, which in turn means it could break whenever the layout of the webpage changes.
I know there's interest in scraping jw.org, not only for downloading a bunch of audio, but also for things like a jw.org news client for Kodi etc... It would be nice, but it's a bit of a project on its own.
I'll take a look at how the audio recordings are handled, but chances are all solutions are too fragile.
May I ask why you need this, and how Python-savvy you are?
Yeah, if you can get hold of the document ID there is an API to download the MP3s... The tricky part is getting the ID. I'm giving you an unorthodox quick fix here, and it only works for web articles. Tweak it to suit your needs.
#!/usr/bin/env python3
# Run the program with a jw.org URL as an argument to
# download all recordings that are referenced on that page.
import sys, re, json, urllib.request, urllib.error

lang = 'E'
api_url = ('https://apps.jw.org/GETPUBMEDIALINKS?output=json&alllangs=0'
           '&fileformat=MP3&langwritten=' + lang + '&txtCMSLang=' + lang
           + '&docid=')

# Scrape the page for document IDs (found in data-page-id attributes).
data = urllib.request.urlopen(sys.argv[1]).read().decode('utf-8')
matches = re.finditer('data-page-id="mid([^"]*)"', data)
ids = set(x.group(1) for x in matches)  # set() removes duplicates

for i in ids:
    try:
        print('requesting data about', i)
        response = urllib.request.urlopen(api_url + i)
    except urllib.error.URLError:
        # Not every document ID has MP3 media; skip the ones that fail.
        continue
    tree = json.loads(response.read().decode('utf-8'))
    file_url = tree['files'][lang]['MP3'][0]['file']['url']  # assuming there's only one MP3
    file_title = tree['files'][lang]['MP3'][0]['title']
    file_name = re.sub(r'[<>:"|?*/\\\0]', '', file_title) + '.mp3'  # strip NTFS-unsafe characters
    print('downloading', file_title)
    urllib.request.urlretrieve(file_url, filename=file_name)
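To use it, save the script and pass it the URL of an article that has audio (the script name and URL here are just placeholders):

python3 jw_audio.py 'https://www.jw.org/en/some-article/'

The MP3s are saved to the current directory, named after their titles.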
Dear brother,
This is not an issue... more of a feature request ^_^ Is there perhaps a way we can download all the audio recordings from the experiences section?
Thank you