Subject of the issue
Downloading certain subtitles fails with
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
Your environment
Steps to reproduce
edx-dl -s -u <censored> https://courses.edx.org/courses/course-v1:MITx+6.002.1x+2T2019/course/ --filter-section 8
Expected behaviour
All videos and subtitles download OK.
Actual behaviour
English subtitles for video 22 do not download and the code crashes. Output:
More info
Downloading the problematic JSON file https://courses.edx.org/courses/course-v1:MITx+6.002.1x+2T2019/xblock/block-v1:MITx+6.002.1x+2T2019+type@video+block@S8V6_Another_Dependent_Source_Example/handler/transcript/translation/en, I notice that some entries in the "text" section are null (without quotes), whereas other JSON files, such as https://courses.edx.org/courses/course-v1:MITx+6.002.1x+2T2019/xblock/block-v1:MITx+6.002.1x+2T2019+type@video+block@S8V5_Another_Dependent_Source_Example/handler/transcript/translation/en, use the literal text "None". I expect the code that processes the parsed JSON does not handle the nulls.
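To illustrate the difference, here is a minimal sketch of how Python's standard json module treats the two cases. The sample list is made up for illustration and is not copied from the transcript files; the point is only that an unquoted null becomes None, and that concatenating None with a string raises the same TypeError as above.

```python
import json

# Made-up sample: one normal entry, one JSON null, one literal "None" string.
entries = json.loads('["first line", null, "None"]')

# JSON null is decoded to Python None; the quoted "None" stays a string.
print(entries)

try:
    # Concatenating None with a string is what triggers the crash.
    entries[1] + "\n"
except TypeError as exc:
    message = str(exc)
    print(message)
```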
I am no Python expert, so I don't know if this is the best fix, but changing the following line in edx_json2srt in parsing.py from:
if t == '':
to
if t == '' or t is None:
seems to fix the crash, at least for this problematic JSON file.
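As a rough sketch of the idea (not the actual edx_json2srt code), the check can be isolated in a small helper. The names is_blank and join_lines and the sample payload are hypothetical, and the "start"/"end" keys are assumed for illustration; only the "text" key and the null entries come from the files above.

```python
import json

# Hypothetical payload; values are invented, "start"/"end" keys are assumed.
payload = '{"start": [0, 1500], "end": [1500, 3000], "text": ["Hello", null]}'
data = json.loads(payload)


def is_blank(t):
    """Treat both an empty string and JSON null (Python None) as blank."""
    return t is None or t == ''


def join_lines(texts):
    """Hypothetical stand-in for the string concatenation that crashed."""
    out = ''
    for t in texts:
        if is_blank(t):
            # Skipping blanks is a choice made for this sketch only.
            continue
        out += t + '\n'
    return out


result = join_lines(data["text"])
print(repr(result))
```

Whether a blank entry should be skipped or replaced with placeholder text is up to the real converter; the sketch only shows that checking for None before concatenating avoids the TypeError.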
Observation
I notice that the edx.org server itself can generate these .srt files on the fly, since a user of the website can download them. Their converter copes with nulls in the JSON and presumably with any other quirks there might be. Maybe you could GET or POST to the correct URLs and let their server generate the .srt file, rather than interpreting the JSON yourself?