Puyodead1 / udemy-downloader

A Udemy downloader that can download courses, with DRM support.
MIT License

[Bug]: Shaka-packager don't love spaces and special characters. (Decryption returned a non-zero exit code.) #137

Closed rickeymandraque closed 1 year ago

rickeymandraque commented 2 years ago

What happened?

Decryption error with path and filename containing spaces.

Exception: Decryption returned a non-zero exit code

I checked the version of shaka-packager; it's the latest. (see https://github.com/Puyodead1/udemy-downloader/issues/116)

I tried with mp4decrypt:

sudo cp ./mp4decrypt /usr/bin
sudo chmod 755 /usr/bin/mp4decrypt
mp4decrypt --show-progress --key bdd****************************4c3:758f************************cdd '/home/rickey/Github/udemy-downloader/out_dir/powershell-core-les-fondamentaux/01 - Introduction to PowerShell/005 Discovering the PowerShell console .encrypted.mp4' '/home/rickey/Github/udemy-downloader/out_dir/powershell-core-les-fondamentaux/01 - Introduction to PowerShell/005 Discovering the PowerShell console.mp4'

It works without audio.

So I thought shaka was buggy, I tried manually:

sudo cp ./shaka-packager /usr/bin
sudo chmod 755 /usr/bin/shaka-packager
shaka-packager --enable_raw_key_decryption --keys key_id=bdd*************************4c3:key=758**********************cdd in="/home/rickey/Github/udemy-downloader-feat-selenium/out_dir/powershell-core-les-fondamentaux/01 - Introduction to PowerShell/005 Discovering the powershell console .encrypted.mp4",stream=video,output=test.mp4

[0816/070349:ERROR:packager_main.cc(551)] Packager Error: 5 (FILE_FAILURE): Could not open file for reading /home/rickey/Github/udemy-downloader-feat-selenium/out_dir/powershell - core-the-fundamentals/01 - Introduction to PowerShell/005 Getting to Know PowerShell console.encrypted.mp4

I downloaded the selenium branch: same bug.

After doing some research, I noticed that users weren't using spaces in their paths or filenames. See https://github.com/shaka-project/shaka-packager/issues/334 and https://github.com/shaka-project/shaka-packager/issues/962

cd "/home/rickey/Github/udemy-downloader-feat-selenium/out_dir/powershell-core-les-fondamentaux/01 - Introduction to PowerShell/"
cp "01 - Introduction to PowerShell/005 Discovering PowerShell console.encrypted.mp4" test.encrypted.mp4
shaka-packager --enable_raw_key_decryption --keys key_id=bdd*************************4c3:key=758**********************dd in=test.encrypted.mp4,stream=0,output=test.mp4

It worked.

Of course, I tested many syntaxes like these:

shaka-packager input="blablablabla", stream=0
shaka-packager input="blabla blabla",stream=0
shaka-packager=blabla blabla, stream=video
shaka-packager 'input=blabla blabla,stream=video'
shaka-packager 'input=blabla blabla, stream=video'
shaka-packager 'input="blabla blabla",stream=0'
shaka-packager 'in="blabla blabla",stream=0'

To join audio and video:

shaka-packager --enable_raw_key_decryption --keys key_id=bdd*************************4c3:key=758**********************dd in=test.encrypted.mp4,stream=video,output=test.mp4
shaka-packager --enable_raw_key_decryption --keys key_id=bdd*************************4c3:key=758**********************dd in=test.encrypted.m4a,stream=audio,output=test.m4a
ffmpeg -i test.mp4 -i test.m4a -acodec copy -vcodec copy merged.mp4
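
A minimal sketch of the same three steps driven from Python, assuming the binaries are on PATH under the names used above. The helper names are hypothetical. Passing an argument list instead of a shell string avoids shell quoting problems with spaces; it does not help with commas, because the stream descriptor itself is comma-separated.

```python
import subprocess
from typing import List


def shaka_decrypt_args(kid: str, key: str, src: str, dst: str, stream: str) -> List[str]:
    """Build the argv for one shaka-packager decryption run (hypothetical helper)."""
    for path in (src, dst):
        # The stream descriptor is a comma-separated list of fields, so a
        # comma inside a path cannot be passed through unchanged.
        if "," in path:
            raise ValueError(f"comma in path: {path!r}")
    return [
        "shaka-packager",
        "--enable_raw_key_decryption",
        "--keys", f"key_id={kid}:key={key}",
        f"in={src},stream={stream},output={dst}",
    ]


def ffmpeg_merge_args(video: str, audio: str, merged: str) -> List[str]:
    """Build the argv that muxes decrypted video and audio without re-encoding."""
    return ["ffmpeg", "-i", video, "-i", audio,
            "-acodec", "copy", "-vcodec", "copy", merged]


def decrypt_and_merge(kid: str, key: str, stem: str) -> None:
    """Run the three commands in order; check=True raises on a non-zero exit code."""
    subprocess.run(shaka_decrypt_args(kid, key, f"{stem}.encrypted.mp4",
                                      f"{stem}.mp4", "video"), check=True)
    subprocess.run(shaka_decrypt_args(kid, key, f"{stem}.encrypted.m4a",
                                      f"{stem}.m4a", "audio"), check=True)
    subprocess.run(ffmpeg_merge_args(f"{stem}.mp4", f"{stem}.m4a",
                                     f"{stem}.merged.mp4"), check=True)
```

Calling decrypt_and_merge("<kid>", "<key>", "test") would reproduce the manual commands above, with merged output written to test.merged.mp4 (an illustrative name).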

Same error on Windows 10 (in a VM), but I don't know why: shaka itself works correctly there, with or without spaces.

Expected Result

A new script to decrypt and merge (manually?) the "*.encrypted.mp4" and "*.encrypted.m4a" files?

Branch

master/main

What operating systems are you seeing the problem on?

Linux/Unix

Relevant log output

[10:26:47] [udemy-downloader] [handle_segments:1344] INFO: > Lecture Tracks Downloaded
[10:26:47] [udemy-downloader] [handle_segments:1353] INFO: KID for video file is: BDD***************************************4C3
[10:26:47] [udemy-downloader] [handle_segments:1360] INFO: KID for audio file is: BDD***************************************4C3
[10:26:47] [udemy-downloader] [handle_segments:1366] INFO: > Decrypting video, this might take a minute...
[0815/222647:ERROR:packager_main.cc(551)] Packaging Error: 5 (FILE_FAILURE): Cannot open file for reading 005 Découverte de la console PowerShell.encrypted.mp4
[10:26:47] [udemy-downloader] [handle_segments:1389] ERROR: [-] Error: 
Traceback (most recent call last):
  File "/home/rickey/Github/udemy-downloader-feat-selenium/main.py", line 1367, in handle_segments
    ret_code = decrypt(video_kid, video_filepath_enc, video_filepath_dec)
  File "/home/rickey/Github/udemy-downloader-feat-selenium/main.py", line 1304, in decrypt
    raise Exception("Decryption returned a non-zero exit code")
Exception: Decryption returned a non-zero exit code

Other information

No response

rickeymandraque commented 2 years ago

More tests, more results!

shaka-packager --enable_raw_key_decryption --keys key_id=bdd************************c3:key=758************************cdd 'in=./test encrypted.mp4,stream=video,output=test decrypted.mp4'
# WHAT THE... ?!?
2162688/150692350
111/1Packaging completed successfully.

shaka-packager --enable_raw_key_decryption --keys key_id=bdd************************c3:key=758************************cdd 'in=./005 test encrypté là 1.mp4,stream=video,output=test decrypted.mp4'
# BAZINGAAA !
[0816/140446:ERROR:packager_main.cc(551)] Packaging Error: 5 (FILE_FAILURE): Cannot open file for reading ./005 test encrypté là 1.mp4

It doesn't like accented (non-ASCII) characters! Same issue with the official build (2.6.1).
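
To check a file name for such characters before handing it to the packager, a small sketch like the following can help. The function name is hypothetical, and it only reports characters; it does not prove that shaka-packager rejects each of them.

```python
import unicodedata
from typing import List, Tuple


def non_ascii_chars(name: str) -> List[Tuple[str, str]]:
    """Return (character, Unicode name) for each distinct non-ASCII character."""
    found: List[Tuple[str, str]] = []
    seen = set()
    for ch in name:
        if ord(ch) > 127 and ch not in seen:
            seen.add(ch)
            found.append((ch, unicodedata.name(ch, "UNKNOWN")))
    return found


if __name__ == "__main__":
    for ch, label in non_ascii_chars("005 test encrypté là 1.mp4"):
        print(repr(ch), label)
```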

rickeymandraque commented 2 years ago

I've found a solution! On line 1026 I read this:

 # commas cause problems with shaka-packager resulting in decryption failure

So I added this (still working on it) :

    file_name = file_name.replace("é", "e")
    file_name = file_name.replace("è", "e")
    file_name = file_name.replace("à", "a")

I have no knowledge of Python, so I'm going slowly. I thought of writing a piece of code like this:

if course.lang == "french":
    file_name = file_name.replace("é", "e")
    file_name = file_name.replace("è", "e")
    file_name = file_name.replace("à", "a")
    file_name = file_name.replace("ç", "c")
    file_name = file_name.replace("ù", "u")
    # etc...
    file_name = file_name.replace(",", "")
    file_name = file_name.replace(".mp4", "")
else:
    file_name = file_name.replace(",", "")
    file_name = file_name.replace(".mp4", "")

Results :

 ls ./out_dir/powershell-core-les-fondamentaux/01\ -\ Introduction\ à\ PowerShell/
'001 Présentation de PowerShell_fr.srt'  '002 Histoire de PowerShell.mp4'  "004 Préparation de l'environnement_fr.srt"       '005 Découverte de la console PowerShell.mp4'
'001 Présentation de PowerShell.mp4'     "003 Notions d'objet_fr.srt"      "004 Préparation de l'environnement.mp4"
'002 Histoire de PowerShell_fr.srt'      "003 Notions d'objet.mp4"         '005 Découverte de la console PowerShell_fr.srt'

No error, good titles !

Edit :

First "patch":

def handle_segments(url, format_id, video_title, output_path, lecture_file_name, chapter_dir):
    os.chdir(os.path.join(chapter_dir))

    file_name = lecture_file_name.replace("%", "")
    # for French, these characters cause problems with shaka-packager, resulting in decryption failure
    # https://github.com/Puyodead1/udemy-downloader/issues/137
    file_name = file_name.replace("À", "A")
    file_name = file_name.replace("à", "a")
    file_name = file_name.replace("Á", "A")
    file_name = file_name.replace("á", "a")
    file_name = file_name.replace("Â", "A")
    file_name = file_name.replace("â", "a")
    file_name = file_name.replace("Ã", "A")
    file_name = file_name.replace("ã", "a")
    file_name = file_name.replace("Ä", "A")
    file_name = file_name.replace("ä", "a")
    file_name = file_name.replace("Å", "A")
    file_name = file_name.replace("å", "a")
    file_name = file_name.replace("Æ", "AE")
    file_name = file_name.replace("æ", "ae")
    file_name = file_name.replace("Ç", "C")
    file_name = file_name.replace("ç", "c")
    file_name = file_name.replace("Ð", "D")
    file_name = file_name.replace("ð", "d")
    file_name = file_name.replace("È", "E")
    file_name = file_name.replace("è", "e")
    file_name = file_name.replace("É", "E")
    file_name = file_name.replace("é", "e")
    file_name = file_name.replace("Ê", "E")
    file_name = file_name.replace("ê", "e")
    file_name = file_name.replace("Ë", "E")
    file_name = file_name.replace("ë", "e")
    file_name = file_name.replace("Ì", "I")
    file_name = file_name.replace("ì", "i")
    file_name = file_name.replace("Í", "I")
    file_name = file_name.replace("í", "i")
    file_name = file_name.replace("Î", "I")
    file_name = file_name.replace("î", "i")
    file_name = file_name.replace("Ï", "I")
    file_name = file_name.replace("ï", "i")
    file_name = file_name.replace("Ñ", "N")
    file_name = file_name.replace("ñ", "n")
    file_name = file_name.replace("Ò", "O")
    file_name = file_name.replace("ò", "o")
    file_name = file_name.replace("Ó", "O")
    file_name = file_name.replace("ó", "o")
    file_name = file_name.replace("Ô", "O")
    file_name = file_name.replace("ô", "o")
    file_name = file_name.replace("Õ", "O")
    file_name = file_name.replace("õ", "o")
    file_name = file_name.replace("Ö", "O")
    file_name = file_name.replace("ö", "o")
    file_name = file_name.replace("œ", "oe")
    file_name = file_name.replace("Œ", "OE")
    file_name = file_name.replace("Ø", "O")
    file_name = file_name.replace("ø", "o")
    file_name = file_name.replace("ß", "ss")
    file_name = file_name.replace("Ù", "U")
    file_name = file_name.replace("ù", "u")
    file_name = file_name.replace("Ú", "U")
    file_name = file_name.replace("ú", "u")
    file_name = file_name.replace("Û", "U")
    file_name = file_name.replace("û", "u")
    file_name = file_name.replace("Ü", "U")
    file_name = file_name.replace("ü", "u")
    file_name = file_name.replace("Ý", "Y")
    file_name = file_name.replace("ý", "y")
    file_name = file_name.replace("Þ", "Th")
    file_name = file_name.replace("þ", "th")
    file_name = file_name.replace("Ÿ", "Y")
    file_name = file_name.replace("ÿ", "y")
    # commas cause problems with shaka-packager resulting in decryption failure
    file_name = file_name.replace(",", "")
    file_name = file_name.replace(".mp4", "")

voila !

rickeymandraque commented 2 years ago

This is the end !

A friend advised me and helped optimize the script's code. Line 1:

# -*- coding: utf-8 -*-
import argparse

Line 1025 and after:

    file_name = lecture_file_name.replace("%", "")
    # for French among other languages, these characters cause problems with shaka-packager, resulting in decryption failure
    # https://github.com/Puyodead1/udemy-downloader/issues/137
    # Thanks to cutecat!
    file_name = file_name.replace("À", "A").replace("à", "a").replace("Á", "A").replace("á", "a").replace("Â", "A").replace("â", "a").replace("Ã", "A").replace("ã", "a").replace("Ä", "A").replace("ä", "a").replace("Å", "A").replace("å", "a").replace("Æ", "AE").replace("æ", "ae").replace("Ç", "C").replace("ç", "c").replace("Ð", "D").replace("ð", "d").replace("È", "E").replace("è", "e").replace("É", "E").replace("é", "e").replace("Ê", "E").replace("ê", "e").replace("Ë", "E").replace("ë", "e").replace("Ì", "I").replace("ì", "i").replace("Í", "I").replace("í", "i").replace("Î", "I").replace("î", "i").replace("Ï", "I").replace("ï", "i").replace("Ñ", "N").replace("ñ", "n").replace("Ò", "O").replace("ò", "o").replace("Ó", "O").replace("ó", "o").replace("Ô", "O").replace("ô", "o").replace("Õ", "O").replace("õ", "o").replace("Ö", "O").replace("ö", "o").replace("Œ", "OE").replace("œ", "oe").replace("Ø", "O").replace("ø", "o").replace("ß", "ss").replace("Ù", "U").replace("ù", "u").replace("Ú", "U").replace("ú", "u").replace("Û", "U").replace("û", "u").replace("Ü", "U").replace("ü", "u").replace("Ý", "Y").replace("ý", "y").replace("Þ", "Th").replace("þ", "th").replace("Ÿ", "Y").replace("ÿ", "y")
    # commas cause problems with shaka-packager resulting in decryption failure
    file_name = file_name.replace(",", "")
    file_name = file_name.replace(".mp4", "")
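
For comparison, a hedged sketch of a more general approach using only the standard library: decompose each character with unicodedata and drop the combining accents, so that most accented Latin letters are covered without listing them one by one. The function name and the SPECIAL table are illustrative, not part of main.py, and the ".mp4" removal from the patch above is left out.

```python
import unicodedata

# Letters that have no Unicode decomposition need explicit replacements.
SPECIAL = {
    "Æ": "AE", "æ": "ae", "Œ": "OE", "œ": "oe",
    "Ø": "O", "ø": "o", "ß": "ss",
    "Ð": "D", "ð": "d", "Þ": "Th", "þ": "th",
}


def to_ascii_file_name(name: str) -> str:
    """Transliterate accented Latin letters to ASCII and drop ',' and '%'."""
    for src, dst in SPECIAL.items():
        name = name.replace(src, dst)
    # NFKD splits "é" into "e" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII is dropped rather than guessed at.
    ascii_only = stripped.encode("ascii", "ignore").decode("ascii")
    return ascii_only.replace(",", "").replace("%", "")


if __name__ == "__main__":
    print(to_ascii_file_name("005 Découverte de la console PowerShell"))
```

Dropping unknown characters silently is a design choice made here for brevity; a stricter version could raise an error instead.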

Et Voilà !

Edit : I will look for problematic characters in other languages.

@Puyodead1 I'm deliberately leaving this issue open in case anyone wants to add anything. You can close it if you want. Issue resolved. main.py.zip

rickeymandraque commented 2 years ago

It's never over! I've come to the conclusion that shaka-packager (2.5.3, Truedread's fork) has bugs in some of its functionality.

I modified constants.py to download the videos directly to my external hard drive.

HOME_DIR = os.getcwd()
DOWNLOAD_DIR = os.path.join(os.getcwd(), "/media/rickey/Anakin/Public/UdemyDL")
SAVED_DIR = os.path.join(os.getcwd(), "saved")
KEY_FILE_PATH = os.path.join(os.getcwd(), "/home/rickey/Github/udemy-downloader/keyfile.json")
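
A side note on these constants, as a small sketch: when a later argument to os.path.join is an absolute path, the earlier parts are discarded, so the os.getcwd() prefix has no effect on DOWNLOAD_DIR and KEY_FILE_PATH. The paths below are made up, and posixpath is used so the result is the same on any platform.

```python
import posixpath

base = "/home/user/project"  # stands in for os.getcwd()

# An absolute second argument replaces everything before it.
joined_abs = posixpath.join(base, "/media/disk/UdemyDL")

# A relative second argument is appended as expected.
joined_rel = posixpath.join(base, "saved")

print(joined_abs)
print(joined_rel)
```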

The videos download and the script works correctly, but when shaka has to decrypt, I always get the same error.

42/1[0827/074818:ERROR:packager_main.cc(551)] Packaging Error: 5 (FILE_FAILURE): Cannot rename temp file to /media/rickey/Anakin/Public/UdemyDL/gnulinux-de-debutant-a-confirme-en-quelques-heures/05 - Le scripting Bash/001 Afficher du texte a l'ecran grace a un Script et au Shebang.decrypted.mp4
[07:48:18] [udemy-downloader] [handle_segments:1123] ERROR: Error: 
Traceback (most recent call last):
  File "/home/rickey/Github/udemy-downloader/main.py", line 1101, in handle_segments
    ret_code = decrypt(video_kid, video_filepath_enc, video_filepath_dec)
  File "/home/rickey/Github/udemy-downloader/main.py", line 1034, in decrypt
    raise Exception("Decryption returned a non-zero exit code")
Exception: Decryption returned a non-zero exit code

I assumed the script wasn't designed to work that way, so it seemed normal that it wouldn't work, even though that is illogical (since all the other modules work).

So I added this to understand how this part of the script worked:

    else:
        command = f'nice -n 7 shaka-packager --enable_raw_key_decryption --keys key_id={kid}:key={key} input="{in_filepath}",stream_selector="0",output="{out_filepath}"'
        print(command)

This prints only the file name in the command parameters, so I simply "tinkered" with it to observe the result, thinking that shaka-packager might need the absolute path:

    else:
        command = f'nice -n 7 shaka-packager --enable_raw_key_decryption --keys key_id={kid}:key={key} input="{os.path.realpath(in_filepath)}",stream_selector="0",output="{os.path.realpath(out_filepath)}"'
        print(command)

I still get the same error.

So I ran the decryption command on its own and got the same error again and again.

shaka-packager --enable_raw_key_decryption --keys key_id=FXXXXXXXXXXXXXXXXXXE:key=6XXXXXXXXXXXXXXXXXXXXXXXX9 input="/media/rickey/Anakin/Public/UdemyDL/gnulinux-de-debutant-a-confirme-en-quelques-heures/05 - Le scripting Bash/001 Afficher du texte a l'ecran grace a un Script et au Shebang.encrypted.mp4",stream_selector="0",output="/media/rickey/Anakin/Public/UdemyDL/gnulinux-de-debutant-a-confirme-en-quelques-heures/05 - Le scripting Bash/001 Afficher du texte a l'ecran grace a un Script et au Shebang.decrypted.mp4"
2162688/38461395
42/1[0827/075349:ERROR:packager_main.cc(551)] Packaging Error: 5 (FILE_FAILURE): Cannot rename temp file to /media/rickey/Anakin/Public/UdemyDL/gnulinux-de-debutant-a-confirme-en-quelques-heures/05 - Le scripting Bash/001 Afficher du texte a l'ecran grace a un Script et au Shebang.decrypted.mp4

I downloaded the official version of shaka-packager (2.6.1) and there was no error!

./shaka-packager --enable_raw_key_decryption --keys key_id=FXXXXXXXXXXXXXXXE:key=6XXXXXXXXXXXXXXXXXXXXXXX9 input="/media/rickey/Anakin/Public/UdemyDL/gnulinux-de-debutant-a-confirme-en-quelques-heures/05 - Le scripting Bash/001 Afficher du texte a l'ecran grace a un Script et au Shebang.encrypted.mp4",stream_selector="0",output="/media/rickey/Anakin/Public/UdemyDL/gnulinux-de-debutant-a-confirme-en-quelques-heures/05 - Le scripting Bash/001 Afficher du texte a l'ecran grace a un Script et au Shebang.decrypted.mp4"
[0827/075510:INFO:demuxer.cc(89)] Demuxer::Run() on file '/media/rickey/Anakin/Public/UdemyDL/gnulinux-de-debutant-a-confirme-en-quelques-heures/05 - Le scripting Bash/001 Afficher du texte a l'ecran grace a un Script et au Shebang.encrypted.mp4'.
[0827/075510:INFO:demuxer.cc(155)] Initialize Demuxer for file '/media/rickey/Anakin/Public/UdemyDL/gnulinux-de-debutant-a-confirme-en-quelques-heures/05 - Le scripting Bash/001 Afficher du texte a l'ecran grace a un Script et au Shebang.encrypted.mp4'.
[0827/075510:INFO:single_segment_segmenter.cc(111)] Update media header (moov) and rewrite the file to '/media/rickey/Anakin/Public/UdemyDL/gnulinux-de-debutant-a-confirme-en-quelques-heures/05 - Le scripting Bash/001 Afficher du texte a l'ecran grace a un Script et au Shebang.decrypted.mp4'.
[0827/075511:INFO:mp4_muxer.cc(186)] MP4 file '/media/rickey/Anakin/Public/UdemyDL/gnulinux-de-debutant-a-confirme-en-quelques-heures/05 - Le scripting Bash/001 Afficher du texte a l'ecran grace a un Script et au Shebang.decrypted.mp4' finalized.
Packaging completed successfully.

Maybe a new release is needed...

rickeymandraque commented 2 years ago

I think I found the cause of the problem: Truedread's fork removed the temporary copy. Honestly, I don't know how it works, but it still causes bugs. So I took the official shaka-packager sources and just commented out the line that prints the INFO log messages, to get behavior similar to Truedread's fork.

The release for Linux is here

I haven't tested all cases yet, but already, I have no more bugs, or failures, on external devices. It remains to be seen if this had anything to do with special characters and spaces (I doubt it).

I can't build a release for Windows right now...

My release is based on 2.6.1+++

eliottp1089 commented 2 years ago

Replace the decrypt function with this and say goodbye to the special-characters issue:

    try:
        key = keys[kid.lower()]
    except KeyError:
        raise KeyError("Key not found")

    if os.name == "nt":
        command = f'ffmpeg -decryption_key {key} -i "{in_filepath}" -c copy "{out_filepath}"'
    else:
        command = f'nice -n 7 ffmpeg -decryption_key {key} -i "{in_filepath}" -c copy "{out_filepath}"'

    process = subprocess.Popen(command, shell=True)
    log_subprocess_output("FFMPEG-STDOUT", process.stdout)
    log_subprocess_output("FFMPEG-STDERR", process.stderr)
    ret_code = process.wait()
    if ret_code != 0:
        raise Exception("Decryption returned a non-zero exit code")

    return ret_code
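
A hedged variant of the same idea that passes ffmpeg an argument list instead of a shell string, so quotes and spaces in file names need no escaping. It is a sketch, not the project's code: keys is taken as a parameter here (an assumption about how the key table is stored), the `nice -n 7` prefix and the logging calls are omitted, and ffmpeg_decrypt_args is a hypothetical helper.

```python
import subprocess
from typing import Dict, List


def ffmpeg_decrypt_args(key: str, in_filepath: str, out_filepath: str) -> List[str]:
    """Build the ffmpeg argv; -decryption_key must come before the input it applies to."""
    return ["ffmpeg", "-decryption_key", key,
            "-i", in_filepath, "-c", "copy", out_filepath]


def decrypt(kid: str, keys: Dict[str, str], in_filepath: str, out_filepath: str) -> int:
    """Decrypt one file with ffmpeg and return its exit code (0 on success)."""
    try:
        key = keys[kid.lower()]
    except KeyError:
        raise KeyError("Key not found")

    # No shell is involved, so the paths are passed to ffmpeg exactly as given.
    result = subprocess.run(ffmpeg_decrypt_args(key, in_filepath, out_filepath))
    if result.returncode != 0:
        raise Exception("Decryption returned a non-zero exit code")
    return result.returncode
```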

rexfordnyrk commented 1 year ago

Thanks for this pointer. I faced the same issue and printed the command string. After manual testing, I figured out that shaka also doesn't like the "—" character. I therefore added .replace("—","-") to the main.py file at the end of your patch, and that fixed it.

The only downside was that it had to redownload the files instead of using the existing ones, but that's fine. I was too lazy to write a script to rename all the files using "-". Plus, in my case redownloading 9GB of data was faster than writing a script for that. lol

Anyway, should I open a pull request for the patch?

bydioeds commented 1 year ago

Many thanks. I was having trouble; I even thought my way of getting the keys was wrong.

Here's an easier way of achieving it:

from unidecode import unidecode
file_name = unidecode(file_name)

Edit: it seems unidecode replaces this character " with a double quote, which causes issues with the decryption again, but that is easily fixed by replacing it after calling unidecode().
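
A minimal sketch of that two-step cleanup. The exact character involved is not recoverable from the comment, so this assumes typographic quotation marks, which unidecode maps to plain double quotes. safe_file_name is a hypothetical name, and the fallback is only a rough stdlib stand-in for when the third-party unidecode package is not installed.

```python
import unicodedata

try:
    from unidecode import unidecode  # third-party package used in the comment above
except ImportError:
    def unidecode(text: str) -> str:
        """Rough stand-in: strip accents and drop anything that is not ASCII."""
        decomposed = unicodedata.normalize("NFKD", text)
        return decomposed.encode("ascii", "ignore").decode("ascii")


def safe_file_name(file_name: str) -> str:
    """Transliterate to ASCII, then remove characters that break the command."""
    ascii_name = unidecode(file_name)
    # Double quotes would end the quoted path early in the shell command string,
    # and commas are the problem already noted in main.py.
    return ascii_name.replace('"', "").replace(",", "")
```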

Puyodead1 commented 1 year ago

Closing this as these issues should be resolved.

gersooonn commented 1 year ago

How did you solve it? For me it keeps showing "Key not found". My keys are correct; I can only decrypt file by file, which gets very tiring.

Puyodead1 commented 1 year ago

Unrelated to this. And if it says "Key not found", it means your keys are not correct. You most likely didn't fill out the key file correctly.