JuanBindez / pytubefix

Python3 library for downloading YouTube Videos.
https://pytubefix.readthedocs.io
MIT License

FFMPEG sometimes fails due to filename ambiguity #311

Closed warmonkey closed 3 weeks ago

warmonkey commented 3 weeks ago

I removed file_system.py, file_system_verify(), and translation_table, and now use safe_filename() in Stream.download() and ffmpeg_process().

JuanBindez commented 3 weeks ago

This will not be accepted. file_system prevents errors in each environment. The file name you wanted to download was probably prohibited on your system, and that is what caused the error.

warmonkey commented 3 weeks ago

It's not like that.

The original ffmpeg_process() takes youtube.title as the filename, which fails on titles containing a colon: the ffmpeg output file name gets truncated and ffmpeg fails. The original filename policy is in file_system.file_system_verify(). On Windows it removes these characters: str.maketrans({'\\': '', '/': '', '?': '', ':': '', '*': '', '"': '', '<': '', '>': '', '|': ''}). On macOS it removes only ':', and on Linux/BSD it removes only '/'.
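To make that policy concrete, here is a minimal sketch of a per-platform translation table. It is reconstructed from the description above rather than copied from pytubefix, and the names `FORBIDDEN_BY_OS` and `strip_forbidden` are hypothetical.

```python
# Sketch of a per-OS filename policy, reconstructed from the comment above.
# FORBIDDEN_BY_OS and strip_forbidden are hypothetical names, not pytubefix API.
FORBIDDEN_BY_OS = {
    "windows": '\\/?:*"<>|',
    "macos": ":",
    "linux": "/",
}


def strip_forbidden(title: str, os_name: str) -> str:
    """Remove the characters that the given OS policy forbids."""
    table = str.maketrans({ch: "" for ch in FORBIDDEN_BY_OS[os_name]})
    return title.translate(table)


title = "CS 182: Lecture 1, Part 1: Introduction"
print(strip_forbidden(title, "windows"))  # colons are removed
print(strip_forbidden(title, "linux"))    # colons are kept
```

Under the Linux policy the colons survive, which is harmless on ext4 but is exactly the name an NTFS volume rejects.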

But that does not actually work on NTFS mounted on Ubuntu. So I chose safe_filename(), which removes these characters: [ '"', '#', '$', '%', "'", '*', ',', '.', '/', ':', '"', ';', '<', '>', '?', '\', '^', '|', '~', '\'] This is the safest option.
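A conservative policy along those lines could look roughly like the sketch below. It only strips the characters listed above (de-duplicated) and is not the actual safe_filename() implementation; `UNSAFE_CHARS` and `conservative_filename` are hypothetical names.

```python
import re

# Characters from the list above, de-duplicated. This is an illustrative
# stand-in, not the real pytubefix safe_filename().
UNSAFE_CHARS = "\"#$%'*,./:;<>?\\^|~"


def conservative_filename(title: str, max_length: int = 255) -> str:
    """Drop every listed character, then cap the length."""
    pattern = "[" + re.escape(UNSAFE_CHARS) + "]"
    return re.sub(pattern, "", title)[:max_length]


print(conservative_filename("CS 182: Lecture 1, Part 1: Introduction"))
```

Because '.' is in the list, a file extension has to be appended after sanitizing the title, not before.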

JuanBindez commented 3 weeks ago

file_system removes them correctly, and it will be kept that way.

warmonkey commented 3 weeks ago

I have already installed and tested it, and I will show you later. I just chose the most conservative naming policy, which should work on NTFS mounted on a Linux host. On Linux there is no way to tell whether a path is on a Windows file system (Samba or NTFS) or a Linux file system (ext, XFS, ZFS, ...).

JuanBindez commented 3 weeks ago

You don't understand. Even if you mount NTFS, access necessarily goes through the operating system's file system layer, so using file_system and using safe_filename amount to the same thing. If you use neither, you get a file system error, and there is no way around that. Also, safe_filename removes characters unnecessarily. Being called "safe filename" does not make it safer. In fact this is not a question of security at all, but of characters reserved by the file system. The old approach makes no sense because it removes characters that should be allowed in Linux environments.

JuanBindez commented 3 weeks ago

If you are mounting NTFS on your Linux, just use safe_filename in your scripts, and don't simply remove file_system from pytubefix.
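A minimal sketch of that workaround in a user script follows. It assumes safe_filename is importable from pytubefix.helpers and that the chosen stream exposes a subtype attribute such as "mp4"; `download_with_safe_name` is a hypothetical helper name.

```python
def download_with_safe_name(url: str, output_path: str = ".") -> str:
    """Download the highest-resolution progressive stream under a sanitized name."""
    # Imported lazily so the sketch can be loaded without pytubefix installed.
    from pytubefix import YouTube
    from pytubefix.helpers import safe_filename  # assumed import path

    yt = YouTube(url)
    stream = yt.streams.get_highest_resolution()
    # Sanitize the title first, then append the extension, because
    # safe_filename() also strips '.'.
    filename = safe_filename(yt.title) + "." + stream.subtype
    return stream.download(output_path=output_path, filename=filename)
```

Passing filename explicitly keeps the library's own naming policy out of the picture, so the script behaves the same regardless of which file system the target directory lives on.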

warmonkey commented 3 weeks ago

> If you are mounting NTFS on your Linux, just use safe_filename in your scripts, and don't simply remove file_system from pytubefix.

I know what the incompatible-character problem is. On Linux only NUL and / are not allowed. On NTFS more characters are forbidden in addition to /.

The forbidden printable ASCII characters are:
    Linux/Unix:
    / (forward slash)
    Windows:
    < (less than)
    > (greater than)
    : (colon - sometimes works, but is actually NTFS Alternate Data Streams)
    " (double quote)
    / (forward slash)
    \ (backslash)
    | (vertical bar or pipe)
    ? (question mark)
    * (asterisk)

Non-printable characters
    Linux/Unix: 0 (NULL byte) 
    Windows:  0-31 (ASCII control characters)
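The two lists above can be turned into a quick check of which characters in a given name each platform would reject. This is only an illustrative sketch; `LINUX_FORBIDDEN`, `WINDOWS_FORBIDDEN`, and `offending_chars` are hypothetical names.

```python
# Forbidden sets taken from the lists above; the names are hypothetical.
LINUX_FORBIDDEN = {"/", "\x00"}
WINDOWS_FORBIDDEN = set('<>:"/\\|?*') | {chr(i) for i in range(32)}


def offending_chars(name: str, forbidden: set) -> list:
    """Return the forbidden characters found in name, in order of first appearance."""
    seen = []
    for ch in name:
        if ch in forbidden and ch not in seen:
            seen.append(ch)
    return seen


name = "CS 182: Lecture 1, Part 1: Introduction.mp4"
print(offending_chars(name, LINUX_FORBIDDEN))    # []
print(offending_chars(name, WINDOWS_FORBIDDEN))  # [':']
```

The same name passes the Linux rules and fails the Windows ones, which is the situation an NTFS volume mounted on Linux runs into.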

I have already replaced all calls to file_system.file_system_verify() with safe_filename(), which makes sure it works in all of these scenarios.

Test on Windows:

PS E:\> pip install pytubefix
Collecting pytubefix
  Downloading pytubefix-8.2.0-py3-none-any.whl.metadata (6.8 kB)
Downloading pytubefix-8.2.0-py3-none-any.whl (84 kB)
Installing collected packages: pytubefix
Successfully installed pytubefix-8.2.0
PS E:\> 'PATCH cli.py fixing resolution=args.resolution
PS E:\> pytubefix -f ffmpeg.exe https://www.youtube.com/watch?v=rSY1pVGdZ4I
Loading video...
CS 182: Lecture 1, Part 1: Introduction_video_0 | 22 MB
 ↳ |██████████████████████████████████████████████████████████████████| 100.0%
CS 182: Lecture 1, Part 1: Introduction_audio_0 | 11 MB
 ↳ |██████████████████████████████████████████████████████████████████| 100.0%
ffmpeg version 2023-04-12-git-1179bb703e-full_build-www.gyan.dev Copyright (c) 2000-2023 the FFmpeg developers
  built with gcc 12.2.0 (Rev10, Built by MSYS2 project)
  configuration: ......
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'E:\CS 182 Lecture 1, Part 1 Introduction_video_0':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: isommp42
    creation_time   : 2021-03-14T19:58:29.000000Z
  Duration: 00:15:54.85, start: 0.000000, bitrate: 193 kb/s
  Stream #0:0[0x1](und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p(tv, bt709, progressive), 640x360 [SAR 1:1 DAR 16:9], 61 kb/s, 30 fps, 30 tbr, 15360 tbn (default)
    Metadata:
      creation_time   : 2021-03-14T19:58:29.000000Z
      handler_name    : ISO Media file produced by Google Inc. Created on: 03/14/2021.
      vendor_id       : [0][0][0][0]
  Stream #0:1[0x2](und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 127 kb/s (default)
    Metadata:
      creation_time   : 2021-03-14T19:58:29.000000Z
      handler_name    : ISO Media file produced by Google Inc. Created on: 03/14/2021.
      vendor_id       : [0][0][0][0]
Input #1, matroska,webm, from 'E:\CS 182 Lecture 1, Part 1 Introduction_audio_0':
  Metadata:
    encoder         : google/video-file
  Duration: 00:15:54.86, start: -0.007000, bitrate: 99 kb/s
  Stream #1:0(eng): Audio: opus, 48000 Hz, stereo, fltp (default)
E:\/CS 182: Lecture 1, Part 1: Introduction.mp4: Invalid argument

On Kubuntu 22.04:

sudo ntfsfix /dev/sdb1
Mounting volume... OK
Processing of $MFT and $MFTMirr completed successfully.
Checking the alternate boot sector... OK
NTFS volume version is 3.1.
NTFS partition /dev/sdb1 was processed successfully.
sa@sa-kde:~$ sudo mount /dev/sdb1 /media/sa/73F204F4039C05FC
sa@sa-kde:~$ cd /media/sa/73F204F4039C05FC/
sa@sa-kde:/media/sa/73F204F4039C05FC$ pytubefix https://www.youtube.com/watch?v=rSY1pVGdZ4I
Loading video...
Downloading highest resolution progressive stream...
CS 182: Lecture 1, Part 1: Introduction.mp4 | 22 MB
Traceback (most recent call last):
  File "/home/sa/.local/bin/pytubefix", line 8, in <module>
    sys.exit(main())
  File "/home/sa/.local/lib/python3.10/site-packages/pytubefix/cli.py", line 49, in main
    _perform_args_on_youtube(youtube, args)
  File "/home/sa/.local/lib/python3.10/site-packages/pytubefix/cli.py", line 53, in _perform_args_on_youtube
    download_highest_resolution_progressive(youtube=youtube, resolution="highest", target=args.target)
  File "/home/sa/.local/lib/python3.10/site-packages/pytubefix/cli.py", line 279, in download_highest_resolution_progressive
    _download(stream, target)
  File "/home/sa/.local/lib/python3.10/site-packages/pytubefix/cli.py", line 156, in _download
    stream.download(output_path=target, filename=filename)
  File "/home/sa/.local/lib/python3.10/site-packages/pytubefix/streams.py", line 369, in download
    with open(file_path, "wb") as fh:
OSError: [Errno 22] Invalid argument: '/media/sa/73F204F4039C05FC/CS 182: Lecture 1, Part 1: Introduction.mp4'

After commit 06a35e7 is applied:

On Windows:

PS E:\> pytubefix -f ffmpeg.exe https://www.youtube.com/watch?v=rSY1pVGdZ4I
Loading video...
CS 182 Lecture 1 Part 1 Introduction_video_0 | 20 MB
 ↳ |██████████████████████████████████████████████████████████████████| 100.0%
CS 182 Lecture 1 Part 1 Introduction_audio_0 | 11 MB
 ↳ |██████████████████████████████████████████████████████████████████| 100.0%
ffmpeg version 2023-04-12-git-1179bb703e-full_build-www.gyan.dev Copyright (c) 2000-2023 the FFmpeg developers
  built with gcc 12.2.0 (Rev10, Built by MSYS2 project)
  configuration: ......
  libavutil      58.  6.100 / 58.  6.100
  libavcodec     60.  9.100 / 60.  9.100
  libavformat    60.  4.101 / 60.  4.101
  libavdevice    60.  2.100 / 60.  2.100
  libavfilter     9.  5.100 /  9.  5.100
  libswscale      7.  2.100 /  7.  2.100
  libswresample   4. 11.100 /  4. 11.100
  libpostproc    57.  2.100 / 57.  2.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'E:\CS 182 Lecture 1 Part 1 Introduction_video_0':
  Metadata:
    major_brand     : dash
    minor_version   : 0
    compatible_brands: iso6avc1mp41
    creation_time   : 2021-03-14T19:58:30.000000Z
  Duration: 00:15:54.83, start: 0.000000, bitrate: 175 kb/s
  Stream #0:0[0x1](und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709, progressive), 1280x720 [SAR 1:1 DAR 16:9], 1 kb/s, 30 fps, 30 tbr, 15360 tbn (default)
    Metadata:
      creation_time   : 2021-03-14T19:58:30.000000Z
      handler_name    : ISO Media file produced by Google Inc.
      vendor_id       : [0][0][0][0]
Input #1, matroska,webm, from 'E:\CS 182 Lecture 1 Part 1 Introduction_audio_0':
  Metadata:
    encoder         : google/video-file
  Duration: 00:15:54.86, start: -0.007000, bitrate: 99 kb/s
  Stream #1:0(eng): Audio: opus, 48000 Hz, stereo, fltp (default)
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #1:0 -> #0:1 (opus (native) -> aac (native))
Press [q] to stop, [?] for help
Output #0, mp4, to 'E:\/CS 182 Lecture 1 Part 1 Introduction.mp4':
  Metadata:
    major_brand     : dash
    minor_version   : 0
    compatible_brands: iso6avc1mp41
    encoder         : Lavf60.4.101
  Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709, progressive), 1280x720 [SAR 1:1 DAR 16:9], q=2-31, 1 kb/s, 30 fps, 30 tbr, 15360 tbn (default)
    Metadata:
      creation_time   : 2021-03-14T19:58:30.000000Z
      handler_name    : ISO Media file produced by Google Inc.
      vendor_id       : [0][0][0][0]
  Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      encoder         : Lavc60.9.100 aac
frame=28645 fps=1440 q=-1.0 Lsize=   35898kB time=00:15:54.84 bitrate= 308.0kbits/s speed=  48x
video:20140kB audio:14841kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 2.620296%
[aac @ 000001a359242e80] Qavg: 634.231

On Kubuntu 22.04:

sa@sa-kde:/media/sa/73F204F4039C05FC$ pytubefix https://www.youtube.com/watch?v=rSY1pVGdZ4I
Loading video...
Downloading highest resolution progressive stream...
CS 182: Lecture 1, Part 1: Introduction.mp4 | 22 MB

warmonkey commented 2 weeks ago

Hi, any updates?