globocom / m3u8

Python m3u8 Parser for HTTP Live Streaming (HLS) Transmissions

BUG since v3.5.0: Trying to load a playlist from file using an absolute path fails on Windows because of drive letters (URL-test in m3u8.load not implemented correctly!) #387

Open e-d-n-a opened 1 week ago

e-d-n-a commented 1 week ago

I got an error while working with m3u8 v4.0.0 and trying to load a playlist from file using an absolute path on Windows.

Traceback:

Traceback (most recent call last):
  File "[myscript.py]", line 843, in <module>
    asyncio.run(main())
  File "C:\Program Files\Python39\lib\asyncio\runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "C:\Program Files\Python39\lib\asyncio\base_events.py", line 647, in run_until_complete
    return future.result()
  File "[myscript.py]", line 806, in main
    await DLer.download_vod(file_pl, base_uri=url_pl_base, target=folder_target)
  File "[myscript.py]", line 478, in download_vod
    pl = m3u8.load(str(playlist), custom_tags_parser=self.__class__._parse_twitch_tags)
  File "C:\Program Files\Python39\lib\site-packages\m3u8\__init__.py", line 94, in load
    content, base_uri = http_client.download(uri, timeout, headers, verify_ssl)
  File "C:\Program Files\Python39\lib\site-packages\m3u8\httpclient.py", line 16, in download
    resource = opener.open(uri, timeout=timeout)
  File "C:\Program Files\Python39\lib\urllib\request.py", line 517, in open
    response = self._open(req, data)
  File "C:\Program Files\Python39\lib\urllib\request.py", line 539, in _open
    return self._call_chain(self.handle_open, 'unknown',
  File "C:\Program Files\Python39\lib\urllib\request.py", line 494, in _call_chain
    result = func(*args)
  File "C:\Program Files\Python39\lib\urllib\request.py", line 1417, in unknown_open
    raise URLError('unknown url type: %s' % type)
urllib.error.URLError: <urlopen error unknown url type: d>

... where file_pl was an absolute path to an m3u8 playlist on drive D (note the 'd' in the urlopen error). The same code worked when file_pl was a relative path!

I was dumbfounded to find out that this bug was introduced back in v3.5.0. It makes it impossible to load playlists from files using absolute paths on Windows, because the way m3u8.load distinguishes URLs from file paths was changed for the worse.

See the blame of the faulty line in m3u8.load and the commit that introduced the bug in v3.5.0.

I guess no one has loaded a playlist from a file using an absolute path on Windows since then, because only relative paths work there now.

Distinction-method in m3u8.load of v3.5.0 [failing]: https://github.com/globocom/m3u8/blob/57d254705fd79280101c2525cf09f035e08b689c/m3u8/__init__.py#L47-L51

Distinction-method in __init__.py and parser.py of v3.4.0 (=previous release) [working]:
https://github.com/globocom/m3u8/blob/b2a1342c6cc42e41cde25ecf291dcef9b815f3e9/m3u8/__init__.py#L45
https://github.com/globocom/m3u8/blob/b2a1342c6cc42e41cde25ecf291dcef9b815f3e9/m3u8/parser.py#L597-L598
https://github.com/globocom/m3u8/blob/b2a1342c6cc42e41cde25ecf291dcef9b815f3e9/m3u8/parser.py#L18

History of solutions:

2012-05-02: commit that introduced local file support
2012-05-18: commit that introduced is_url-function
2023-05-10: commit that introduced the bug (removing is_url-function)

urlsplit(uri).scheme == '' is a bad solution, because urlsplit parses the drive letter of an absolute Windows path as the URL scheme:

Test code to verify the issue (on Windows):

from urllib.parse import urlsplit
from pathlib import Path

# with current working directory being on a drive with assigned letter:
path_m3u8_rel = Path('playlist.m3u8')
# assert path_m3u8_rel.is_file()
path_m3u8_abs = path_m3u8_rel.absolute()
(urlsplit(str(path_m3u8_rel)).scheme == '' # True  - WORKS
,urlsplit(str(path_m3u8_abs)).scheme == '' # False - FAILS
,urlsplit(str(path_m3u8_abs)).scheme == path_m3u8_abs.drive[0].lower()) # True - scheme == drive letter!
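
The same check can be reproduced on any operating system with pathlib.PureWindowsPath, which formats Windows-style paths without touching the file system. This is only a sketch; the path D:\videos\playlist.m3u8 is a made-up example and not taken from my script.

```python
from pathlib import PureWindowsPath
from urllib.parse import urlsplit

# Hypothetical example paths; PureWindowsPath needs no real files or drives.
path_rel = PureWindowsPath("playlist.m3u8")
path_abs = PureWindowsPath(r"D:\videos\playlist.m3u8")

scheme_rel = urlsplit(str(path_rel)).scheme  # empty: treated as a local file
scheme_abs = urlsplit(str(path_abs)).scheme  # drive letter parsed as a scheme

print(repr(scheme_rel))
print(repr(scheme_abs))
print(scheme_abs == path_abs.drive[0].lower())
```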

Suggested solutions from Stack Overflow:

There are discussions and answers on Stack Overflow regarding this issue, with solutions you could adapt.

Answer from https://stackoverflow.com/questions/7849818/argument-is-url-or-path:

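import sys
from urllib.request import urlopen  # Python 3 location of urllib2.urlopen

try:
    f = urlopen(sys.argv[1])
except ValueError:  # invalid URL, e.g. a relative path without a scheme
    # Note: an absolute Windows path raises URLError instead (see traceback above).
    f = open(sys.argv[1])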

or better:

Answer from https://stackoverflow.com/questions/68626097/pythonic-way-to-identify-a-local-file-or-a-url:

from urllib.parse import urlparse
from os.path import exists

def is_local(url):
    url_parsed = urlparse(url)
    if url_parsed.scheme in ('file', ''): # Possibly a local file
        return exists(url_parsed.path)
    return False
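
Note that is_local as quoted would still see the scheme 'd' for a path on drive D, so it would need adapting too. One possible direction is to test for known network schemes instead of testing for an empty scheme. The following is a minimal sketch under that assumption, not the library's actual fix; the name looks_like_url and the set NETWORK_SCHEMES are hypothetical.

```python
from urllib.parse import urlsplit

# Assumption: only these schemes should be fetched over the network.
NETWORK_SCHEMES = {"http", "https"}


def looks_like_url(uri: str) -> bool:
    """Return True only if uri has a scheme we would download from."""
    # urlsplit lowercases the scheme, so a drive letter like 'D' becomes 'd',
    # which is not in NETWORK_SCHEMES and therefore counts as a file path.
    return urlsplit(uri).scheme in NETWORK_SCHEMES


for candidate in (
    "https://example.com/playlist.m3u8",
    r"D:\videos\playlist.m3u8",
    "playlist.m3u8",
    "/home/user/playlist.m3u8",
):
    print(candidate, looks_like_url(candidate))
```

Anything that is not recognised as a URL would then be opened as a local file, which covers relative paths, POSIX absolute paths and Windows paths with drive letters alike.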

bbayles commented 1 week ago

I'll make a PR for this!

bbayles commented 1 week ago

Here is my proposed fix.