Lovi-0 / StreamingCommunity

Script to download movies and TV series.
GNU General Public License v3.0

Corruption during download #88

Closed brazoayeye closed 7 months ago

brazoayeye commented 7 months ago

I tried three downloads: Oppenheimer (Italian) from Xubuntu on VirtualBox => the download got stuck many times and restarted with an error (bad message length, or something like that). The file was too small and the audio was not in sync with the video.

Oppenheimer (Italian) from Windows => download was OK, sync OK.

I tried Barbie (Italian) many times. Every time the download starts but ends within a few seconds. If the download lasts at least 10 s, I get a very short video. The download starts normally, but from 3% it jumps to >80% and then exits. Sometimes the download exits after 2 s or so, and then I also get an error message.

Here is what I see on the console:

Search for any Movie or TV Series title: barbie
0 -> Barbie - movie
1 -> Barbie Dreamhouse Challenge - tv
2 -> Barbie e le 12 principesse danzanti - movie
3 -> Barbie - L'accademia per principesse - movie
4 -> Barbie - Il segreto delle fate - movie
5 -> Barbie e le tre moschettiere - movie
6 -> Barbie - La principessa e la povera - movie
7 -> Barbie e il canto di Natale - movie
8 -> Barbie nel mondo dei videogame - movie
9 -> Barbie Raperonzolo - movie
10 -> Bruno Barbieri - 4 hotel - tv
11 -> La bottega del barbiere - movie
12 -> La bottega del barbiere 2 - movie
13 -> La bottega del barbiere 3 - movie
14 -> Sweeney Todd - Il diabolico barbiere di Fleet Street - movie
15 -> Nope - movie
16 -> Halloween Ends - movie
17 -> ALL IN ONE DAY - movie
18 -> MasterChef Italia - tv
19 -> Celebrity MasterChef Italia - tv
20 -> Euphoria - tv

Total result: 21

Insert INDEX number, or [1-2] for a range of movies/tv series, or [1,3,5] to select discontinued movie/tv series

In case of a TV Series you will also choose seasons and episodes to download

Select INDEX to download: 0

Selected Movie: Barbie
Selected quality => 1080p
[16:03:08] Downloading video ts                                                                          my_m3u8.py:343
Download  93% -------------------------------------------------- --- 1,611/1,726  [ 0:00:00 < 0:00:01 , 8,638 bytes/s ]
[16:03:09] Couldn't find any segments to join, retry                                                     my_m3u8.py:315

or

Select INDEX to download: 0

Selected Movie: Barbie
Selected quality => 1080p
[16:05:29] Downloading video ts                                                                           my_m3u8.py:343
Download  83% ----------------------------------------------- --------- 1,426/1,726  [ 0:00:19 < 0:00:02 , 150 bytes/s ]
[16:05:52] Info 'videos\Movies\barbie\barbie.mp4': 0h 5m 17s                                                  util.py:35
Done!
Quit the script ? [yes / no]:

Is there a verbose mode or some other output I can inspect to give you more information?

Thanks

Lovi-0 commented 7 months ago

Try the new release; it focuses on fixing this problem.

[Attached image: immagine_2024-03-27_162439019]

brazoayeye commented 7 months ago

I haven't understood what I have to do. I tried a fresh clone from git into a new folder and I'm having the same problem:

--->git clone https://github.com/Ghost6446/StreamingCommunity_api/tree/main
Cloning into 'main'...
fatal: repository 'https://github.com/Ghost6446/StreamingCommunity_api/tree/main/' not found

--->git clone https://github.com/Ghost6446/StreamingCommunity_api
Cloning into 'StreamingCommunity_api'...
remote: Enumerating objects: 680, done.
remote: Counting objects: 100% (248/248), done.
remote: Compressing objects: 100% (148/148), done.
Receiving objects:  97% (660/680), 2.50 MiB | 3.47 MiB/s
remote: Total 680 (delta 121), reused 190 (delta 84), pack-reused 432
Receiving objects: 100% (680/680), 4.52 MiB | 5.39 MiB/s, done.
Resolving deltas: 100% (332/332), done.

--->cd StreamingCommunity_api

--->python run.py

   _____ _                            _                _____                                      _ _
  / ____| |                          (_)              / ____|                                    (_) |
 | (___ | |_ _ __ ___  __ _ _ __ ___  _ _ __   __ _  | |     ___  _ __ ___  _ __ ___  _   _ _ __  _| |_ _   _
  \___ \| __| '__/ _ \/ _` | '_ ` _ \| | '_ \ / _` | | |    / _ \| '_ ` _ \| '_ ` _ \| | | | '_ \| | __| | | |
  ____) | |_| | |  __/ (_| | | | | | | | | | | (_| | | |___| (_) | | | | | | | | | | | |_| | | | | | |_| |_| |
 |_____/ \__|_|  \___|\__,_|_| |_| |_|_|_| |_|\__, |  \_____\___/|_| |_| |_|_| |_| |_|\__,_|_| |_|_|\__|\__, |
                                               __/ |                                                     __/ |
                                              |___/                                                     |___/

Checking GitHub version ...
=> Everything is up to date
=> You're on Version: v0.9.2

StreamingCommunity_api was downloaded 2 times, but only 2600.0% of You(!!) have starred it.
        Help the repository grow today, by leaving a star and sharing it to others online!

Rules => .forum

Search for any Movie or TV Series title: barbie
0 -> Barbie - movie
1 -> Barbie Dreamhouse Challenge - tv
2 -> Barbie e le 12 principesse danzanti - movie
3 -> Barbie - L'accademia per principesse - movie
4 -> Barbie - Il segreto delle fate - movie
5 -> Barbie e le tre moschettiere - movie
6 -> Barbie - La principessa e la povera - movie
7 -> Barbie e il canto di Natale - movie
8 -> Barbie nel mondo dei videogame - movie
9 -> Barbie Raperonzolo - movie
10 -> Bruno Barbieri - 4 hotel - tv
11 -> La bottega del barbiere - movie
12 -> La bottega del barbiere 2 - movie
13 -> La bottega del barbiere 3 - movie
14 -> Sweeney Todd - Il diabolico barbiere di Fleet Street - movie
15 -> Nope - movie
16 -> Halloween Ends - movie
17 -> ALL IN ONE DAY - movie
18 -> MasterChef Italia - tv
19 -> Celebrity MasterChef Italia - tv
20 -> Euphoria - tv

Total result: 21

Insert INDEX number, or [1-2] for a range of movies/tv series, or [1,3,5] to select discontinued movie/tv series

In case of a TV Series you will also choose seasons and episodes to download

Select INDEX to download: 0

Selected Movie: Barbie
Selected quality => 1080p
[16:31:58] Downloading subtitle: auto                                                                    my_m3u8.py:121
[16:31:59] Downloading subtitle: ita                                                                     my_m3u8.py:121
           Downloading subtitle: eng                                                                     my_m3u8.py:121
[16:32:00] Downloading video ts                                                                          my_m3u8.py:327
Download  81% ---------------------------------------- ---------- 1,399/1,726  [ 0:00:08 < 0:00:02 , 262 bytes/s ]
[16:32:08] Progress reached 99.5%. Stopping.                                                             my_m3u8.py:260
Download 100% -------------------------------------------------- 1,726/1,726  [ 0:00:20 < 0:00:00 , 92 bytes/s ]
[16:32:21] No progress for 10 seconds.  Stopping.                                                        my_m3u8.py:283
Download 100% -------------------------------------------------- 1,726/1,726  [ 0:00:20 < 0:00:00 , 92 bytes/s ]
[16:32:25] Info 'videos\Movies\barbie\barbie.mp4': 0h 3m 39s                                                 util.py:35
Done!

File size 44MB.

Tried forcing English language: 71MB

Tried again: 48MB

Should I wait for the next release?

Lovi-0 commented 7 months ago

Try the pre-release.

Lovi-0 commented 7 months ago

What is your average internet speed, and what max_workers value do you use?

brazoayeye commented 7 months ago

How can I get the pre-release? My config has "max_worker": 20, and my connection speed is in the screenshot:

[Attached image: Screenshot_1]

Lovi-0 commented 7 months ago

https://github.com/Ghost6446/StreamingCommunity_api/releases

Lovi-0 commented 7 months ago

not bad.

brazoayeye commented 7 months ago

Sorry for bothering,

I tried the pre-release of v0.9.2 (which is only available as an exe file) and I found the same problem. I'll try to investigate it as soon as possible.

Lovi-0 commented 7 months ago

OK, I don't know about that problem.

brazoayeye commented 7 months ago

I modified get_req_ts to add debug output:

    def get_req_ts(self, ts_url):
        """Single req to a ts file to get content"""

        # Position of this segment in the playlist
        url_number = self.segments.index(ts_url)

        # Skip the segment if any previously failed index appears in its URL
        is_valid = True
        for failed_seg in failed_segments:
            if str(failed_seg) in ts_url:
                is_valid = False
                break

        if is_valid:

            try:
                response = requests.get(ts_url, headers={'user-agent': get_headers()}, timeout=10)

                if response.status_code == 200:
                    print(f"OK for {ts_url}")
                    return response.content
                else:
                    # Non-200 answer: remember the index as failed
                    print(f"BAD for {ts_url}")
                    failed_segments.append(str(url_number))
                    return None

            except Exception as e:
                # Timeout or connection error: remember the index as failed
                print(f"FAILED for {ts_url}: {e}")
                failed_segments.append(str(url_number))
                return None

        else:
            print(f"Not valid for {ts_url}")
            return None

The console log is attached: outLog.txt

The traceback says:

Traceback (most recent call last):
  File "xxx\StreamingCommunity_api\venv\Lib\site-packages\urllib3\response.py", line 737, in _error_catcher
    yield
  File "xxx\StreamingCommunity_api\venv\Lib\site-packages\urllib3\response.py", line 862, in _raw_read
    data = self._fp_read(amt, read1=read1) if not fp_closed else b""
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "xxx\StreamingCommunity_api\venv\Lib\site-packages\urllib3\response.py", line 845, in _fp_read
    return self._fp.read(amt) if amt is not None else self._fp.read()
           ^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\http\client.py", line 479, in read
    s = self.fp.read(amt)
        ^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\socket.py", line 707, in readinto
    return self._sock.recv_into(b)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\ssl.py", line 1252, in recv_into
    return self.read(nbytes, buffer)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\ssl.py", line 1104, in read
    return self._sslobj.read(len, buffer)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TimeoutError: The read operation timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "xxx\StreamingCommunity_api\venv\Lib\site-packages\requests\models.py", line 816, in generate
    yield from self.raw.stream(chunk_size, decode_content=True)
  File "xxx\StreamingCommunity_api\venv\Lib\site-packages\urllib3\response.py", line 1043, in stream
    data = self.read(amt=amt, decode_content=decode_content)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "xxx\StreamingCommunity_api\venv\Lib\site-packages\urllib3\response.py", line 935, in read
    data = self._raw_read(amt)
           ^^^^^^^^^^^^^^^^^^^
  File "xxx\StreamingCommunity_api\venv\Lib\site-packages\urllib3\response.py", line 861, in _raw_read
    with self._error_catcher():
  File "C:\Program Files\Python312\Lib\contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "xxx\StreamingCommunity_api\venv\Lib\site-packages\urllib3\response.py", line 742, in _error_catcher
    raise ReadTimeoutError(self._pool, None, "Read timed out.") from e  # type: ignore[arg-type]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='sc-u4-01.scws-content.net', port=443): Read timed out.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "xxx\StreamingCommunity_api\Src\Lib\FFmpeg\my_m3u8.py", line 218, in get_req_ts
    response = requests.get(ts_url, headers={'user-agent': get_headers()}, timeout=10)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "xxx\StreamingCommunity_api\venv\Lib\site-packages\requests\api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "xxx\StreamingCommunity_api\venv\Lib\site-packages\requests\api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "xxx\StreamingCommunity_api\venv\Lib\site-packages\requests\sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "xxx\StreamingCommunity_api\venv\Lib\site-packages\requests\sessions.py", line 747, in send
    r.content
  File "xxx\StreamingCommunity_api\venv\Lib\site-packages\requests\models.py", line 899, in content
    self._content = b"".join(self.iter_content(CONTENT_CHUNK_SIZE)) or b""
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "xxx\StreamingCommunity_api\venv\Lib\site-packages\requests\models.py", line 822, in generate
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='sc-u4-01.scws-content.net', port=443): Read timed out.

I probably have a less stable network, since I'm behind several firewalls/proxies, but the problem is caused by poor handling of connection errors.

I admit I don't know how streaming works and I haven't studied your code yet, but if I replace the error handling with a simple infinite retry like

            except Exception as e:
                print(f"FAILED for {ts_url}: {e}")
                return self.get_req_ts(ts_url)

I can download the entire film without problems.

You can simulate the problem simply by raising a similar exception at random:

                if random.random() > 0.1:
                    response = requests.get(ts_url, headers={'user-agent': get_headers()}, timeout=10)
                else:
                    raise requests.exceptions.ConnectionError("test")
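
The same fault-injection idea can be tried without touching the network. The following is a minimal, self-contained sketch, not the project's code: `flaky_fetch`, `download_once`, `download_with_retry` and `SimulatedConnectionError` are hypothetical names, and the 10% failure rate and the 1,726 segment count are borrowed from the logs above purely for illustration.

```python
import random


class SimulatedConnectionError(Exception):
    """Stand-in for requests.exceptions.ConnectionError (hypothetical)."""


def flaky_fetch(index, rng, failure_rate=0.1):
    """Pretend to download segment `index`; fail at random like a timeout."""
    if rng.random() < failure_rate:
        raise SimulatedConnectionError(f"simulated timeout for segment {index}")
    return f"segment-{index}".encode()


def download_once(total, rng):
    """One attempt per segment; a failed segment is simply dropped."""
    got = {}
    for i in range(total):
        try:
            got[i] = flaky_fetch(i, rng)
        except SimulatedConnectionError:
            pass
    return got


def download_with_retry(total, rng, max_retries=20):
    """Up to max_retries extra attempts per segment before giving up on it."""
    got = {}
    for i in range(total):
        for _ in range(max_retries + 1):
            try:
                got[i] = flaky_fetch(i, rng)
                break
            except SimulatedConnectionError:
                continue
    return got


if __name__ == "__main__":
    total = 1726  # segment count taken from the progress bar in the logs
    once = download_once(total, random.Random(0))
    retried = download_with_retry(total, random.Random(0))
    print(f"single attempt: {len(once)}/{total} segments")
    print(f"with retries:   {len(retried)}/{total} segments")
```

With a seeded `random.Random` the run is repeatable, which makes it easier to compare the two strategies than injecting failures into a live download.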

I'm sorry, but I'm worried I would make a mess of your code if I tried to fix it myself.

Regards

Lovi-0 commented 7 months ago

Yes, debug mode is perfect for this; I will make it better. Your problem is related to your connection and firewall. The log says that 90% of the ts files are not valid, for example: Not valid for https://sc-u4-01.scws-content.net/hls/100/4/08/4083a566-f546-4609-b12f-cb7f12a6f956/video/1080p/0938-1000.ts, so your PC can't connect to the host. I removed the infinite retry loop for downloads because for most people it is not necessary and it takes too much CPU.

Lovi-0 commented 7 months ago

Try with a different connection.

brazoayeye commented 7 months ago

Only 10-20% of segments fail to download because of timeouts. It's the failed_segments.append(str(url_number)) mechanism that discards most of the files, because they match the same url_number. A single timeout for one segment doesn't mean that all files with the same url_number are invalid. As shown, it doesn't even mean that the same segment can't be downloaded on a retry.
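
One possible reading of why a few timeouts discard so many files: the check `str(failed_seg) in ts_url` is a substring match against the whole URL, so a short failed index such as "10" also matches path parts like "100", "1080p" or "1000". The sketch below illustrates this with made-up URLs shaped like the one quoted earlier in this thread; `is_skipped`, the host name and the segment list are assumptions for illustration, and this is not a confirmed diagnosis of my_m3u8.py.

```python
def is_skipped(ts_url, failed_segments):
    """Mirror of the check in get_req_ts: substring match on the whole URL."""
    return any(str(failed) in ts_url for failed in failed_segments)


# Hypothetical URLs, shaped like the one quoted in this thread.
BASE = "https://host.example/hls/100/4/08/some-id/video/1080p/{:04d}-1000.ts"
segments = [BASE.format(i) for i in range(1, 21)]

# Suppose only the segment at list index 10 timed out.
failed_segments = [str(10)]

# Substring matching: every URL contains "10" somewhere in its path.
skipped = [url for url in segments if is_skipped(url, failed_segments)]

# Exact matching on the list index would skip only the failed segment.
failed_indexes = {10}
skipped_exact = [i for i in range(len(segments)) if i in failed_indexes]

print(f"substring match skips {len(skipped)} of {len(segments)} segments")
print(f"exact index match skips {len(skipped_exact)} of {len(segments)} segments")
```

If this reading is right, comparing indexes exactly (for example with a set of integers) would keep one failure from invalidating unrelated segments.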

Since timeouts don't occur on normal connections, retrying will not increase CPU time there. When errors do occur, films are currently downloaded corrupted and are unusable. IMHO it's better to always retry all segments; the user can stop the download by stopping the script if it's taking too long.

For example, if you disconnect from the network for a few seconds, films are downloaded incomplete.

With a different and more stable connection there are no problems.

Maybe you can add a parameter to the JSON, like connectionErrorRetries = 20, that defines how many times a failed download should be retried. If you set it to 0 you don't retry at all, although IMHO it's better to stop the script and print the error than to continue and corrupt the files.
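
A minimal sketch of that proposal, under stated assumptions: `connectionErrorRetries` is the key suggested above and does not exist in the project's config as far as this thread shows; `fetch_with_retries`, `SegmentDownloadError` and `make_flaky` are hypothetical helpers. The fetch function is passed in, so the sketch runs without network access.

```python
import json
import time


class SegmentDownloadError(RuntimeError):
    """Raised when a segment still fails after all retries (hypothetical)."""


def fetch_with_retries(fetch, ts_url, retries, delay=0.0):
    """Call fetch(ts_url) up to retries + 1 times, then fail loudly."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return fetch(ts_url)
        except Exception as exc:  # a real downloader would catch narrower errors
            last_error = exc
            if attempt < retries:
                time.sleep(delay)
    raise SegmentDownloadError(
        f"giving up on {ts_url} after {retries + 1} attempts"
    ) from last_error


def make_flaky(fail_times):
    """Test double: fails `fail_times` times, then returns some bytes."""
    state = {"calls": 0}

    def fetch(ts_url):
        state["calls"] += 1
        if state["calls"] <= fail_times:
            raise ConnectionError("simulated timeout")
        return b"ts-bytes"

    return fetch


# Assumed config shape: "max_worker" appears in this thread,
# "connectionErrorRetries" is the proposed (not existing) key.
config = json.loads('{"max_worker": 20, "connectionErrorRetries": 20}')
retries = config.get("connectionErrorRetries", 0)

if __name__ == "__main__":
    data = fetch_with_retries(make_flaky(2), "segment-0001.ts", retries)
    print(f"downloaded {len(data)} bytes after retries")
```

Raising an error once the retries are exhausted, instead of returning None, matches the suggestion to stop the script rather than join an incomplete set of segments.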

Thanks for the work you do

Lovi-0 commented 7 months ago

Yes, this is the right solution. That mechanism should only kick in when a video ts file fails to download: it tells the downloader to skip the same segment for audio, if present, so that in the end video and audio stay synchronized. But I don't know why it is always triggered.

Lovi-0 commented 7 months ago

If you remove that line (failed_segments.append(str(url_number))), does it work?

brazoayeye commented 7 months ago

Nope, the file is smaller than the one I get with the retry. Also, the video skips over the parts that were not downloaded.

Lovi-0 commented 7 months ago

Try a different connection.