dxing97 / subs2cia

Condensed Immersive Audiovisual media generator from subtitles for language learning
MIT License

Failed to condense Episode 5 of Mother from Viki #12

Closed kytrinyx closed 3 years ago

kytrinyx commented 3 years ago

I downloaded all 16 episodes of Mother from Viki using youtube-dl, along with the English subtitle files (since they are the most complete).

I was able to condense fifteen of the episodes, but episode 5 threw an error. I've included the video and vtt file in the zipped directory here: https://www.dropbox.com/s/lbboh0w94hb22hf/subs2cia-bug-report.zip?dl=0

I don't use the batch functionality; I pass the video and vtt file paths explicitly each time. For episode 5 I ran the following command:

subs2cia condense -t 1500 -p 100 -i mother/mother.ep-05.1125084v.mp4 mother/subtitles/mother.ep-05.1125084v.en.vtt

The output I got was:

Traceback (most recent call last):
  File "/Users/kytrinyx/.pyenv/versions/3.9.0/bin/subs2cia", line 8, in <module>
    sys.exit(main())
  File "/Users/kytrinyx/.pyenv/versions/3.9.0/lib/python3.9/site-packages/subs2cia/cli.py", line 4, in main
    subs2cia.main.start()
  File "/Users/kytrinyx/.pyenv/versions/3.9.0/lib/python3.9/site-packages/subs2cia/main.py", line 150, in start
    commands[args['command']](args, groups)
  File "/Users/kytrinyx/.pyenv/versions/3.9.0/lib/python3.9/site-packages/subs2cia/main.py", line 60, in condense_start
    c.choose_streams()
  File "/Users/kytrinyx/.pyenv/versions/3.9.0/lib/python3.9/site-packages/subs2cia/Common.py", line 272, in choose_streams
    self.choose_subtitle(interactive=self.interactive)
  File "/Users/kytrinyx/.pyenv/versions/3.9.0/lib/python3.9/site-packages/subs2cia/condense.py", line 107, in choose_subtitle
    subdata.load(include_all=self.use_all_subs, regex=self.subtitle_regex_filter)
  File "/Users/kytrinyx/.pyenv/versions/3.9.0/lib/python3.9/site-packages/subs2cia/subtools.py", line 187, in load
    self.ssadata = ps2.load(str(self.subpath))
  File "/Users/kytrinyx/.pyenv/versions/3.9.0/lib/python3.9/site-packages/pysubs2/ssafile.py", line 102, in load
    return cls.from_file(fp, format_, fps=fps, **kwargs)
  File "/Users/kytrinyx/.pyenv/versions/3.9.0/lib/python3.9/site-packages/pysubs2/ssafile.py", line 155, in from_file
    format_ = autodetect_format(fragment)
  File "/Users/kytrinyx/.pyenv/versions/3.9.0/lib/python3.9/site-packages/pysubs2/formats.py", line 80, in autodetect_format
    raise FormatAutodetectionError("Multiple suitable formats (%r)" % formats)
pysubs2.exceptions.FormatAutodetectionError: Multiple suitable formats ({'vtt', 'tmp'})

I ran the command again with the verbose flag enabled, and have included the output below.

Verbose output

```
$ subs2cia condense -v -t 1500 -p 100 -i mother/mother.ep-05.1125084v.mp4 mother/subtitles/mother.ep-05.1125084v.en.vtt
INFO:root:subs2cia version v0.3.2
INFO:root:Have 1 group(s) to process.
INFO:root:Mapping input file(s) [AVSFile(filepath=PosixPath('mother/mother.ep-05.1125084v.mp4')), AVSFile(filepath=PosixPath('mother/subtitles/mother.ep-05.1125084v.en.vtt'))] to one output file
INFO:root:Found 1 video input streams
INFO:root:Found 1 audio input streams
INFO:root:Found 1 subtitle input streams
ffmpeg version 4.3.2 Copyright (c) 2000-2021 the FFmpeg developers
  built with Apple clang version 12.0.0 (clang-1200.0.32.29)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/4.3.2_1 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox
  libavutil 56. 51.100 / 56. 51.100
  libavcodec 58. 91.100 / 58. 91.100
  libavformat 58. 45.100 / 58. 45.100
  libavdevice 58. 10.100 / 58. 10.100
  libavfilter 7. 85.100 / 7. 85.100
  libavresample 4. 0. 0 / 4. 0. 0
  libswscale 5. 7.100 / 5. 7.100
  libswresample 3. 7.100 / 3. 7.100
  libpostproc 55. 7.100 / 55. 7.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'mother/mother.ep-05.1125084v.mp4':
  Metadata:
    major_brand : isom
    minor_version : 512
    compatible_brands: isomiso2avc1mp41
    encoder : Lavf58.45.100
  Duration: 01:01:42.51, start: 0.000000, bitrate: 509 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 852x480 [SAR 640:639 DAR 16:9], 312 kb/s, 24 fps, 24 tbr, 12288 tbn, 48 tbc (default)
    Metadata:
      handler_name : VideoHandler
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 192 kb/s (default)
    Metadata:
      handler_name : SoundHandler
Stream mapping:
  Stream #0:1 -> #0:0 (aac (native) -> flac (native))
Press [q] to stop, [?] for help
[flac @ 0x7ffdce00ce00] encoding as 24 bits-per-sample
Output #0, flac, to 'mother/mother.ep-05.1125084v.mp4.stream1.audio.und.flac':
  Metadata:
    major_brand : isom
    minor_version : 512
    compatible_brands: isomiso2avc1mp41
    encoder : Lavf58.45.100
    Stream #0:0(und): Audio: flac, 48000 Hz, stereo, s32 (24 bit), 128 kb/s (default)
    Metadata:
      handler_name : SoundHandler
      encoder : Lavc58.91.100 flac
size= 510868kB time=01:01:42.50 bitrate=1130.3kbits/s speed= 325x
video:0kB audio:510860kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.001599%
Traceback (most recent call last):
  File "/Users/kytrinyx/.pyenv/versions/3.9.0/bin/subs2cia", line 8, in <module>
    sys.exit(main())
  File "/Users/kytrinyx/.pyenv/versions/3.9.0/lib/python3.9/site-packages/subs2cia/cli.py", line 4, in main
    subs2cia.main.start()
  File "/Users/kytrinyx/.pyenv/versions/3.9.0/lib/python3.9/site-packages/subs2cia/main.py", line 150, in start
    commands[args['command']](args, groups)
  File "/Users/kytrinyx/.pyenv/versions/3.9.0/lib/python3.9/site-packages/subs2cia/main.py", line 60, in condense_start
    c.choose_streams()
  File "/Users/kytrinyx/.pyenv/versions/3.9.0/lib/python3.9/site-packages/subs2cia/Common.py", line 272, in choose_streams
    self.choose_subtitle(interactive=self.interactive)
  File "/Users/kytrinyx/.pyenv/versions/3.9.0/lib/python3.9/site-packages/subs2cia/condense.py", line 107, in choose_subtitle
    subdata.load(include_all=self.use_all_subs, regex=self.subtitle_regex_filter)
  File "/Users/kytrinyx/.pyenv/versions/3.9.0/lib/python3.9/site-packages/subs2cia/subtools.py", line 187, in load
    self.ssadata = ps2.load(str(self.subpath))
  File "/Users/kytrinyx/.pyenv/versions/3.9.0/lib/python3.9/site-packages/pysubs2/ssafile.py", line 102, in load
    return cls.from_file(fp, format_, fps=fps, **kwargs)
  File "/Users/kytrinyx/.pyenv/versions/3.9.0/lib/python3.9/site-packages/pysubs2/ssafile.py", line 155, in from_file
    format_ = autodetect_format(fragment)
  File "/Users/kytrinyx/.pyenv/versions/3.9.0/lib/python3.9/site-packages/pysubs2/formats.py", line 80, in autodetect_format
    raise FormatAutodetectionError("Multiple suitable formats (%r)" % formats)
pysubs2.exceptions.FormatAutodetectionError: Multiple suitable formats ({'tmp', 'vtt'})
```

I'm not sure where it is getting the 'tmp' format from. As far as I can tell there are no files with a tmp extension in the directory where I have the subtitle files.

If you have any idea what else I could/should try, or if there's anything else I can do to make debugging easier, please let me know.

kytrinyx commented 3 years ago

I tried doing a little bit of preliminary debugging.

The library that is doing format detection is throwing the Multiple suitable formats error. The library has defined .txt as the extension for the tmp format. I don't have any .txt files in the directory where I have the subtitle files.
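One way to investigate further would be to scan the .vtt file for lines that happen to look like TMP subtitles. The sketch below assumes a TMP line has the shape `HH:MM:SS:text`; the pattern and the helper name `find_tmp_like_lines` are illustrative guesses and do not reproduce pysubs2's own detection logic.

```python
import re

# Assumed shape of a TMP subtitle line: HH:MM:SS:text
# (an illustrative pattern, not the one pysubs2 uses internally)
TMP_LIKE = re.compile(r"^\d{1,2}:\d{2}:\d{2}:.+")


def find_tmp_like_lines(text):
    """Return (line_number, line) pairs for lines that resemble TMP subtitles."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if TMP_LIKE.match(line)
    ]
```

Running this on the contents of the episode 5 .vtt file might show whether any cue text could be mistaken for the other format. A normal WebVTT timing line such as `00:00:01.000 --> 00:00:03.000` does not match, because a period rather than a colon follows the seconds.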

I ran a --list-streams for the same command to see if it was picking up some extra file somewhere, but it doesn't look like it:

Listing streams found in /Users/kytrinyx/subs2cia/mother/mother.ep-05.1125084v.mp4 (video), /Users/kytrinyx/subs2cia/mother/subtitles/mother.ep-05.1125084v.en.vtt (subtitle)
Available subtitle streams:
Stream   0: codec: webvtt, [/Users/kytrinyx/subs2cia/mother/subtitles/mother.ep-05.1125084v.en.vtt]

Available audio streams:
Stream   0: codec: aac, lang_code: und, [/Users/kytrinyx/subs2cia/mother/mother.ep-05.1125084v.mp4]

Available video streams:
Stream   0: codec: h264, lang_code: und, [/Users/kytrinyx/subs2cia/mother/mother.ep-05.1125084v.mp4]

Available chapters:

With debug logging enabled, I get the following right before the traceback:

DEBUG:root:ffmpeg probe results: {'streams': [{'index': 0, 'codec_name': 'webvtt', 'codec_long_name': 'WebVTT subtitle', 'codec_type': 'subtitle', 'codec_time_base': '0/1', 'codec_tag_string': '[0][0][0][0]', 'codec_tag': '0x0000', 'r_frame_rate': '0/0', 'avg_frame_rate': '0/0', 'time_base': '1/1000', 'disposition': {'default': 0, 'dub': 0, 'original': 0, 'comment': 0, 'lyrics': 0, 'karaoke': 0, 'forced': 0, 'hearing_impaired': 0, 'visual_impaired': 0, 'clean_effects': 0, 'attached_pic': 0, 'timed_thumbnails': 0}}], 'chapters': [], 'format': {'filename': '/Users/kytrinyx/subs2cia/mother/subtitles/mother.ep-05.1125084v.en.vtt', 'nb_streams': 1, 'nb_programs': 0, 'format_name': 'webvtt', 'format_long_name': 'WebVTT subtitle', 'size': '41522', 'probe_score': 100}}
DEBUG:root:Loading subtitles at /Users/kytrinyx/subs2cia/mother/subtitles/mother.ep-05.1125084v.en.vtt
Traceback (most recent call last):
...

dxing97 commented 3 years ago

Looks like pysubs2 didn't have explicit support for VTT subtitles until recently (see https://github.com/tkarabela/pysubs2/issues/30) so I'm updating subs2cia to require v1.1.0 of pysubs2 or later.

I'm able to reproduce the exception with pysubs2 v1.1.0. It looks like pysubs2's format autodetection gets confused by this particular file and can't decide which format to use. I've added some logic that retries with the format forced to the subtitle file's extension, and that seems to work for your file.
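A minimal sketch of that kind of retry, for illustration only; the actual change in subs2cia may look different. The helper name `load_with_extension_fallback` is hypothetical, and the loader and exception type are passed in so the logic can be exercised without pysubs2 installed. It assumes the loader accepts a `format_` keyword, as the `pysubs2.load` frames in the traceback above suggest.

```python
from pathlib import Path


def load_with_extension_fallback(subpath, load, autodetect_error):
    """Try format autodetection first; if it fails, retry with the
    file extension (without the dot) forced as the format."""
    try:
        return load(str(subpath))
    except autodetect_error:
        ext_format = Path(subpath).suffix.lstrip(".").lower()
        if not ext_format:
            # No extension to fall back on, so re-raise the original error.
            raise
        return load(str(subpath), format_=ext_format)


# Assumed usage with pysubs2 installed:
#   import pysubs2
#   subs = load_with_extension_fallback(
#       "mother/subtitles/mother.ep-05.1125084v.en.vtt",
#       pysubs2.load,
#       pysubs2.exceptions.FormatAutodetectionError,
#   )
```

For the file in this report the extension is `.vtt`, so the retry would ask for the `vtt` format explicitly and skip autodetection.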

kytrinyx commented 3 years ago

Sweet, thank you! I'll upgrade and try again.