CCExtractor / ccextractor

CCExtractor - Official version maintained by the core team
https://www.ccextractor.org
GNU General Public License v2.0

[BUG] Two output files produced when using sects option and ttxt output format #1448

Open dcwar opened 2 years ago

dcwar commented 2 years ago

CCExtractor version: 0.94

Additional information

When making timed text transcripts with the -sects option from ATSC 1.0 broadcasts, CCExtractor produces not only an OUTPUT.txt file but also an OUTPUT.pX.svcYY file, where X and YY are numbers derived from the input file. That second transcript contains the same source information, but expressed in a different output format / presentation.

In this respect, CCExtractor's behavior has changed between 0.88 and 0.94: it now produces a second file that was not requested, which seems neither expected nor correct.

PunitLodha commented 2 years ago

The file with the .pX.svcYY extension is for CEA 708 subtitles, and the other one is for CEA 608 subtitles. Earlier, 708 subs were extracted only if the -svc flag was passed. In 0.94 this behavior was changed, and both 708 and 608 subs are now extracted by default, as mentioned in the changelog here: https://github.com/CCExtractor/ccextractor/blob/master/docs/CHANGES.TXT#L14. I can understand the confusion, though; the changelog doesn't make it clear that two files will be produced by default.
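For anyone post-processing the results in a script, the naming convention described above can be used to tell the two kinds of output apart. The following is only a minimal sketch, not part of CCExtractor: the helper name `split_outputs` is hypothetical, and the regular expression assumes that 708 service files are named exactly as described in this thread (OUTPUT.pX.svcYY with numeric X and YY, optionally followed by one more extension).

```python
import re
from pathlib import Path

# Assumed naming pattern, taken from the description in this thread:
# CEA-708 service files look like OUTPUT.pX.svcYY (X and YY numeric),
# optionally followed by one further extension.
SVC_PATTERN = re.compile(r"\.p(\d+)\.svc(\d+)(\.[^.]+)?$")


def split_outputs(names):
    """Partition output file names into (cea608, cea708) lists.

    Hypothetical helper: any name matching SVC_PATTERN is treated as a
    708 service file; everything else is treated as the 608 output.
    """
    cea608, cea708 = [], []
    for name in names:
        if SVC_PATTERN.search(Path(name).name):
            cea708.append(name)
        else:
            cea608.append(name)
    return cea608, cea708


if __name__ == "__main__":
    # Example file names are illustrative, not real CCExtractor output.
    files = ["OUTPUT.txt", "OUTPUT.p1.svc01"]
    legacy, services = split_outputs(files)
    print("608 output:", legacy)
    print("708 output:", services)
```

A script that only expects the single OUTPUT.txt transcript could use a filter like this to ignore (or clean up) the extra service files until the documentation spells out the new default.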

PunitLodha commented 2 years ago

This is how it works now,

I'll add this information, maybe to the changelog or to the output of the ccextractor --help command.

dcwar commented 2 years ago

Thank you for the clarifications about the changes in ccextractor's operation. I greatly appreciate it.

Since extracting both 608 and 708 by default is a fundamental change to how ccextractor operates, and is completely unexpected behavior when compared to the operation of previous versions, it would seem wise to add this information to the usage information (both in the --help output, and elsewhere). The single line in the changelog doesn't really address the full extent and magnitude of this set of changes.