CCExtractor / ccextractor

CCExtractor - Official version maintained by the core team
https://www.ccextractor.org
GNU General Public License v2.0

[QUESTION] - Scenarist Closed Caption (SCC) Input support #1293

Open bbgdzxng1 opened 3 years ago

bbgdzxng1 commented 3 years ago

CCExtractor version: 0.88

Additional information

What is the recommended method to input SCC files into ccextractor on a non-Windows platform?

I can see that ccextractor can accept McPoodle's raw format...

Input formats:
       With the exception of McPoodle's raw format, which is just the closed
       caption data with no other info, CCExtractor can usually detect the
       input format correctly.

Therefore, '-in=raw' would be the appropriate input option.

The challenge is then how to convert SCC files into McPoodle raw format in a non-Windows environment. McPoodle's tools include SCC2RAW.exe, but the project only ships compiled binaries for Windows.

The ccextractor FAQ suggests that the ccextractor team ported McPoodle's code... "Lots of code came originally from McPoodle's tools (even though it was ported from Perl to C)", and ccextractor is thankfully cross-platform. Did the result of this porting process expose an SCC2RAW method in a way that could be used to allow scc as an input into ccextractor?

(or does anyone know of a cross-platform port of McPoodle suite which I may have missed?)

Thanks for a great tool.

bbgdzxng1 commented 3 years ago

Resolved by running the SCC_TOOLS Perl script (scc2raw.pl) rather than the Windows binary.

cfsmp3 commented 3 years ago

Let's leave this open in case there's a problem with it :-)

bbgdzxng1 commented 3 years ago

Worked perfectly for SCC > Raw > SRT using the SCC_TOOLS Perl scripts on macOS:

$ curl --silent --location --request GET "https://archive.org/download/cc_sample/cc_sample.scc" -o cc_sample.scc
$ dos2unix --newfile cc_sample.scc cc_sample_dos2unix.scc
$ perl scc2raw.pl cc_sample_dos2unix.scc cc_sample_dos2unix.bin
$ ccextractor -debug -in=raw -out=srt -utf8 cc_sample_dos2unix.bin -o cc_sample_dos2unix.srt
CCExtractor 0.88, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: Files (1): cc_sample_dos2unix.bin
[Extract: 1] [Stream mode: McPoodle's raw]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[Timing mode: Auto] [Debug: Yes] [Buffer input: No]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]

-----------------------------------------------------------------
Opening file: cc_sample_dos2unix.bin
Analyzing data in McPoodle raw mode

Total frames time:    00:00:00:000  (0 frames at 29.97fps)

Min PTS:                00:00:00:001
Max PTS:                00:01:09:070
Length:              00:01:09:069
Done, processing time = 0 seconds
Issues? Open a ticket here
https://github.com/CCExtractor/ccextractor/issues
$ head -n20 cc_sample_dos2unix.srt 
1
00:00:09,243 --> 00:00:12,378
                  Is that what  
                  the Americans 
                  call doodling?

2
00:00:12,413 --> 00:00:13,278
It is more serious              

3
00:00:13,314 --> 00:00:16,081
than you could                  
possibly realize                
Charlotte                       

4
00:00:20,221 --> 00:00:21,020
Good

This one should be OK to close, as the above flow worked for SCC > RAW > SRT.

There was an issue with SCC > RAW > VTT, which I have opened as a separate ticket because it is specific to VTT: https://github.com/CCExtractor/ccextractor/issues/1294

Thanks!
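
For anyone repeating the SCC > RAW > SRT flow from this thread, the three commands could be wrapped in a small shell function along these lines. This is only a sketch: the names scc_to_srt, run, DRY_RUN and SCC2RAW are made up for illustration, scc2raw.pl is assumed to sit in the current directory unless SCC2RAW points elsewhere, and dos2unix, perl and ccextractor are assumed to be on PATH. The -debug flag from the run above is left out.

```shell
#!/bin/sh
# Sketch of the SCC -> McPoodle raw -> SRT flow reported in this thread.
# Hypothetical helper names: scc_to_srt, run, DRY_RUN, SCC2RAW.

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Convert one .scc file; intermediate files follow the naming used in the thread.
scc_to_srt() {
    in=$1
    base=${in%.scc}
    unix_scc="${base}_dos2unix.scc"
    raw="${base}_dos2unix.bin"
    srt="${base}_dos2unix.srt"

    # 1. Normalize line endings into a new file.
    run dos2unix --newfile "$in" "$unix_scc" &&
    # 2. SCC -> McPoodle raw with the SCC_TOOLS Perl script (path is an assumption).
    run perl "${SCC2RAW:-scc2raw.pl}" "$unix_scc" "$raw" &&
    # 3. Raw -> SRT with ccextractor, same options as above minus -debug.
    run ccextractor -in=raw -out=srt -utf8 "$raw" -o "$srt"
}
```

After sourcing the file, setting DRY_RUN=1 and calling scc_to_srt cc_sample.scc prints the three commands without running them, which is a quick way to check the file names before converting anything.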