Freeview UK - no subtitles detected

akiller commented 7 years ago

Hi all,

I'm trying to extract subtitles from Freeview recordings, but no matter which options I choose, it never seems to detect anything. I've tried different recordings and channels, with both your latest release and a build compiled from the latest code myself, all to no avail.

Here's a sample file (~12MB) recorded on a Hauppauge Nova-T https://www.dropbox.com/s/8ir9ofo03zir90h/8ooTC.zip?dl=1

And here's the log from processing it with:

C:\Temp\ccextractor\ccextractor-master\windows\Release-Full\ccextractorwinfull.exe --gui_mode_reports -haup -autoprogram -out=srt -bom -latin1 [+input files]

CCExtractor 0.85, Carlos Fernandez Sanz, Volker Quetschke. Teletext portions taken from Petr Kutalek's telxcc

Input: \serv\videos\TV\8 Out of 10 Cats Does Countdown\8ooTC.ts [Extract: 1] [Stream mode: Autodetect] [Program : Auto ] [Hauppage mode: Yes] [Use MythTV code: Auto] [Timing mode: Auto] [Debug: No] [Buffer input: Yes] [Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No] [Target format: .srt] [Encoding: Latin-1] [Delay: 0] [Trim lines: No] [Add font color data: Yes] [Add font typesetting: Yes] [Convert case: No] [Video-edit join: No] [Extraction start time: not set (from start)] [Extraction end time: not set (to end)] [Live stream: No] [Clock frequency: 90000] [Teletext page: Autodetect] [Start credits text: None]

Opening file: \serv\videos\TV\8 Out of 10 Cats Does Countdown\8ooTC.ts

File seems to be a transport stream, enabling TS mode

Analyzing data in general mode
Creating \serv\videos\TV\8 Out of 10 Cats Does Countdown\8ooTC.srt

Number of NAL_type_7: 0
Number of VCL_HRD: 0
Number of NAL HRD: 0
Number of jump-in-frames: 0
Number of num_unexpected_sei_length: 0

Min PTS: 00:00:00:246
Max PTS: 00:00:44:219
Length: 00:00:43:973

Done, processing time = 0 seconds

No captions were found in input. Issues? Open a ticket here https://github.com/CCExtractor/ccextractor/issues

Any help would be appreciated - cheers :).
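The PTS summary is a quick sanity check on a log like this: Length is Max PTS minus Min PTS, and about 44 seconds of stream were read, which suggests the file itself was parsed and only caption detection came up empty. A minimal Python sketch that pulls those values out of the log text, assuming the HH:MM:SS:mmm format printed above; `to_ms` and `pts_summary` are hypothetical helpers, not part of CCExtractor:

```python
import re


def to_ms(ts: str) -> int:
    """Convert an 'HH:MM:SS:mmm' timestamp, as printed in these logs, to milliseconds."""
    hours, minutes, seconds, millis = (int(part) for part in ts.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def pts_summary(log: str) -> dict:
    """Extract Min PTS / Max PTS / Length from CCExtractor log text, in milliseconds.

    Works whether the three values sit on one line or on separate lines.
    """
    pattern = r"(Min PTS|Max PTS|Length): (\d+:\d{2}:\d{2}:\d{3})"
    return {label: to_ms(ts) for label, ts in re.findall(pattern, log)}
```

For the log above this yields 246, 44219 and 43973 ms, and 44219 - 246 = 43973.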

Izaron commented 7 years ago

Without -haup I get (well, on Linux) this output (link). Can you uncheck the "Hauppage" switch and try once more?
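To compare runs with and without -haup outside the GUI, the logged command can be rebuilt as an argument list. A minimal Python sketch; `build_ccextractor_args` is a hypothetical helper, and the flags are exactly the ones from the reported command, with the input file appended at the end as the GUI's "[+input files]" suggests:

```python
def build_ccextractor_args(exe: str, input_file: str, haup: bool = True) -> list:
    """Rebuild the command from the original report as an argv list.

    The flags are copied from the logged GUI invocation; `haup` toggles -haup
    so both variants can be tried from the command line.
    """
    args = [exe, "--gui_mode_reports"]
    if haup:
        args.append("-haup")
    args += ["-autoprogram", "-out=srt", "-bom", "-latin1"]
    # The GUI appends input files after the options ("[+input files]").
    args.append(input_file)
    return args
```

For example, `subprocess.run(build_ccextractor_args("ccextractorwinfull.exe", r"C:\Temp\8ooTC.ts", haup=False), capture_output=True, text=True)` runs the variant without -haup and keeps its console output for inspection (`capture_output` needs Python 3.7 or later).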

akiller commented 7 years ago

Interesting that it works for you; I wonder if it's a Windows issue. Without -haup I get the same result, with a 0 KB .srt:

CCExtractor 0.85, Carlos Fernandez Sanz, Volker Quetschke. Teletext portions taken from Petr Kutalek's telxcc

Input: C:\Temp\8ooTC.ts [Extract: 1] [Stream mode: Transport] [Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto] [Timing mode: Auto] [Debug: No] [Buffer input: Yes] [Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No] [Target format: .srt] [Encoding: Latin-1] [Delay: 0] [Trim lines: No] [Add font color data: Yes] [Add font typesetting: Yes] [Convert case: No] [Video-edit join: No] [Extraction start time: not set (from start)] [Extraction end time: not set (to end)] [Live stream: No] [Clock frequency: 90000] [Teletext page: Autodetect] [Start credits text: None]

Opening file: C:\Temp\8ooTC.ts

Analyzing data in general mode
Creating C:\Temp\8ooTC.srt

Number of NAL_type_7: 0
Number of VCL_HRD: 0
Number of NAL HRD: 0
Number of jump-in-frames: 0
Number of num_unexpected_sei_length: 0

Min PTS: 00:00:00:246
Max PTS: 00:00:44:219
Length: 00:00:43:973

Done, processing time = 1 seconds

No captions were found in input. Issues? Open a ticket here https://github.com/CCExtractor/ccextractor/issues

cfsmp3 commented 7 years ago

Are you using the full version (which includes the OCR), not the compact one?

On Sun, Feb 19, 2017 at 11:45 AM, Andrew Killer notifications@github.com wrote:


akiller commented 7 years ago

Hi Carlos,

I was using the full version, through the GUI. When I ran ccextractorwinfull.exe manually, I noticed some output that didn't appear in the GUI:

Opening file: 8ooTC.ts
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
Error opening data file \temp\cce\tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Creating 8ooTC.srt
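Since the GUI didn't show these Tesseract messages, one option is to run the CLI directly and scan its console output. A minimal Python sketch; the marker strings are copied from this one log and may differ between CCExtractor or Tesseract versions, and `diagnose` is a hypothetical helper:

```python
# Marker strings copied from the ccextractorwinfull.exe output in this thread;
# other CCExtractor/Tesseract versions may word them differently.
TESSERACT_FAILURE_MARKERS = (
    "Error opening data file",
    "Failed loading language",
    "Tesseract couldn't load any languages!",
)
NO_CAPTIONS_MARKER = "No captions were found in input."


def diagnose(output: str) -> list:
    """Return short labels for the known problems found in CCExtractor's output."""
    problems = []
    # Any Tesseract language-loading message means the OCR step can't run.
    if any(marker in output for marker in TESSERACT_FAILURE_MARKERS):
        problems.append("tesseract-language-data-missing")
    if NO_CAPTIONS_MARKER in output:
        problems.append("no-captions")
    return problems
```

Because the log doesn't show whether Tesseract writes to stdout or stderr, it is safest to pass both, e.g. `diagnose(result.stdout + result.stderr)` on a `subprocess.run(..., capture_output=True, text=True)` result.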

Downloading eng.traineddata from https://github.com/tesseract-ocr/tessdata and putting it in tessdata/eng.traineddata seems to have solved the problem for both the command line and the GUI.
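The fix amounts to putting eng.traineddata where Tesseract looks for it. A minimal Python sketch of that lookup, assuming the rule stated in the error message above (TESSDATA_PREFIX names the parent of the tessdata directory; newer Tesseract releases may instead expect it to point at tessdata itself). `expected_traineddata` and `missing_language_data` are hypothetical helpers, and the `default_parent` fallback for an unset variable is an assumption:

```python
import os
from pathlib import Path


def expected_traineddata(lang: str = "eng", default_parent: str = ".") -> Path:
    """Path where <lang>.traineddata is expected, per the quoted error message.

    Assumption: TESSDATA_PREFIX is the *parent* of the tessdata directory, and
    default_parent (hypothetical) is used when the variable is not set.
    """
    parent = os.environ.get("TESSDATA_PREFIX", default_parent)
    return Path(parent) / "tessdata" / f"{lang}.traineddata"


def missing_language_data(lang: str = "eng", default_parent: str = "."):
    """Return the expected path if the file is absent, otherwise None."""
    path = expected_traineddata(lang, default_parent)
    return None if path.is_file() else path
```

If `missing_language_data()` returns a path, downloading eng.traineddata from https://github.com/tesseract-ocr/tessdata to that location corresponds to the fix described above.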

It may be worth updating the GUI to make this error visible.

Thanks for your help.


Freeview UK - no subtitles detected #691
