CCExtractor / ccextractor

CCExtractor - Official version maintained by the core team
https://www.ccextractor.org
GNU General Public License v2.0

[BUG] Failing to extract DVB subtitles from live stream (Failed to perform OCR) #1010

Open jakubvojacek opened 5 years ago

jakubvojacek commented 5 years ago

CCExtractor version: 0.87

Necessary information

Video links: tnt.ts - https://goo.gl/r4WXto

Additional information: Interestingly, running ccextractor directly on the file (ccextractor tnt.ts) does produce a tnt.srt file with correct subtitles in it. However, it also prints a large number of errors.

When tnt.ts is played out in a loop instead (for example, tsplay tnt.ts 239.1.2.3:1234 -loop), ccextractor eventually fails. The time before it fails varies, usually from a few seconds to a minute.

root@jones:~/tnt# ccextractor   -udp 239.1.2.3:1234
CCExtractor 0.87, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: Network, 239.1.2.3:1234
[Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[Timing mode: Auto] [Debug: No] [Buffer input: Yes]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]

----------------------------------------------------------------------
Reading from UDP socket 239.1.2.3:1234
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in pixConvertRGBToGray: pixs not defined
Error in pixGetDimensions: pix not defined
Error in pixGetColormap: pix not defined
Error in pixClone: pixs not defined
Error in pixGetDepth: pix not defined
Error in pixGetWpl: pix not defined
Error in pixGetYRes: pix not defined

TessBaseAPIRecognize returned -1, skipping this bitmap.
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in pixConvertRGBToGray: pixs not defined
Error in pixGetDimensions: pix not defined
Error in pixGetColormap: pix not defined
Error in pixClone: pixs not defined
Error in pixGetDepth: pix not defined
Error in pixGetWpl: pix not defined
Error in pixGetYRes: pix not defined

TessBaseAPIRecognize returned -1, skipping this bitmap.
TS continuity counter not incremented prev/curr 11/6
dvbsub_decode: incomplete, broken or empty packet, remaining bytes=3249, segment_length=3490
Return from dvbsub_decode: -1
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in pixConvertRGBToGray: pixs not defined
Error: In ocr_bitmap: Failed to perform OCR - Failed to get text. Please report.

Issues? Open a ticket here
https://github.com/CCExtractor/ccextractor/issues

Can you please look into what is wrong?

Thank you Jakub
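
The log above contains the line `TS continuity counter not incremented prev/curr 11/6`, which suggests that transport stream packets were lost or repeated out of sequence just before `dvbsub_decode` reported a broken packet. The following is a minimal sketch, not part of CCExtractor, for checking a capture for such jumps independently. The names `find_cc_errors`, `make_packet` and `Discontinuity` are hypothetical, and the sketch assumes 188-byte packets aligned to the start of the data and ignores the adaptation field's discontinuity_indicator, so it may flag jumps that the stream explicitly signals.

```python
import sys
from collections import namedtuple

TS_PACKET_SIZE = 188   # standard MPEG-TS packet length in bytes
SYNC_BYTE = 0x47       # first byte of every TS packet
NULL_PID = 0x1FFF      # stuffing packets; their counter is not meaningful

# Hypothetical record type for one detected jump.
Discontinuity = namedtuple("Discontinuity", "packet_index pid prev curr")


def find_cc_errors(data):
    """Return continuity-counter jumps found in raw, packet-aligned TS bytes."""
    last_cc = {}   # PID -> last continuity counter seen on a payload packet
    errors = []
    for index in range(len(data) // TS_PACKET_SIZE):
        pkt = data[index * TS_PACKET_SIZE:(index + 1) * TS_PACKET_SIZE]
        if pkt[0] != SYNC_BYTE:
            continue  # misaligned packet; a real tool would resynchronise
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
        has_payload = bool((pkt[3] >> 4) & 0x1)
        cc = pkt[3] & 0x0F
        if pid == NULL_PID or not has_payload:
            continue  # the counter only advances on packets carrying payload
        prev = last_cc.get(pid)
        if prev is not None:
            expected = (prev + 1) & 0x0F
            # cc == prev is tolerated here because a packet may be sent twice.
            if cc != expected and cc != prev:
                errors.append(Discontinuity(index, pid, prev, cc))
        last_cc[pid] = cc
    return errors


def make_packet(pid, cc):
    """Build a payload-only TS packet with a zeroed payload (for testing)."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10 | (cc & 0x0F)])
    return header + bytes(TS_PACKET_SIZE - 4)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as handle:
        for err in find_cc_errors(handle.read()):
            print("packet %d pid 0x%04X prev/curr %d/%d"
                  % (err.packet_index, err.pid, err.prev, err.curr))
```

Running it against a capture of the looped multicast and against the original tnt.ts would show whether the jumps are already present in the file or only appear where the loop wraps around.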

cfsmp3 commented 4 years ago

@jakubvojacek Is this still a problem in current master?

jakubvojacek commented 4 years ago

Hello @cfsmp3

I just tested with the current master (5f61fae0c7dacb05e2f42d5647aafc59d3cd2ef6) and it's still happening. It is now reproducible on a static file too. If you download https://goo.gl/r4WXto, play it in VLC and enable the Portuguese DVB subtitles, the subtitles are visible. Running ccextractor on the same file (plain ccextractor tnt.ts) throws the same errors as described above. I have attached the console output below.

root@ts:/opt/ccextractor# git rev-parse HEAD
5f61fae0c7dacb05e2f42d5647aafc59d3cd2ef6

root@ts:/opt/ccextractor# build/ccextractor /data/tnt.ts
CCExtractor 0.88, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: /data/tnt.ts
[Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[Timing mode: Auto] [Debug: No] [Buffer input: No]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No][Filter profanity: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]

-----------------------------------------------------------------
Opening file: /data/tnt.ts
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in pixConvertRGBToGray: pixs not defined
Error in pixGetDimensions: pix not defined
Error in pixGetColormap: pix not defined
Error in pixClone: pixs not defined
Error in pixGetDepth: pix not defined
Error in pixGetWpl: pix not defined
Error in pixGetYRes: pix not defined

TessBaseAPIRecognize returned -1, skipping this bitmap.
TS continuity counter not incremented prev/curr 10/14
dvbsub_decode: incomplete, broken or empty packet, remaining bytes=2917, segment_length=3462
Return from dvbsub_decode: -1
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in pixConvertRGBToGray: pixs not defined
Error: In ocr_bitmap: Failed to perform OCR - Failed to get text. Please report.

Issues? Open a ticket here
https://github.com/CCExtractor/ccextractor/issues

jstrot commented 2 months ago

I'm having a similar issue:

$ ccextractor --output-field 1 --cc2 --out=srt --utf8 movie.vob -o subtitle.srt
CCExtractor 0.94, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: movie.vob
[Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[Timing mode: Auto] [Debug: No] [Buffer input: No]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No][Filter profanity: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]

-----------------------------------------------------------------
Opening file: movie.vob
File seems to be a program stream, enabling PS mode
Analyzing data in general mode

New video information found
[720 * 480] [AR: 02 - 4:3] [FR: 04 - 29.97] [progressive: no]

Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in pixConvertRGBToGray: pixs not defined
Error: In ocr_bitmap: Failed to perform OCR - Failed to get text. Please report.

Issues? Open a ticket here
https://github.com/CCExtractor/ccextractor/issues

I'm not familiar with the exact content of the VOB file I'm working with. It may contain no CC data at all, or it may be corrupted, but mediainfo reports a CC3 track (hence my use of --output-field 1 --cc2):

Text
ID                                       : 224 (0xE0)-CC3
Format                                   : EIA-608
Muxing mode, more info                   : Muxed in Video #1
Duration                                 : 2 min 58 s
Start time (commands)                    : 1 s 248 ms
Start time                               : 2 s 183 ms
Bit rate mode                            : Constant
Stream size                              : 0.00 Byte (0%)
Count of frames before first event       : 58
Type of the first event                  : PopOn
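
For reference, CEA-608 carries CC1 and CC2 in field 1 and CC3 and CC4 in field 2, so the CC3 service reported by mediainfo corresponds to field 2, channel 1. The sketch below only encodes that mapping. The function names are hypothetical, and how the result translates into ccextractor options is an assumption that should be checked against `ccextractor --help` for the installed version.

```python
# CEA-608 caption services as (field, channel) pairs.
CEA608_SERVICES = {
    "CC1": (1, 1),
    "CC2": (1, 2),
    "CC3": (2, 1),
    "CC4": (2, 2),
}


def service_from_mediainfo_id(id_value):
    """Extract the service name from an ID such as '224 (0xE0)-CC3'."""
    return id_value.rsplit("-", 1)[-1].strip().upper()


def field_and_channel(service):
    """Return the (field, channel) pair for a CEA-608 service name."""
    name = service.strip().upper()
    if name not in CEA608_SERVICES:
        raise ValueError("unknown CEA-608 service: %r" % service)
    return CEA608_SERVICES[name]


if __name__ == "__main__":
    service = service_from_mediainfo_id("224 (0xE0)-CC3")
    field, channel = field_and_channel(service)
    print("%s -> field %d, channel %d" % (service, field, channel))
```

If the options used above behave as their names suggest, `--output-field 1 --cc2` would select field 1, channel 2 (CC2) rather than CC3, so it may be worth retrying with field 2 selected.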