CCExtractor / ccextractor

CCExtractor - Official version maintained by the core team
https://www.ccextractor.org
GNU General Public License v2.0

[BUG] OCR works only for first DVB subtitle stream (OCR context is not shared) #1067

Open nikop opened 5 years ago

nikop commented 5 years ago

CCExtractor version (using the --version parameter preferably) : 0.87

Necessary information

Video links (replace text below with your links)

https://madjoki.com/ts/test.ts

Additional information

The following works and extracts the first subtitle stream (0xCDF). This is the expected result.

-out=srt -bom -latin1 -codec dvbsub "test.ts" -ocrlang fin
-out=srt -bom -latin1 -codec dvbsub "test.ts" -datapid 0xCDF -ocrlang fin

Cause seems to be:

https://github.com/CCExtractor/ccextractor/blob/dac9de4d67523e60ed07ee0e868195f90827acd3/src/lib_ccx/ts_tables.c#L358

Commenting out these two lines makes it work. The intent seems to be to share the OCR context between decoders, but this is not done.

Which means the check at https://github.com/CCExtractor/ccextractor/blob/dac9de4d67523e60ed07ee0e868195f90827acd3/src/lib_ccx/dvb_subtitle_decoder.c#L1664 becomes false and OCR is skipped.

Murmur commented 4 years ago

@nikop Do we have the same issue? This is my ticket: I cannot extract text from the second dvbsub stream, although PNG image extraction works fine for all tracks. https://github.com/CCExtractor/ccextractor/issues/1163

nikop commented 4 years ago

@Murmur Yes, this seems to be the same issue.

nikop commented 4 years ago

@Murmur

If you want to try and compile yourself:

ts_tables.diff.txt

cfsmp3 commented 4 years ago

@nikop is it still happening in current master? We've done a lot of work in the past weeks and I'm going over all the issues - cleaning up. Thanks.

mfarberbrodsky commented 4 years ago

Hi, I just checked it with the current master, still same result (no captions produced with -datapid 0xCE0). Same problem as #1163

mfarberbrodsky commented 4 years ago

It still works after commenting out the two lines nikop suggested:

if (!pinfo->initialized_ocr)
    pinfo->initialized_ocr = 1;

What's their purpose? Everything seems to be working without them.

cfsmp3 commented 4 years ago

@mfarberbrodsky It declares the OCR "initialized" if it wasn't. However, I don't think the problem is there; rather, some other place must be checking that variable and only doing something if the OCR is not initialized.

Once you've gotten that far I'd say it can't be too hard to fix.

mfarberbrodsky commented 4 years ago

@cfsmp3 I investigated this problem a bit more, and I think I found the root of the issue. It starts here. On line 357, ocr_ctx is initialized only once, while pinfo->initialized_ocr is still 0. It is then stored in the returned ptr. That ptr is written to ctx in update_capinfo and then stored as codec_private_data only for that specific pid (you can see that here), which is the first pid that contains caption data, since ocr_ctx is initialized only once. None of the other pids get an ocr_ctx, which is why no captions are produced for them. I believe this is why the problem occurs, but I'm not sure yet what solution I can implement.
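
The behaviour described above can be illustrated with a minimal sketch. It is not the CCExtractor code: `program_info`, `cap_info`, `init_decoder` and `count_pids_with_ocr` are hypothetical stand-ins, and a one-byte allocation takes the place of a real OCR context. It only shows how a single "initialized" flag shared across PIDs leaves every PID after the first without a context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the real CCExtractor structures. */
struct program_info {
    int initialized_ocr;       /* one flag shared by every PID of the program */
};

struct cap_info {
    int pid;
    void *codec_private_data;  /* per-PID OCR context, or NULL if none */
};

/* Mimics the described logic: a context is created only while the
 * shared flag is still 0, so only the first caller receives one. */
static void *init_decoder(struct program_info *pinfo)
{
    void *ocr_ctx = NULL;
    if (!pinfo->initialized_ocr) {
        ocr_ctx = malloc(1);   /* placeholder for a real OCR context */
        pinfo->initialized_ocr = 1;
    }
    return ocr_ctx;
}

/* Sets up n PIDs in order and returns how many received an OCR context.
 * The caller is responsible for freeing any non-NULL codec_private_data. */
static int count_pids_with_ocr(struct cap_info *pids, int n)
{
    struct program_info pinfo = {0};
    int with_ocr = 0;

    assert(pids != NULL);
    for (int i = 0; i < n; i++) {
        pids[i].codec_private_data = init_decoder(&pinfo);
        if (pids[i].codec_private_data != NULL)
            with_ocr++;
    }
    return with_ocr;
}
```

With two PIDs such as 0xCDF and 0xCE0 passed in that order, only the first ends up with a non-NULL `codec_private_data` in this sketch, which matches the symptom reported in this thread. Dropping the flag check in `init_decoder` would give each PID its own context, which is roughly what the two-line workaround amounts to under these assumptions.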

cfsmp3 commented 4 years ago

@mfarberbrodsky That's a good investigation, good job :-)

hamelg commented 4 years ago

Hi, I have the same issue. I tried the workaround: https://github.com/CCExtractor/ccextractor/issues/1067#issuecomment-578506743 Unfortunately, it causes another issue: the subtitle timestamps are wrong, the first offset is null.

hamelg commented 4 years ago

No, the workaround works fine. My timestamp issue was related to my ts file.

rboy1 commented 3 years ago

Has there been any workaround or solution for this? I'm seeing the same issue here:

Text #1
ID                                       : 1024 (0x400)
Menu ID                                  : 1 (0x1)
Format                                   : DVB Subtitle
Codec ID                                 : 6
Duration                                 : 32 s 800 ms
Delay relative to video                  : 10 s 0 ms
Language                                 : German

Text #2
ID                                       : 1025 (0x401)
Menu ID                                  : 1 (0x1)
Format                                   : DVB Subtitle
Codec ID                                 : 6
Duration                                 : 35 s 760 ms
Delay relative to video                  : 10 s 0 ms
Language                                 : esp

Text #3
ID                                       : 1026 (0x402)
Menu ID                                  : 1 (0x1)
Format                                   : DVB Subtitle
Codec ID                                 : 6
Duration                                 : 38 s 520 ms
Delay relative to video                  : 10 s 0 ms
Language                                 : French

Text #4
ID                                       : 1027 (0x403)
Menu ID                                  : 1 (0x1)
Format                                   : DVB Subtitle
Codec ID                                 : 6
Duration                                 : 35 s 760 ms
Delay relative to video                  : 10 s 0 ms
Language                                 : Italian

ccextractorwinfull.exe -datapid 1027 DVBSubtitles.ts
CCExtractor 0.89, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: DVBSubtitles.ts
[Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[Timing mode: Auto] [Debug: No] [Buffer input: Yes]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No][Filter profanity: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]

-----------------------------------------------------------------
Opening file: DVBSubtitles.ts
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
100%  |  00:45
Number of NAL_type_7: 0
Number of VCL_HRD: 0
Number of NAL HRD: 0
Number of jump-in-frames: 0
Number of num_unexpected_sei_length: 0

Min PTS:                                00:00:00:576
Max PTS:                                00:00:46:336
Length:                          00:00:45:760
Done, processing time = 2 seconds

No captions were found in input.
Issues? Open a ticket here
https://github.com/CCExtractor/ccextractor/issues

cfsmp3 commented 3 years ago

Has there been any workaround or solution for this, I'm seeing the same issue here:

You answered yourself :-)

cfsmp3 commented 1 year ago

@nikop the file is not available, do you have it somewhere?

nikop commented 1 year ago

@cfsmp3 I reuploaded the file

cfsmp3 commented 1 year ago

@cfsmp3 I reuploaded the file

Thanks. Please leave it there until this is fixed :-)

Your original post points to code, but it uses master instead of a specific commit, so the lines you point at don't seem to match current master. Can you update that?