Comcast / caption-inspector

Caption Inspector is a reference decoder for Closed Captions (CEA-608 and CEA-708).
https://comcast.github.io/caption-inspector/
Apache License 2.0

Missing hex and decoded data in `.ccd` output for old broadcast recording #11

Open micolous opened 2 years ago

micolous commented 2 years ago

I attempted to run caption-inspector on an old US TV broadcast with CEA-608 captions, taken from the hls.js demo page: https://playertest.longtailvideo.com/adaptive/captions/playlist.m3u8

I downloaded the recording using youtube-dl, and have attached it to this issue (in a ZIP file so GitHub doesn't try to transcode it): cnn-live.mp4.zip

I was able to play back the downloaded file with captions fine in VLC:

[Screenshot: vlcsnap-2021-09-20-11h20m57s499]

I built and ran caption-inspector on Ubuntu 20.04 at commit 476326f08a43ce38ecd1ea58b8910d7015e80cac, with the Makefile patched to build with gcc 9.3 rather than clang (Issue #13).

I then tried to extract the CEA-608 tracks with:

mkdir /tmp/cnn
./caption-inspector -o /tmp/cnn cnn-live.mp4
cd /tmp/cnn
zip -9 cnn.zip cnn-live*

All of the output files are attached: cnn.zip

I got a correct-looking cnn-live-C1.608 with captions from the program:

00:00:00,755 - {RCL} {ENM} {ENM} {R1:C4} {R1:C4} {TO2} {TO2} "BUT HE HAD PURINA CAT CHOW" {R2:C16} {R2:C16} {TO3} {TO3} "INDOOR."
00:00:01,930 - {EOC}

However, cnn-live.ccd has timestamps and fully-decoded text, but appears to be missing the "hex data" and "decoded data" columns:

00:00:01,049  
TEXT: Ch1 - "BU" 

00:00:01,091  
TEXT: Ch1 - "T " 

00:00:01,133  
TEXT: Ch1 - "HE" 

00:00:01,175  
TEXT: Ch1 - " H" 

00:00:01,217  
TEXT: Ch1 - "AD" 

I was able to run caption-inspector against a different, slightly more modern US broadcast capture (720p59.94 with CEA-608 and 708 captions), and against files created with libcaption's flv+srt tool (which produces possibly-not-quite-valid CEA-608 captions). In both cases I got proper "hex data" and "decoded data":

00:00:01,936  F1:5468  PS:4322  PD:5468  PD:0000  XD:0000    Ch1: "Th"  <-Srvc:01  G0:T|G0:h  ?00?|?00?  _________    Chan-1:  "T"  "h"  <--Seq:1 P006-B02  G0Svc:01|G0Svc:01  ???-0x00|???-0x00  _________________  
              XD:0000  XD:0000  XD:0000  XD:0000  XD:0000    _________  _________  _________  _________  _________    _________________  _________________  _________________  _________________  _________________  
TEXT: Ch1 - "Th" Svc1 - "Th" 

00:00:01,952  F2:8080  XD:0000  XD:0000  XD:0000  XD:0000    F2 - NULL  _________  _________  _________  _________    608: Field 2 NULL  _________________  _________________  _________________  _________________  
              XD:0000  XD:0000  XD:0000  XD:0000  XD:0000    _________  _________  _________  _________  _________    _________________  _________________  _________________  _________________  _________________  

00:00:01,969  F1:E5F2  PS:8322  PD:6572  PD:0000  XD:0000    Ch1: "er"  <-Srvc:01  G0:e|G0:r  ?00?|?00?  _________    Chan-1:  "e"  "r"  <--Seq:2 P006-B02  G0Svc:01|G0Svc:01  ???-0x00|???-0x00  _________________  
              XD:0000  XD:0000  XD:0000  XD:0000  XD:0000    _________  _________  _________  _________  _________    _________________  _________________  _________________  _________________  _________________  
TEXT: Ch1 - "er" Svc1 - "er" 

00:28:17,000  F1:94AE  F1:9420  F1:9140  F1:C7F2  F1:E561    Ch1 {ENM}  Ch1 {RCL}  Ch1 - PAC  Ch1: "Gr"  Ch1: "ea"    Erase NonDisp Mem  ResumeCaptLoading  _Row:01 -  White_  Chan-1:  "G"  "r"  Chan-1:  "e"  "a"  
              F1:F420  F1:F7EF  F1:F26B  F1:AE80  F1:91E0    Ch1: "t "  Ch1: "wo"  Ch1: "rk"  Ch1 - "."  Ch1 - PAC    Chan-1:  "t"  " "  Chan-1:  "w"  "o"  Chan-1:  "r"  "k"  Channel - 1:  "."  _Row:02 -  White_  
              F1:5B4C  F1:6175  F1:6768  F1:F4E5  F1:F25D    Ch1: "[L"  Ch1: "au"  Ch1: "gh"  Ch1: "te"  Ch1: "r]"    Chan-1:  "["  "L"  Chan-1:  "a"  "u"  Chan-1:  "g"  "h"  Chan-1:  "t"  "e"  Chan-1:  "r"  "]"  
TEXT: Ch1 - "Great work.[Laughter]." 

micolous commented 2 years ago

I'm pretty sure that the issue is triggered by the source file having cc_count < 5. Caption Inspector only tries to print anything if there are at least 5 blocks:

https://github.com/Comcast/caption-inspector/blob/476326f08a43ce38ecd1ea58b8910d7015e80cac/src/sink/cc_data_output.c#L257-L274

This then trips an assert later on:

https://github.com/Comcast/caption-inspector/blob/476326f08a43ce38ecd1ea58b8910d7015e80cac/src/sink/cc_data_output.c#L305
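To illustrate the kind of behaviour suspected above, here is a minimal sketch of a row formatter that still prints something when fewer than 5 caption pairs are available, by padding the unused columns. This is not Caption Inspector's code: `format_cc_rows`, `append`, and `COLS_PER_ROW` are hypothetical names, the "F1:" label is hard-coded purely for illustration, and the 5-column layout is only inferred from the `.ccd` samples earlier in this issue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define COLS_PER_ROW 5 /* assumed: columns per row, as in the .ccd samples */

/* Hypothetical helper: append string s to out, tracking the used length.
 * Returns 0 on success, -1 if s (plus the terminator) would not fit. */
static int append(char *out, size_t out_len, size_t *used, const char *s)
{
    size_t n = strlen(s);
    if (n >= out_len - *used)
        return -1;
    memcpy(out + *used, s, n + 1); /* copies the '\0' as well */
    *used += n;
    return 0;
}

/* Hypothetical formatter (not the project's API): writes `count` 16-bit
 * caption byte pairs as rows of COLS_PER_ROW hex columns. A row with fewer
 * real pairs is padded with "_______", so count < 5 still yields a row
 * instead of being skipped. Returns characters written, or -1 on overflow. */
int format_cc_rows(const uint16_t *pairs, size_t count,
                   char *out, size_t out_len)
{
    assert(out != NULL && out_len > 0);
    size_t used = 0;
    out[0] = '\0';

    /* Round up so a partial final row is still emitted. */
    size_t rows = (count + COLS_PER_ROW - 1) / COLS_PER_ROW;
    for (size_t row = 0; row < rows; row++) {
        for (size_t col = 0; col < COLS_PER_ROW; col++) {
            size_t idx = row * COLS_PER_ROW + col;
            char cell[16];
            if (idx < count)
                snprintf(cell, sizeof cell, "F1:%04X", (unsigned)pairs[idx]);
            else
                snprintf(cell, sizeof cell, "_______"); /* padding column */
            if (append(out, out_len, &used, cell) != 0)
                return -1;
            if (append(out, out_len, &used,
                       col + 1 < COLS_PER_ROW ? "  " : "\n") != 0)
                return -1;
        }
    }
    return (int)used;
}
```

Called with two pairs such as 0x5468 and 0xE5F2 and a 128-byte buffer, this produces a single padded row rather than nothing. Whether padding, or simply relaxing the "at least 5" check, is the right fix for Caption Inspector depends on how the rest of cc_data_output.c consumes the rows.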