Closed cfsmp3 closed 5 years ago
Hey, the codebase is large so I am not yet able to pinpoint the error, but the error follows a pattern: in each output, a pair of characters shows up four places later than it should. E.g., in the Outrageous Acts of Science sample:
THE SWINNG H[GI]AMMOCK -> THE SWIN[GI]{NG H}AMMOCK
IS ACTUALLY VERYIMIL[ S]AR -> IS ACTUALLY VERY[ S]{IMIL}AR
and so on. The same can be observed in all the samples. Hope this helps.
I did a regression test; this error is present in 0.71, and has been there from the start.
@anshul1912 I ran the samples with v0.71 and v0.70. Both of them give garbled output.
P.S. - (being a beginner) by regression test, you mean that this feature was fine before and broke in v0.71?
@mailumangjain That's what I was trying to say: it has been broken from the start.
So we have a new format which CCExtractor does not support? Then what is the methodology to support this?
These files are supported, but there is some bug in the code which needs to be taken care of. The format is supported, but these files have something different in them which ends up jumbling things. If a file were not supported, then it would and should say it's not supported.
Any suggestions as to what it might be? I've gone nuts tracking down this bug: I can see the buffer and it contains garbled output, but I can't work out where the garbling creeps in.
I just tried running the sample with v0.69. The output is still garbled.
Can there actually be a problem in the ccx_decoders? They are the ones filling the buffers, right?
I can confirm that I have witnessed this issue with CCextractor many times with that exact same pattern. I am just a user, however, so I cannot do much more other than share my user experience with this app and confirm that I am experiencing it too. Hopefully this is helpful in some small way.
@dwhe It would be even better if you could share some samples, or provide some more details (like input format, used parameters, etc.) :)
Hey! I don't get it: as far as I can see, the input is garbled (and sure enough the output is equally garbled). I delete .srt, open a video file in vlc, select subtitle track = closed captions 1 and see the exact same mutilated captions. Is there something I don't understand everyone else here does? Or I just shouldn't trust vlc either because it has same bugs?)
VLC is not always correct either ;)
A sample can be indeed corrupt sometimes, but it can also happen that both VLC (for example) and CCExtractor have a bug in the processing of the code.
To determine if the caption data (input) is good, it might be interesting to analyze it at a deep level and compare it to what the specs define as the correct order/behaviour.
OK, thank you. Are there any useful tools? For example are there available software CC encoders? Or just the ccextractor's decoders themselves and the specs?)
I can describe the input - I download the video from my Tivo units (I have a Series3 and a Premiere unit), and convert them using either KMTTG or cTivo, both of which involve the use of ccextractor. The garbling described earlier occurs with both Tivo units and with both KMTTG and cTivo.
However, until this thread was created, I always thought that the garbling was the fault of the transfer process between the Tivo units and my mac - either caused by the Tivo OS or by the two software apps (KMTTG & cTivo) that downloaded the videos from the Tivo units. I thought I would pipe in because the garbling description is exactly what I have been experiencing.
It is very possible that the garbling is not caused by ccextractor but I found it interesting that my experience is identical to what was described. It is also possible that the identical garbling just happens to be identical and not necessarily related to OP’s issue
Do let me know if you’d like any additional details from me.
@dwhe is correct. The TiVo is the problem.
I've done more tests. The TiVo can transfer either PS or TS. One is faster, one is "more reliable". I've always done PS because it worked more consistently for me. Today I transferred a file as PS like I normally do, and got garbled captions. I transferred the same file as TS and the captions are OK. (Sadly, I have many dozens of archived files already transferred as PS. Ah well.)
If whatever garbling has happened to the input can still be handled correctly, I'd be a very happy user! But right now this looks like a GIGO issue.
If Tivo is able to play the PS and display captions correctly then it follows that it's possible to extract correct CC from a Tivo PS.
The TS and PS code in CCExtractor is different, so correct TS and broken PS might be an issue in CCExtractor.
It would be helpful to get a PS and a TS for the same recording - that should help us figure out what's different.
I've added four short clips to the original URL linked above, both in PS and TS format, with those indicators in the names. The extracted captions are all OK for the TS format files, and the PS format files all exhibit the described arbitrarily missing/transposed two character issues. (Actually, this time it looks like they're most/all missing, not really ever transposed.)
I've checked all PS files. I was searching for 0xB24741393403 bytes which means the beginning of caption blocks (picture user data with CC according to A/53 Part 4, chapter 6.2.2). There were no other blocks with CC.
In files Deadliest.. , Dirty.., NASA.., Redrum CC blocks with text were missing in the places where they must be. So, the problem is likely to be with files generated by Tivo. By the way, when I convert TS files to PS using VLC captions are fine.
In files Family.., How.., Outran.. a lot of CC are in the wrong order, and only a few CC blocks with text are missing. So, maybe control blocks are missing or something or the problem is with CCExtractor. I need to do more investigations.
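The byte search described above can be sketched in a few lines of Python. This is only an illustration, not the tool that was actually used for the analysis: the function name `find_cc_blocks` is made up, and the sketch looks only for the six bytes quoted above (the last byte of the user data start code, the ASCII identifier "GA94", and the type code 0x03). It does not parse the caption data that follows the marker.

```python
# Marker quoted in the comment above: B2 (last byte of the user data start
# code 00 00 01 B2), then "GA94" (47 41 39 34), then type code 03.
CC_MARKER = bytes.fromhex("B24741393403")


def find_cc_blocks(data):
    """Return the offset of every occurrence of CC_MARKER in data."""
    offsets = []
    pos = data.find(CC_MARKER)
    while pos != -1:
        offsets.append(pos)
        # Continue searching just past the start of the previous hit.
        pos = data.find(CC_MARKER, pos + 1)
    return offsets
```

To try it on a sample, read the file in binary mode (for a large recording, read only the part you are interested in) and pass the bytes to `find_cc_blocks`. Comparing the offset lists for the PS and TS versions of the same recording is one way to see where caption blocks go missing.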
test.mpg works fine with "-2" parameter.
Files Family.., How.., Outran.. are quite odd. For some reason there is a wrong order of pictures at the end of picture sequence.
For example, we have (the first column is the file offset):
91A579 SEQ
91A58F GOP drop:0 time:0:0:0:0 closed:1 broken:0
91A597 PIC tref:0 type:I
91A5A8 CC v:1 t:0 [11;52]
957DF2 PIC tref:3 type:P
957E8A CC v:1 t:0 [20;41] < ;A>
962AEB PIC tref:1 type:B
962B83 CC v:1 t:0 [11;52]
968E40 PIC tref:2 type:B
968ED8 CC v:1 t:0 [49;53] <I;S>
96F779 PIC tref:6 type:P
96F811 CC v:1 t:0 [4C;4C] <L;L>
97830A PIC tref:4 type:B
97831C CC v:1 t:0 [43;54] <C;T>
97F997 PIC tref:5 type:B
97F9A9 CC v:1 t:0 [55;41] <U;A>
987930 PIC tref:9 type:P
9879C8 CC v:1 t:0 [52;59] <R;Y>
990909 PIC tref:7 type:B
9909A1 CC v:1 t:0 [59;20] <Y; >
9973DE PIC tref:8 type:B
9973F0 CC v:1 t:0 [56;45] <V;E>
99DC67 PIC tref:12 type:P
99DCFF CC v:1 t:0 [20;53] < ;S>
9A7F30 PIC tref:10 type:B
9A7F42 CC v:1 t:0 [49;4D] <I;M>
9B0C2D PIC tref:11 type:B
9B0C3F CC v:1 t:0 [49;4C] <I;L>
After sorting by temporal reference it yields: "IS ACTUALLY VERYIMIL S". which is exactly the output of CCExtractor. But, if we shift the last picture by 2 positions, i.e. ..., 8, 9, 12, 10, 11 it'll yield the correct output (IS ACTUALLY VERY SIMIL).
The same happens throughout all 3 files. I didn't find any description for that in specs (maybe I've missed something). Any ideas?
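The reordering described above can be reproduced offline from the dump. The sketch below hard-codes the pictures from that listing in stream order, sorts them by temporal reference, and then applies the "shift the last picture by 2 positions" experiment. The function names and the `shift` parameter are made up for illustration; this is a way to test the hypothesis on paper, not a proposed fix for CCExtractor.

```python
# (temporal_reference, picture_type, caption_pair) in stream order, copied
# from the dump above. None marks the [11;52] control pairs, which carry no
# printable text.
GOP = [
    (0, "I", None), (3, "P", " A"), (1, "B", None), (2, "B", "IS"),
    (6, "P", "LL"), (4, "B", "CT"), (5, "B", "UA"),
    (9, "P", "RY"), (7, "B", "Y "), (8, "B", "VE"),
    (12, "P", " S"), (10, "B", "IM"), (11, "B", "IL"),
]


def join_text(pictures):
    """Concatenate the printable caption pairs in the given order."""
    return "".join(p[2] for p in pictures if p[2] is not None)


def text_in_display_order(pictures):
    """Sort by temporal reference, which reproduces the garbled output."""
    return join_text(sorted(pictures, key=lambda p: p[0]))


def text_with_last_picture_shifted(pictures, shift=2):
    """Sort by temporal reference, then move the last picture `shift` places earlier."""
    ordered = sorted(pictures, key=lambda p: p[0])
    last = ordered.pop()
    ordered.insert(len(ordered) - shift, last)
    return join_text(ordered)


print(text_in_display_order(GOP))           # IS ACTUALLY VERYIMIL S
print(text_with_last_picture_shifted(GOP))  # IS ACTUALLY VERY SIMIL
```

The `shift` parameter is there only so the same experiment can be repeated with other offsets on other samples.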
My only suggestion -for now- is to get more samples and see if it happens with all or just some of them .
Maybe the TiVo owners in this thread can supply us with more? Also both the PS and TS versions so we can reach some conclusion by comparing.
And -since I'm asking for stuff- URLs to download the program that does that conversion :-)
I grabbed five more arbitrary recordings, each right about one minute a piece, each in both TS and PS format. This was done by navigating to the TiVo's web interface and following the download links.
For each PS I ran "tivodecode" ( https://github.com/arantius/tivodecode ) to produce the decrypted mpeg. For each TS I passed it through "DirectShow Dump" (paired with TiVo Desktop; both included in the folder linked below). This is because tivodecode is much easier to use, but unable to process TS files. Unfortunately I can't share the decryption key so that's not of much use to you.
Here's the resulting files: https://drive.google.com/open?id=0B3bPKNXgZu0-Ri1MZ0lyMllzSzg
In these cases the PS files have obvious issues and the TS files seem good ... until you get to sample E. In this case the TS is completely empty, and the PS still contains obvious issues ("TO PREVENT THE INGDIENRETS", "AFTER EA ICHNGREDIENT", "THE TANK'S CONNTTES", "QUALITY-CONTROL STINTEG.", "THE THICKNESS OF THEAINT P."). In this case VLC is able to play e.ts and display all the subtitles correctly without error (though it does have issues with e.ps, worse than ccextractor does).
I hope this helps. Also: the files I actually care about all happen to come from the channel that sample E came from.
Thanks for the samples.
I've done the same. There are caption blocks missing in all PS files. But in B and D, as I described earlier, after shifting the last picture header and corresponding CC block by 2 positions the output is fine. In file E, sometimes the offset should be 1 position.
Also I found that in files a.ps.tivo, b.ps, and especially in e.ps.tivo captions blocks are present and in the correct order, but they are not displayed by CCExtractor. These blocks are also at the end of GOP. Maybe it's somehow related to the previous problem or maybe CCExtractor doesn't flush caption buffer.
The .ts.Tivo files don't work at all. I can't play them in VLC either. But they have caption blocks :)
Also I found that in files a.ps.tivo, b.ps, and especially in e.ps.tivo captions blocks are present and in the correct order, but they are not displayed by CCExtractor.
Nope, in these cases the last CC block contains PAC which sets cursor position. As it's misplaced, new captions overwrite the previous ones, so they are not displayed.
So, these errors follow the same pattern. Either it's defined in the specs or the bug is somewhere outside of CCExtractor. @arantius Did you write tivodecode?
No, I did not write tivodecode.
@rkuchumov Original project is here: https://sourceforge.net/projects/tivodecode/
Seems to be abandoned.
Error is still reproducible with outrageous acts of science as of V 0.82. Same pattern occurs.
There is a TS compatible next-generation version of it here: https://github.com/wmcbrine/tivodecode-ng Despite the warning comment; it works great.
Test needs to be done against the last version (github master) - 0.79 is old :-)
On Mon, Nov 28, 2016 at 2:54 PM, Alex Huang notifications@github.com wrote:
Error is still reproducable with outrageous acts of science as of V 0.79. Same pattern occurs.
Tested again with .82. Still gives the same error with the same pattern.
Well, if both VLC and CCExtractor do badly with these videos, does that mean that the TiVo is right and everything else is wrong?
GSOC qualification: This issue gives 3 points.
I tried debug mode to resolve this issue, but it still generates the same pattern.
At this point I strongly suspect GIGO.
Using valgrind
==4583==
==4583== HEAP SUMMARY:
==4583== in use at exit: 22 bytes in 1 blocks
==4583== total heap usage: 252 allocs, 251 frees, 97,789,704 bytes allocated
==4583==
==4583== 22 bytes in 1 blocks are definitely lost in loss record 1 of 1
==4583== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4583== by 0x1CAAF1: ??? (in /usr/local/bin/ccextractor)
==4583== by 0x1DE48B: ??? (in /usr/local/bin/ccextractor)
==4583== by 0x1DFA38: ??? (in /usr/local/bin/ccextractor)
==4583== by 0x1F3B69: ??? (in /usr/local/bin/ccextractor)
==4583== by 0x1F1C52: ??? (in /usr/local/bin/ccextractor)
==4583== by 0x118114: ??? (in /usr/local/bin/ccextractor)
==4583== by 0x117D10: ??? (in /usr/local/bin/ccextractor)
==4583== by 0x51B31C0: (below main) (libc-start.c:308)
==4583==
==4583== LEAK SUMMARY:
==4583== definitely lost: 22 bytes in 1 blocks
==4583== indirectly lost: 0 bytes in 0 blocks
==4583== possibly lost: 0 bytes in 0 blocks
==4583== still reachable: 0 bytes in 0 blocks
==4583== suppressed: 0 bytes in 0 blocks
==4583==
==4583== For counts of detected and suppressed errors, rerun with: -v
==4583== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Probably @arantius is right
You should always use the debug version with valgrind, otherwise we're missing useful information such as line numbers. Anyway, one 22-byte block is probably not a memory leak. We're just not cleaning everything up before terminating.
Oops, I forgot to mention that I also used the debug option for that run.
That's not the output of valgrind running on a version with debug symbols, really :-)
https://github.com/dscottbuch/cTiVo/issues/26
found this. Maybe if it helps
The command i used for the process above was valgrind --leak-check=full --show-leak-kinds=all ccextractor OutrageousScience.mpg -debug
I think the problem is in general_loop.c (because the PS data grabber lives there). I have already posted the valgrind result, run with the debug option, in this issue.
VLC also show the same subtitles as CCExtractor (I have already checked most of them).
The other solution (maybe) is to decode the strangeheader, starting from this code (it can be found in stream_functions.c):
    // TiVo is also a PS
    if (ctx->startbytes[0]=='T' && ctx->startbytes[1]=='i' &&
        ctx->startbytes[2]=='V' && ctx->startbytes[3]=='o')
    {
        // The TiVo header is longer, but the PS loop will find the beginning
        dbg_print(CCX_DMT_PARSE, "detect_stream_type: detected as Tivo PS\n");
        ctx->startbytes_pos = 187;
        ctx->stream_mode = CCX_SM_PROGRAM;
        ctx->strangeheader = 1; // Avoid message about unrecognized header
    }
This has been open for a really long time. Closing. If someone posts fresh samples we'll revisit, but I guess there's no point otherwise. Don't know if people still use Tivo and if the problem still exists?
I would like to re-open this issue. Although I'm new to the ccextractor community, and may need a bit of tutelage in how to provide the most useful data, I'm rather obsessive about running bugs to ground.
I have encountered the problem where ccextractor drops pairs of characters from closed captions in files fetched from TiVo.
The insight about trying Transport vs. Program stream shows me pretty clearly that the two streams give radically different output.
I saw a discussion thread that said transport streams were unreliable for captions, and program streams were preferable. My experience is the opposite -- the program stream had bazillions of instances of dropped pairs of characters, and whole captions missing, compared to the transport stream.
The particular file I used in my testing is rather large. If someone will supply a clue on how to just provide a short excerpt (I.E. how to truncate the file without completely breaking it,) I'll supply an excerpt.
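One way to cut a short excerpt is simply to copy the first part of the file. The sketch below does that in Python. It rests on an assumption nobody in this thread has confirmed, namely that a plain byte prefix of a decrypted MPEG program or transport stream is still readable by ccextractor. The function name `copy_prefix` and its parameters are made up for illustration. The optional `packet_size` rounds the cut down to a whole number of packets (188 bytes for a plain transport stream, 192 for .m2ts); for a program stream the default of 1 leaves the size as requested.

```python
def copy_prefix(src_path, dst_path, max_bytes, packet_size=1):
    """Copy at most max_bytes from src_path to dst_path.

    The limit is rounded down to a multiple of packet_size so that a
    transport stream is not cut in the middle of a packet.
    Returns the number of bytes actually written.
    """
    limit = max_bytes - (max_bytes % packet_size)
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while copied < limit:
            # Read in chunks of at most 1 MiB so large files are not loaded whole.
            chunk = src.read(min(1 << 20, limit - copied))
            if not chunk:
                break  # source file is shorter than the requested limit
            dst.write(chunk)
            copied += len(chunk)
    return copied
```

For example, `copy_prefix("show.mpg", "excerpt.mpg", 50 * 1024 * 1024)` would keep roughly the first 50 MB (both file names are placeholders). If the excerpt still shows the dropped pairs when run through ccextractor, it is small enough to share.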
Here is the workflow I used:
Program Stream: cTiVo download in format "Decrypted TiVo Show"; With "Don't delete temporaries" I get the SRT file. Or run ccextractor of the delivered .mpg file
Transport Stream:
Running ccextractor on the program stream seemed quite happy while the run against the transport stream spit out a lot of complaints:
Run against program stream:
wdc-home-3:pending wdc$ ccextractor American\ Masters-Twyla\ Moves-1.mpg
CCExtractor 0.94, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: American Masters-Twyla Moves-1.mpg
[Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[CEA-708: 63 decoders active]
[CEA-708: using charset "none" for all services]
[Timing mode: Auto] [Debug: No] [Buffer input: No]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No][Filter profanity: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]
-----------------------------------------------------------------
Opening file: American Masters-Twyla Moves-1.mpg
File seems to be a program stream, enabling PS mode
Analyzing data in general mode
0% | 00:00
New video information found
[1920 * 1080] [AR: 03 - 16:9] [FR: 04 - 29.97] [progressive: no]
XDS: ContentAdvisory: US TV Parental Guidelines. Age Rating: TV-PG (Parental Guidance Suggested)
XDS:
100% | 89:5822
Number of NAL_type_7: 0
Number of VCL_HRD: 0
Number of NAL HRD: 0
Number of jump-in-frames: 0
Number of num_unexpected_sei_length: 0
Total frames time: 01:29:57:859 (161774 frames at 29.97fps)
CC type 0: 35478 (NTSC line 21 field 1 closed captions)
CC type 1: 5478 (NTSC line 21 field 2 closed captions)
CC type 2: 0 (DTVCC Channel Packet Data)
CC type 3: 0 (DTVCC Channel Packet Start)
Min PTS: 00:00:01:000
Max PTS: 01:29:59:291
Length: 01:29:58:291
Initial GOP time: 12:15:26:367
Final GOP time: 13:45:23:567+19F
Diff. GOP length: 01:29:57:200+19F (01:29:57:833)
Number of key frames: 6034
Total user data fields: 308165
HDTV type user data fields: 146391
Done, processing time = 74 seconds
Issues? Open a ticket here
https://github.com/CCExtractor/ccextractor/issues
Partial output from run against transport stream:
wdc-home-3:pending wdc$ ccextractor AmericanMasters-TwylaMoves.m2ts
CCExtractor 0.94, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: AmericanMasters-TwylaMoves.m2ts
[Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[CEA-708: 63 decoders active]
[CEA-708: using charset "none" for all services]
[Timing mode: Auto] [Debug: No] [Buffer input: No]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No][Filter profanity: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]
-----------------------------------------------------------------
Opening file: AmericanMasters-TwylaMoves.m2ts
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
New video information found
[1920 * 1080] [AR: 03 - 16:9] [FR: 04 - 29.97] [progressive: no]
0% | 00:00
Skip forward to the next Sequence or GOP start.
Notice: Missing PES header
Notice: Missing PES header
Skip forward to the next Sequence or GOP start.
Notice: Missing PES header
XDS: ContentAdvisory: US TV Parental Guidelines. Age Rating: TV-PG (Parental Guidance Suggested)
XDS:
Notice: Missing PES header
Skip forward to the next Sequence or GOP start.
Notice: Missing PES header
Notice: Missing PES header
Notice: Missing PES header
Skip forward to the next Sequence or GOP start.
Notice: Missing PES header
Notice: Missing PES header
Notice: Missing PES header
Notice: Missing PES header
Notice: Missing PES header
Notice: Missing PES header
Skip forward to the next Sequence or GOP start.
Notice: Missing PES header
Notice: Missing PES header
Skip forward to the next Sequence or GOP start.
Skip forward to the next Sequence or GOP start.
Notice: Missing PES header
Notice: Missing PES header
Notice: Missing PES header
Notice: Missing PES header
...
Skip forward to the next Sequence or GOP start.
Notice: Missing PES header
Notice: Missing PES header
Skip forward to the next Sequence or GOP start.
Notice: Missing PES header
100% | 89:57
Number of NAL_type_7: 0
Number of VCL_HRD: 0
Number of NAL HRD: 0
Number of jump-in-frames: 0
Number of num_unexpected_sei_length: 0
Total frames time: 01:07:31:747 (121431 frames at 29.97fps)
CC type 0: 29633 (NTSC line 21 field 1 closed captions)
CC type 1: 4625 (NTSC line 21 field 2 closed captions)
CC type 2: 69552 (DTVCC Channel Packet Data)
CC type 3: 29610 (DTVCC Channel Packet Start)
Min PTS: 09:45:18:715
Max PTS: 11:15:16:706
Length: 01:29:57:991
Initial GOP time: 12:15:26:367
Final GOP time: 13:45:23:567+19F
Diff. GOP length: 01:29:57:200+19F (01:29:57:833)
Number of key frames: 4535
Total user data fields: 242862
HDTV type user data fields: 121431
Done, processing time = 92 seconds
Issues? Open a ticket here
https://github.com/CCExtractor/ccextractor/issues
Short excerpts of the resulting .srt files
Program Stream:
1
00:00:08,108 --> 00:00:17,516
♪♪
2
00:00:18,818 --> 00:00:21,954
<i> Major support r "American</i>
<i> sters" proded by...</i>
3
00:00:22,056 --> 00:00:24,590
-These days, we need eh other
more than ever.
4
00:00:24,692 --> 00:00:28,027
That's why AARP created
Community Connections,
5
00:00:31,799 --> 00:00:34,099
get help,
or help those in need.
Transport Stream:
1
00:00:07,807 --> 00:00:17,216
♪♪
2
00:00:18,518 --> 00:00:21,653
<i> Major support for "American</i>
<i> Masters" provided by...</i>
3
00:00:21,756 --> 00:00:24,289
-These days, we need each other
more than ever.
4
00:00:24,392 --> 00:00:27,726
That's why AARP created
Community Connections,
5
00:00:27,828 --> 00:00:33,799
an online tool to find or create
a mutual-aid group,
Diff output:
wdc-home-3:pending wdc$ head -n 24 American\ Masters-Twyla\ Moves-1.srt >t1
wdc-home-3:pending wdc$ head -n 24 AmericanMasters-TwylaMoves-ts.srt >t2
wdc-home-3:pending wdc$ diff t1 t2
1,2c1,2
< 1
< 00:00:08,108 --> 00:00:17,516
---
> 1
> 00:00:07,807 --> 00:00:17,216
6,8c6,8
< 00:00:18,818 --> 00:00:21,954
< <i> Major support r "American</i>
< <i> sters" proded by...</i>
---
> 00:00:18,518 --> 00:00:21,653
> <i> Major support for "American</i>
> <i> Masters" provided by...</i>
11,13c11,13
< 00:00:22,056 --> 00:00:24,590
< -These days, we need eh other
< more than ever.
---
> 00:00:21,756 --> 00:00:24,289
> -These days, we need each other
> more than ever.
16c16
< 00:00:24,692 --> 00:00:28,027
---
> 00:00:24,392 --> 00:00:27,726
21,23c21,23
< 00:00:31,799 --> 00:00:34,099
< get help,
< or help those in need.
---
> 00:00:27,828 --> 00:00:33,799
> an online tool to find or create
> a mutual-aid group,
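Because the timestamps differ on every cue, a plain diff is noisy. A small Python sketch like the one below can compare only the caption text and report which characters the PS output is missing relative to the TS output. The helper names are made up for illustration, and it assumes the two .srt files have their caption lines in the same order, which holds for the excerpt above but not necessarily for a whole recording where entire cues are missing.

```python
import difflib
import re

# Matches an SRT timing line such as "00:00:18,818 --> 00:00:21,954".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$")


def caption_lines(srt_text):
    """Return only the caption text lines of an SRT document.

    Cue numbers, timing lines and blank lines are dropped. A caption line
    made up only of digits would be dropped too; that is a known limitation
    of this sketch.
    """
    lines = []
    for line in srt_text.splitlines():
        line = line.strip()
        if not line or line.isdigit() or TIMING.match(line):
            continue
        lines.append(line)
    return lines


def dropped_fragments(ps_line, ts_line):
    """Return the pieces of ts_line that have no counterpart in ps_line."""
    matcher = difflib.SequenceMatcher(None, ps_line, ts_line, autojunk=False)
    return [
        ts_line[j1:j2]
        for tag, _i1, _i2, j1, j2 in matcher.get_opcodes()
        if tag in ("insert", "replace")
    ]
```

Feeding it the pair "-These days, we need eh other" and "-These days, we need each other" from the excerpt reports the single missing fragment "ac", which fits the two-character pattern discussed earlier in this issue.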
Oh critical data I forgot to supply -- version information:
CCExtractor detailed version info
Version: 0.94
Git commit: Unknown
Compilation date: 2022-01-16
CEA-708 decoder: C
File SHA256: Could not open file
Libraries used by CCExtractor
libGPAC Version: 1.0.1
zlib: 1.2.11
utf8proc Version: 2.4.0
protobuf-c Version: 1.3.1
libpng Version: 1.6.37
FreeType
libhash
nuklear
libzvbi
Output is garbled in some (but not all) recordings from Tivo.
3 samples can be checked out here:
https://drive.google.com/folderview?id=0B3bPKNXgZu0-fjAxWFN2YXJSSFdZSlpRYllPSDBxTk9xUlU4dDZiUllxRE5kZXp1cEpSX2c