Garbled output in some Tivo samples

cfsmp3 commented 9 years ago

Output is garbled in some (but not all) recordings from Tivo.

3 samples can be checked out here:

https://drive.google.com/folderview?id=0B3bPKNXgZu0-fjAxWFN2YXJSSFdZSlpRYllPSDBxTk9xUlU4dDZiUllxRE5kZXp1cEpSX2c

Akirato commented 9 years ago

Hey, the codebase is large, so I am not yet able to pinpoint the error. But the error follows a pattern: in each output, two characters end up four places after where they belong. E.g., in the Outrageous Acts of Science sample (garbled -> correct):

THE SWINNG H[GI]AMMOCK -> THE SWIN[GI]{NG H}AMMOCK
IS ACTUALLY VERYIMIL[ S]AR -> IS ACTUALLY VERY[ S]{IMIL}AR

and so on. The same can be observed in all the samples. Hope this helps.
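To make the pattern concrete, here is a minimal Python sketch that undoes the displacement on the two examples above. The function name `move_pair_back` and the hand-picked indices are illustrative assumptions; this is not how CCExtractor processes captions, only a way to check the "two characters, four places" description against the sample text.

```python
def move_pair_back(text, index, shift=4, width=2):
    """Move the `width` characters starting at `index` back by `shift` positions."""
    pair = text[index:index + width]
    without_pair = text[:index] + text[index + width:]
    target = index - shift
    return without_pair[:target] + pair + without_pair[target:]


# Index 12 is where the displaced "GI" sits; index 20 is the displaced " S".
print(move_pair_back("THE SWINNG HGIAMMOCK", 12))      # THE SWINGING HAMMOCK
print(move_pair_back("IS ACTUALLY VERYIMIL SAR", 20))  # IS ACTUALLY VERY SIMILAR
```

Both garbled lines are repaired by the same move, which supports the idea that a single systematic reordering is involved rather than random corruption.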

anshul1912 commented 9 years ago

I did a regression test; this error is already present in 0.71, from the start.

uajain commented 9 years ago

@anshul1912 I ran the samples with v0.71 and v0.70. Both of them give garbled output.

P.S. (being a beginner) By regression test, do you mean that this feature was fine before and broke in v0.71?

anshul1912 commented 9 years ago

@mailumangjain That's what I was trying to say: it has been broken from the start.

uajain commented 9 years ago

So we have a new format which CCExtractor does not support? Then what is the methodology for supporting it?

anshul1912 commented 9 years ago

These files are supported, but there is a bug in the code that needs to be taken care of. The format is supported, but these files have something different in them that ends up jumbling things. If a file were not supported, CCExtractor would (and should) say that it is not supported.

uajain commented 9 years ago

Any suggestions as to what it might be? I've gone nuts tracking down this bug. I can see the buffer, and it contains garbled output, but I can't work out where the garbling creeps in.

Abhinav95 commented 9 years ago

I just tried running the sample with v0.69. The output is still garbled.

Akirato commented 9 years ago

Could the problem actually be in the ccx_decoders? They are the ones filling the buffers, right?

dwhe commented 9 years ago

I can confirm that I have witnessed this issue with CCextractor many times with that exact same pattern. I am just a user, however, so I cannot do much more other than share my user experience with this app and confirm that I am experiencing it too. Hopefully this is helpful in some small way.

canihavesomecoffee commented 9 years ago

@dwhe It would be even better if you could share some samples, or provide some more details (like input format, used parameters, etc.) :)

vivanishin commented 9 years ago

Hey! I don't get it: as far as I can see, the input is garbled (and sure enough the output is equally garbled). I delete the .srt, open the video file in VLC, select subtitle track = closed captions 1, and see the exact same mutilated captions. Is there something I don't understand that everyone else here does? Or should I just not trust VLC either, because it has the same bugs?

canihavesomecoffee commented 9 years ago

VLC is not always correct either ;)

A sample can indeed be corrupt sometimes, but it can also happen that both VLC (for example) and CCExtractor have a bug in how they process the data.

To determine whether the caption data (input) is good, it might be interesting to analyze it at a low level and compare it to what the specs define as correct order/behaviour.

vivanishin commented 9 years ago

OK, thank you. Are there any useful tools? For example, are there software CC encoders available? Or just ccextractor's decoders themselves and the specs?

dwhe commented 9 years ago

I can describe the input: I download the video from my Tivo units (I have a Series3 and a Premiere), and convert them using either KMTTG or cTivo, both of which use ccextractor in the process. The garbling described earlier occurs with both Tivo units and with both KMTTG and cTivo.

However, until this thread was created, I always thought that the garbling was the fault of the transfer process between the Tivo units and my Mac, caused either by the Tivo OS or by the two software apps (KMTTG & cTivo) that downloaded the videos from the Tivo units. I thought I would pipe in because the garbling description is exactly what I have been experiencing.

It is very possible that the garbling is not caused by ccextractor, but I found it interesting that my experience is identical to what was described. It is also possible that the garbling just happens to be identical and is not necessarily related to OP's issue.

Do let me know if you’d like any additional details from me.

arantius commented 9 years ago

@dwhe is correct. The TiVo is the problem.

I've done more tests. The TiVo can transfer either PS or TS. One is faster, one is "more reliable". I've always done PS because it worked more consistently for me. Today I transferred a file as PS like I normally do, and got garbled captions. I then transferred the same file as TS, and the captions are OK. (Sadly, I have many dozens of archived files already transferred as PS. Ah well.)

If whatever garbling has happened to the input can still be handled correctly, I'd be a very happy user! But right now this looks like a GIGO issue.

cfsmp3 commented 9 years ago

If Tivo is able to play the PS and display captions correctly then it follows that it's possible to extract correct CC from a Tivo PS.

The TS and PS code in CCExtractor is different, so correct TS and broken PS might be an issue in CCExtractor.

It would be helpful to get a PS and a TS for the same recording - that should help us figure out what's different.

arantius commented 9 years ago

I've added four short clips to the original URL linked above, each in both PS and TS format, with those indicators in the names. The extracted captions are all OK for the TS format files, and the PS files all exhibit the described arbitrarily missing/transposed two-character issues. (Actually, this time it looks like the characters are mostly or all missing, not really ever transposed.)

rkuchumov commented 8 years ago

I've checked all the PS files. I searched for the bytes 0xB24741393403, which mark the beginning of caption blocks (picture user data with CC according to A/53 Part 4, chapter 6.2.2). There were no other blocks with CC.
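The search described above can be sketched in Python as follows. The marker is the six-byte sequence from the comment (the `0xB2` user-data start-code byte, the ASCII identifier `GA94`, and the type byte `0x03`); the function name `find_cc_markers` and the synthetic buffer are assumptions for illustration only.

```python
from typing import List

# B2 (user data start code byte) + "GA94" + 03, as quoted in the comment.
CC_MARKER = bytes.fromhex("B24741393403")


def find_cc_markers(data: bytes) -> List[int]:
    """Return the offset of every occurrence of the caption marker in `data`."""
    offsets = []
    pos = data.find(CC_MARKER)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(CC_MARKER, pos + 1)
    return offsets


# Synthetic buffer with two markers, standing in for a real .mpg file.
sample = b"\x00\x00\x01" + CC_MARKER + b"\xff" * 8 + b"\x00\x00\x01" + CC_MARKER
print([hex(offset) for offset in find_cc_markers(sample)])  # ['0x3', '0x14']
```

For a real recording the bytes would come from reading the file in binary mode; for multi-gigabyte files a memory-mapped read would likely be kinder to RAM.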

In the files Deadliest.., Dirty.., NASA.., Redrum, CC blocks with text are missing in the places where they should be. So the problem is likely to be with the files generated by the Tivo. By the way, when I convert TS files to PS using VLC, the captions are fine.

In the files Family.., How.., Outran.., a lot of CC blocks are in the wrong order, and only a few CC blocks with text are missing. So maybe control blocks are missing, or the problem is with CCExtractor. I need to investigate further.

test.mpg works fine with "-2" parameter.

rkuchumov commented 8 years ago

Files Family.., How.., Outran.. are quite odd. For some reason there is a wrong order of pictures at the end of picture sequence.

For example, we have (the first column is the file offset):

91A579  SEQ
91A58F  GOP  drop:0  time:0:0:0:0  closed:1  broken:0
91A597  PIC  tref:0 type:I
91A5A8  CC   v:1 t:0 [11;52]
957DF2  PIC  tref:3 type:P
957E8A  CC   v:1 t:0 [20;41] < ;A>
962AEB  PIC  tref:1 type:B
962B83  CC   v:1 t:0 [11;52]
968E40  PIC  tref:2 type:B
968ED8  CC   v:1 t:0 [49;53] <I;S>
96F779  PIC  tref:6 type:P
96F811  CC   v:1 t:0 [4C;4C] <L;L>
97830A  PIC  tref:4 type:B
97831C  CC   v:1 t:0 [43;54] <C;T>
97F997  PIC  tref:5 type:B
97F9A9  CC   v:1 t:0 [55;41] <U;A>
987930  PIC  tref:9 type:P
9879C8  CC   v:1 t:0 [52;59] <R;Y>
990909  PIC  tref:7 type:B
9909A1  CC   v:1 t:0 [59;20] <Y; >
9973DE  PIC  tref:8 type:B
9973F0  CC   v:1 t:0 [56;45] <V;E>
99DC67  PIC  tref:12 type:P
99DCFF  CC   v:1 t:0 [20;53] < ;S>
9A7F30  PIC  tref:10 type:B
9A7F42  CC   v:1 t:0 [49;4D] <I;M>
9B0C2D  PIC  tref:11 type:B
9B0C3F  CC   v:1 t:0 [49;4C] <I;L>

Sorting by temporal reference yields "IS ACTUALLY VERYIMIL S", which is exactly the output of CCExtractor. But if we shift the last picture by 2 positions, i.e. ..., 8, 9, 12, 10, 11, it yields the correct output (IS ACTUALLY VERY SIMIL).
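The observation can be reproduced with a short Python sketch over the dump above. The (temporal reference, text) pairs are copied from the listing; the helper names are hypothetical, and the "shift the last picture back" step is only the manual experiment described here, not a fix that exists in CCExtractor.

```python
# (temporal reference, decoded text) in stream order, copied from the dump.
# Control-code pairs such as [11;52] carry no printable text and are left empty.
STREAM_ORDER = [
    (0, ""), (3, " A"), (1, ""), (2, "IS"), (6, "LL"), (4, "CT"), (5, "UA"),
    (9, "RY"), (7, "Y "), (8, "VE"), (12, " S"), (10, "IM"), (11, "IL"),
]


def sort_by_tref(pictures):
    """Order pictures by temporal reference (display order)."""
    return sorted(pictures, key=lambda pic: pic[0])


def shift_last_back(pictures, positions=2):
    """Sort by temporal reference, then move the last picture back by `positions`."""
    ordered = sort_by_tref(pictures)
    last = ordered.pop()
    ordered.insert(len(ordered) - positions, last)
    return ordered


def caption_text(pictures):
    return "".join(text for _, text in pictures)


print(caption_text(sort_by_tref(STREAM_ORDER)))     # IS ACTUALLY VERYIMIL S
print(caption_text(shift_last_back(STREAM_ORDER)))  # IS ACTUALLY VERY SIMIL
```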

The same happens throughout all 3 files. I didn't find any description for that in specs (maybe I've missed something). Any ideas?

cfsmp3 commented 8 years ago

My only suggestion (for now) is to get more samples and see if it happens with all of them or just some.

Maybe the TiVo owners in this thread can supply us with more? Also both the PS and TS versions so we can reach some conclusion by comparing.

And -since I'm asking for stuff- URLs to download the program that does that conversion :-)

arantius commented 8 years ago

I grabbed five more arbitrary recordings, each right about one minute long, each in both TS and PS format. This was done by navigating to the TiVo's web interface and following the download links.

For each PS I ran "tivodecode" ( https://github.com/arantius/tivodecode ) to produce the decrypted mpeg. For each TS I passed it through "DirectShow Dump" (paired with TiVo Desktop; both included in the folder linked below). This is because tivodecode is much easier to use, but unable to process TS files. Unfortunately I can't share the decryption key so that's not of much use to you.

Here's the resulting files: https://drive.google.com/open?id=0B3bPKNXgZu0-Ri1MZ0lyMllzSzg

In these cases the PS files have obvious issues, and the TS files seem good ... until you get to sample E. In this case the TS extraction is completely empty, and the PS still contains obvious issues ("TO PREVENT THE INGDIENRETS", "AFTER EA ICHNGREDIENT", "THE TANK'S CONNTTES", "QUALITY-CONTROL STINTEG.", "THE THICKNESS OF THEAINT P."). VLC is able to play e.ts and display all the subtitles correctly without error (though it does have issues with e.ps, worse than ccextractor does).

I hope this helps. Also: the files I actually care about all happen to come from the channel that sample E came from.

rkuchumov commented 8 years ago

Thanks for the samples.

I've done the same. There are caption blocks missing in all PS files. But in B and D, as I described earlier, after shifting the last picture header and the corresponding CC block by 2 positions, the output is fine. In file E, the offset sometimes needs to be 1 position.

Also, I found that in files a.ps.tivo, b.ps, and especially e.ps.tivo, caption blocks are present and in the correct order, but they are not displayed by CCExtractor. These blocks are also at the end of a GOP. Maybe it's somehow related to the previous problem, or maybe CCExtractor doesn't flush the caption buffer.

The .ts.Tivo files don't work at all. I can't play them in VLC either. But they have caption blocks :)

rkuchumov commented 8 years ago

Also I found that in files a.ps.tivo, b.ps, and especially in e.ps.tivo captions blocks are present and in the correct order, but they are not displayed by CCExtractor.

Nope, in these cases the last CC block contains a PAC, which sets the cursor position. As it's misplaced, new captions overwrite the previous ones, so they are not displayed.

So these errors follow the same pattern. Either it's defined in the specs or the bug is somewhere outside of CCExtractor. @arantius Did you write tivodecode?

arantius commented 8 years ago

No, I did not write tivodecode.

canihavesomecoffee commented 8 years ago

@rkuchumov Original project is here: https://sourceforge.net/projects/tivodecode/

Seems to be abandoned.

ghost commented 7 years ago

Error is still reproducible with Outrageous Acts of Science as of V 0.82. Same pattern occurs.

mackworth commented 7 years ago

There is a TS-compatible next-generation version of it here: https://github.com/wmcbrine/tivodecode-ng Despite the warning comment, it works great.

cfsmp3 commented 7 years ago

Test needs to be done against the last version (github master) - 0.79 is old :-)

On Mon, Nov 28, 2016 at 2:54 PM, Alex Huang notifications@github.com wrote:

Error is still reproducable with outrageous acts of science as of V 0.79. Same pattern occurs.


ghost commented 7 years ago

Tested again with .82. Still gives the same error with the same pattern.

Izaron commented 7 years ago

Well, if both VLC and CCExtractor handle these videos badly, then presumably the subtitles in the linked files are themselves bad? Or does it turn out that TiVo is right and everything else is wrong?

cfsmp3 commented 7 years ago

GSOC qualification: This issue gives 3 points.

FlyingTwigs commented 6 years ago

I tried debug mode on this issue, but it still generates the same pattern.

arantius commented 6 years ago

At this point I strongly suspect GIGO.

FlyingTwigs commented 6 years ago

Using valgrind

==4583== 
==4583== HEAP SUMMARY:
==4583==     in use at exit: 22 bytes in 1 blocks
==4583==   total heap usage: 252 allocs, 251 frees, 97,789,704 bytes allocated
==4583== 
==4583== 22 bytes in 1 blocks are definitely lost in loss record 1 of 1
==4583==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4583==    by 0x1CAAF1: ??? (in /usr/local/bin/ccextractor)
==4583==    by 0x1DE48B: ??? (in /usr/local/bin/ccextractor)
==4583==    by 0x1DFA38: ??? (in /usr/local/bin/ccextractor)
==4583==    by 0x1F3B69: ??? (in /usr/local/bin/ccextractor)
==4583==    by 0x1F1C52: ??? (in /usr/local/bin/ccextractor)
==4583==    by 0x118114: ??? (in /usr/local/bin/ccextractor)
==4583==    by 0x117D10: ??? (in /usr/local/bin/ccextractor)
==4583==    by 0x51B31C0: (below main) (libc-start.c:308)
==4583== 
==4583== LEAK SUMMARY:
==4583==    definitely lost: 22 bytes in 1 blocks
==4583==    indirectly lost: 0 bytes in 0 blocks
==4583==      possibly lost: 0 bytes in 0 blocks
==4583==    still reachable: 0 bytes in 0 blocks
==4583==         suppressed: 0 bytes in 0 blocks
==4583== 
==4583== For counts of detected and suppressed errors, rerun with: -v
==4583== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

Probably @arantius is right

cfsmp3 commented 6 years ago

You should always use the debug version with valgrind; otherwise we're missing useful information such as line numbers. Anyway, one 22-byte block is probably not a memory leak. We're just not cleaning everything up before terminating.

FlyingTwigs commented 6 years ago

Oops, I forgot to mention that I also used the debug command for that process.

cfsmp3 commented 6 years ago

That's not the output of valgrind running on a version with debug symbols, really :-)

FlyingTwigs commented 6 years ago

https://github.com/dscottbuch/cTiVo/issues/26

Found this; maybe it helps.

FlyingTwigs commented 6 years ago

The command I used for the process above was:

valgrind --leak-check=full --show-leak-kinds=all ccextractor OutrageousScience.mpg -debug

FlyingTwigs commented 6 years ago

I think the problem is in general_loop.c (because the PS data grabber lives there). I have also already run it through valgrind (I posted the result, with the debug command, in this issue).

VLC also show the same subtitles as CCExtractor (I have already checked most of them).

The other solution (maybe) is to decode strangeheader, starting from this code (which can be found in stream_functions.c):

    // TiVo is also a PS
    if (ctx->startbytes[0]=='T' && ctx->startbytes[1]=='i' &&
        ctx->startbytes[2]=='V' && ctx->startbytes[3]=='o')
    {
        // The TiVo header is longer, but the PS loop will find the beginning
        dbg_print(CCX_DMT_PARSE, "detect_stream_type: detected as Tivo PS\n");
        ctx->startbytes_pos = 187;
        ctx->stream_mode = CCX_SM_PROGRAM;
        ctx->strangeheader = 1; // Avoid message about unrecognized header
    }
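For checking samples outside of CCExtractor, the four-byte signature test in the snippet above can be mirrored in a few lines of Python. This is only a sketch of the magic-number check; the function name `looks_like_tivo` is hypothetical, and nothing here parses the rest of the TiVo header.

```python
def looks_like_tivo(start_bytes: bytes) -> bool:
    """True if the buffer begins with the ASCII signature 'TiVo'."""
    return start_bytes[:4] == b"TiVo"


# Synthetic buffers: a fake TiVo header and an MPEG PS pack start code.
print(looks_like_tivo(b"TiVo\x00\x04\x00\x0d"))  # True
print(looks_like_tivo(b"\x00\x00\x01\xba"))      # False
```

A file that has already been decrypted (for example by tivodecode) would be expected to fail this check, since the output is a plain MPEG stream.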

cfsmp3 commented 5 years ago

This has been open for a really long time. Closing. If someone posts fresh samples we'll revisit, but I guess there's no point otherwise. Don't know if people still use Tivo and if the problem still exists?

poetnerd commented 2 years ago

I would like to re-open this issue. Although I'm new to the ccextractor community, and may need a bit of tutelage in how to provide the most useful data, I'm rather obsessive about running bugs to ground.

I have encountered the problem where ccextractor drops pairs of characters from closed captions in files fetched from TiVo.

The insight about trying Transport vs. Program stream shows me pretty clearly that the two streams give radically different output.

I saw a discussion thread that said transport streams were unreliable for captions, and program streams were preferable. My experience is the opposite: compared to the transport stream, the program stream had bazillions of dropped character pairs, and whole captions missing.

The particular file I used in my testing is rather large. If someone will supply a clue on how to just provide a short excerpt (I.E. how to truncate the file without completely breaking it,) I'll supply an excerpt.

Here is the workflow I used:

Program Stream: cTiVo download in format "Decrypted TiVo Show"; with "Don't delete temporaries" I get the SRT file. Or run ccextractor on the delivered .mpg file.

Transport Stream:

Extract the URL from the log file from that download
Add a suffix to the URL: "Format=video/x-tivo-mpeg-ts"
Use curl to fetch the transport stream from the TiVo.
Use tivodecode to decrypt
ccextractor of the decrypted file
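The URL step in the list above can be sketched in Python. The parameter string is the one quoted in the steps; the host, path, and existing query parameters in the example are placeholders (the real URL comes from the cTiVo log), and joining with `&` or `?` is an assumption about how the suffix attaches.

```python
# Parameter quoted in the workflow above.
TS_FORMAT_PARAM = "Format=video/x-tivo-mpeg-ts"


def with_ts_format(download_url: str) -> str:
    """Append the transport-stream format parameter to a download URL."""
    separator = "&" if "?" in download_url else "?"
    return download_url + separator + TS_FORMAT_PARAM


# Placeholder URL, not a real TiVo address.
example = "http://tivo.example/download/Show.TiVo?Container=1&id=42"
print(with_ts_format(example))
```

The resulting URL is what would then be handed to curl in the next step.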

Running ccextractor on the program stream seemed quite happy while the run against the transport stream spit out a lot of complaints:

Run against program stream:

wdc-home-3:pending wdc$ ccextractor American\ Masters-Twyla\ Moves-1.mpg
CCExtractor 0.94, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: American Masters-Twyla Moves-1.mpg
[Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[CEA-708: 63 decoders active]
[CEA-708: using charset "none" for all services]
[Timing mode: Auto] [Debug: No] [Buffer input: No]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No][Filter profanity: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]

-----------------------------------------------------------------
Opening file: American Masters-Twyla Moves-1.mpg
File seems to be a program stream, enabling PS mode
Analyzing data in general mode
  0%  |  00:00

New video information found
[1920 * 1080] [AR: 03 - 16:9] [FR: 04 - 29.97] [progressive: no]

XDS: ContentAdvisory: US TV Parental Guidelines. Age Rating: TV-PG (Parental Guidance Suggested)
XDS: 
100%  |  89:5822
Number of NAL_type_7: 0
Number of VCL_HRD: 0
Number of NAL HRD: 0
Number of jump-in-frames: 0
Number of num_unexpected_sei_length: 0

Total frames time:    01:29:57:859  (161774 frames at 29.97fps)
CC type 0: 35478 (NTSC line 21 field 1 closed captions)
CC type 1: 5478 (NTSC line 21 field 2 closed captions)
CC type 2: 0 (DTVCC Channel Packet Data)
CC type 3: 0 (DTVCC Channel Packet Start)

Min PTS:                00:00:01:000
Max PTS:                01:29:59:291
Length:              01:29:58:291

Initial GOP time:      12:15:26:367
Final GOP time:      13:45:23:567+19F
Diff. GOP length:      01:29:57:200+19F (01:29:57:833)

Number of key frames: 6034
Total user data fields: 308165
HDTV type user data fields: 146391
Done, processing time = 74 seconds
Issues? Open a ticket here
https://github.com/CCExtractor/ccextractor/issues

Partial output from run against transport stream:

wdc-home-3:pending wdc$ ccextractor AmericanMasters-TwylaMoves.m2ts 
CCExtractor 0.94, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: AmericanMasters-TwylaMoves.m2ts
[Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[CEA-708: 63 decoders active]
[CEA-708: using charset "none" for all services]
[Timing mode: Auto] [Debug: No] [Buffer input: No]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No][Filter profanity: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]

-----------------------------------------------------------------
Opening file: AmericanMasters-TwylaMoves.m2ts
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode

New video information found
[1920 * 1080] [AR: 03 - 16:9] [FR: 04 - 29.97] [progressive: no]

  0%  |  00:00
Skip forward to the next Sequence or GOP start.
Notice: Missing PES header
Notice: Missing PES header

Skip forward to the next Sequence or GOP start.
Notice: Missing PES header
XDS: ContentAdvisory: US TV Parental Guidelines. Age Rating: TV-PG (Parental Guidance Suggested)
XDS: 
  Notice: Missing PES header

Skip forward to the next Sequence or GOP start.
Notice: Missing PES header
Notice: Missing PES header
Notice: Missing PES header

Skip forward to the next Sequence or GOP start.
Notice: Missing PES header
Notice: Missing PES header
Notice: Missing PES header
Notice: Missing PES header
Notice: Missing PES header
Notice: Missing PES header

Skip forward to the next Sequence or GOP start.
Notice: Missing PES header
Notice: Missing PES header

Skip forward to the next Sequence or GOP start.

Skip forward to the next Sequence or GOP start.
Notice: Missing PES header
Notice: Missing PES header
Notice: Missing PES header
Notice: Missing PES header

...

Skip forward to the next Sequence or GOP start.
Notice: Missing PES header
Notice: Missing PES header

Skip forward to the next Sequence or GOP start.
Notice: Missing PES header
100%  |  89:57
Number of NAL_type_7: 0
Number of VCL_HRD: 0
Number of NAL HRD: 0
Number of jump-in-frames: 0
Number of num_unexpected_sei_length: 0

Total frames time:    01:07:31:747  (121431 frames at 29.97fps)
CC type 0: 29633 (NTSC line 21 field 1 closed captions)
CC type 1: 4625 (NTSC line 21 field 2 closed captions)
CC type 2: 69552 (DTVCC Channel Packet Data)
CC type 3: 29610 (DTVCC Channel Packet Start)

Min PTS:                09:45:18:715
Max PTS:                11:15:16:706
Length:              01:29:57:991

Initial GOP time:      12:15:26:367
Final GOP time:      13:45:23:567+19F
Diff. GOP length:      01:29:57:200+19F (01:29:57:833)

Number of key frames: 4535
Total user data fields: 242862
HDTV type user data fields: 121431
Done, processing time = 92 seconds
Issues? Open a ticket here
https://github.com/CCExtractor/ccextractor/issues
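One quick cross-check on the two summaries above is to convert the reported frame counts back into durations. The sketch below assumes that "Total frames time" is simply the frame count divided by the NTSC rate 30000/1001 (shown as 29.97 in the log); that is inferred from the numbers, not taken from the CCExtractor source, and the function name is hypothetical.

```python
from fractions import Fraction

NTSC_FPS = Fraction(30000, 1001)  # exact value behind the "29.97" label


def frames_to_timestamp(frames: int) -> str:
    """Convert a frame count to HH:MM:SS:mmm, truncating to whole milliseconds."""
    total_ms = int(Fraction(frames) / NTSC_FPS * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{ms:03d}"


print(frames_to_timestamp(161774))  # program stream run:   01:29:57:859
print(frames_to_timestamp(121431))  # transport stream run: 01:07:31:747
```

Both values match the logs. The program stream's frame time is close to its PTS length (01:29:58:291), while the transport stream's frame time falls well short of its PTS length (01:29:57:991), which may be related to the repeated "Skip forward to the next Sequence or GOP start" notices in that run.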

Short excerpts of the resulting .srt files

Program Stream:

1
00:00:08,108 --> 00:00:17,516
               ♪♪               

2
00:00:18,818 --> 00:00:21,954
  <i> Major support r "American</i>    
    <i> sters" proded by...</i>        

3
00:00:22,056 --> 00:00:24,590
 -These days, we need eh other  
        more than ever.         

4
00:00:24,692 --> 00:00:28,027
     That's why AARP created    
     Community Connections,     

5
00:00:31,799 --> 00:00:34,099
            get help,           
     or help those in need.

Transport Stream:

1
00:00:07,807 --> 00:00:17,216
               ♪♪               

2
00:00:18,518 --> 00:00:21,653
  <i> Major support for "American</i>  
    <i> Masters" provided by...</i>    

3
00:00:21,756 --> 00:00:24,289
 -These days, we need each other
         more than ever.        

4
00:00:24,392 --> 00:00:27,726
     That's why AARP created    
     Community Connections,     

5
00:00:27,828 --> 00:00:33,799
an online tool to find or create
       a mutual-aid group,

Diff output:

wdc-home-3:pending wdc$ head -n 24 American\ Masters-Twyla\ Moves-1.srt >t1
wdc-home-3:pending wdc$ head -n 24 AmericanMasters-TwylaMoves-ts.srt >t2
wdc-home-3:pending wdc$ diff t1 t2
1,2c1,2
< 1
< 00:00:08,108 --> 00:00:17,516
---
> 1
> 00:00:07,807 --> 00:00:17,216
6,8c6,8
< 00:00:18,818 --> 00:00:21,954
<   <i> Major support r "American</i>    
<     <i> sters" proded by...</i>        
---
> 00:00:18,518 --> 00:00:21,653
>   <i> Major support for "American</i>  
>     <i> Masters" provided by...</i>    
11,13c11,13
< 00:00:22,056 --> 00:00:24,590
<  -These days, we need eh other  
<         more than ever.         
---
> 00:00:21,756 --> 00:00:24,289
>  -These days, we need each other
>          more than ever.        
16c16
< 00:00:24,692 --> 00:00:28,027
---
> 00:00:24,392 --> 00:00:27,726
21,23c21,23
< 00:00:31,799 --> 00:00:34,099
<             get help,           
<      or help those in need.     
---
> 00:00:27,828 --> 00:00:33,799
> an online tool to find or create
>        a mutual-aid group,
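Because the two extractions differ by a small timing offset, a plain diff flags every cue. A minimal Python sketch for comparing only the caption text is shown below; the SRT snippets are taken from the excerpts above, while the function name `cue_texts` and the simple blank-line parsing are assumptions that would need hardening for real files.

```python
import re


def cue_texts(srt: str):
    """Return the caption text of each cue, with tags and extra spaces removed."""
    cues = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.splitlines()
        text = " ".join(lines[2:])           # skip the index and timing lines
        text = re.sub(r"<[^>]+>", "", text)  # drop <i>...</i> style tags
        cues.append(" ".join(text.split()))  # collapse padding spaces
    return cues


PS = """2
00:00:18,818 --> 00:00:21,954
  <i> Major support r "American</i>
    <i> sters" proded by...</i>

3
00:00:22,056 --> 00:00:24,590
 -These days, we need eh other
        more than ever.
"""

TS = """2
00:00:18,518 --> 00:00:21,653
  <i> Major support for "American</i>
    <i> Masters" provided by...</i>

3
00:00:21,756 --> 00:00:24,289
 -These days, we need each other
         more than ever.
"""

for ps_text, ts_text in zip(cue_texts(PS), cue_texts(TS)):
    if ps_text != ts_text:
        missing = len(ts_text) - len(ps_text)
        print(f"PS: {ps_text}\nTS: {ts_text}\n(PS is {missing} characters shorter)\n")
```

Pairing cues by position only works while both files contain the same cues; once whole captions go missing from the program stream, as in cue 5 above, matching by nearest start time would be needed instead.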

poetnerd commented 2 years ago

Oh critical data I forgot to supply -- version information:

wdc-home-3:pending wdc$ ccextractor --version
CCExtractor 0.94, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc

CCExtractor detailed version info
    Version: 0.94
    Git commit: Unknown
    Compilation date: 2022-01-16
    CEA-708 decoder: C
    File SHA256: Could not open file
Libraries used by CCExtractor
    libGPAC Version: 1.0.1
    zlib: 1.2.11
    utf8proc Version: 2.4.0
    protobuf-c Version: 1.3.1
    libpng Version: 1.6.37
    FreeType
    libhash
    nuklear
    libzvbi
