Closed bbgdzxng1 closed 3 years ago
The problem is a null pointer de-reference. timing
is a null pointer so trying to get sync_pts2fts_set
crashes.
If that timing
is meant to be null and it isn't a problem with the parser, should we just set the values to 0? That seems to be what is recommended according to https://sdks.support.brightcove.com/features/synchronizing-webvtt-captions.html
If that
timing
is meant to be null and it isn't a problem with the parser, should we just set the values to 0? That seems to be what is recommended according to https://sdks.support.brightcove.com/features/synchronizing-webvtt-captions.html
Here, by "values," do you mean the variables context->timing->sync_pts2fts_pts
, h1
, m1
, s1
, ms1
(in the code below) should be set to 0
? This makes sense according to the recommended settings.
If you don't mind, can I try to look into whether context->timing
is supposed to be null
or not?
If that
timing
is meant to be null and it isn't a problem with the parser, should we just set the values to 0? That seems to be what is recommended according to https://sdks.support.brightcove.com/features/synchronizing-webvtt-captions.htmlHere, by "values," do you mean the variables
context->timing->sync_pts2fts_pts
,h1
,m1
,s1
,ms1
(in the code below) should be set to0
? This makes sense according to the recommended settings.
No, I meant we should write the header with as values 0 (instead of what would have been in sync_pts2fts_fts
).
No, I meant we should write the header with as values 0 (instead of what would have been in
sync_pts2fts_fts
).
I wanted to say the same. I should not have used said "set" variables. Shouldn't we also need to write 0's (in the header) instead of h1
, m1
, s1
, ms1
? Since they are dependent on sync_pts2fts_fts
(due to the function below)
https://github.com/CCExtractor/ccextractor/blob/cf84757e02889ffa0231624fdf941eafded59906/src/lib_ccx/ccx_encoders_webvtt.c#L215
I compared the webvtt
encoding for cc_sample_dos2unix.bin
(given in this thread) and UK_ITVBE.ts (provided on ccextractor's tv samples page) and found something really weird.
https://github.com/CCExtractor/ccextractor/blob/cf84757e02889ffa0231624fdf941eafded59906/src/lib_ccx/ccx_encoders_common.c#L935-L939
The problem occurs in the initialization of the encoder context. The malloc
function acts differently for UK_ITVBE.ts
(here, ctx->timing
is not null) and cc_sample_dos2unix.bin
(here, ctx->timing
is set to null
)
I added a fix by allocating memory for ctx->timing
but would love to discuss more why this is happening :)
ctx->timing
is not null because timing information is included in the file. I assume this is not included in McPoodle raw (which just contains closed captions), or not in this specific mcpoodle raw file or ccextractor is not able to parse it correctly.
The
malloc
function acts differently forUK_ITVBE.ts
(here,ctx->timing
is not null) andcc_sample_dos2unix.bin
(here,ctx->timing
is set tonull
)
By this, I meant that the value of ctx->timing
just after the initialization of ctx
(using malloc
) was different in both cases. Shouldn't the ctx->timing
value be the same in both cases since this is a simple initialization? (later on, it may change depending on the input)
I might have made a mistake somewhere (while debugging in vs code). I will check this again.
I assume this is not included in McPoodle raw (which just contains closed captions), or not in this specific mcpoodle raw file or ccextractor is not able to parse it correctly.
Will look into this.
I really appreciate you folks taking the time to look at this. I still plan on using McPoodle > ccextractor as a means of converting SCC > McP Raw > webVTT and I am very grateful that you have even looked, whatever the outcome. Thanks!
I really appreciate you folks taking the time to look at this. I still plan on using McPoodle > ccextractor as a means of converting SCC > McP Raw > webVTT and I am very grateful that you have even looked, whatever the outcome. Thanks!
Happy to help. Hope to fix this soon!
Many thanks for trying to fix. I see that PR https://github.com/CCExtractor/ccextractor/pull/1298 has been closed. Just wanted to say thanks for your efforts so far and although the PR itself was not suitable for prod, I do still hope to use ccextractor to convert SCC > McP Raw > WebVTT. I'm currently going via SCC > McP Raw > SRT > WebVTT, but a conversion to WebVTT is hopefully a legitimate use case (for all the benefits of preserving markup, data integrity, character sets etc).
Are there any roadmap plans to support SCC as an input format? I know that ccextractor is a caption extractor rather than a caption convertor, but given the similarities between 608 and SCC, maybe that is a better way to solve for this use case, rather than receiving tickets about a proprietary and legacy McPoodle input format, which I appreciate is an edge case. If this is indeed a possibility, let me know and I would happily raise a feature request for SCC input, rather than a bug report for McPoodle.
There are very few good SCC convertors out there, and ccextractor is the best I have found for 608.
Is this issue still open? And is someone working on it?
Yes, still open, and no one's working on it AFAIK.
So I had a look through the file and I think the source of the problem is that the scc file doesn't have the information required to build the X-TIMESTAMP-MAP=MPEGTS
header?. Thus adding a check for context->timing != NULL
before accessing context->timing->sync_pts2fts
in (lib_ccx/ccx_encoders_webvtt.c; line 210) can avoid the segmentation fault in such cases. As far as I know its not a problem for a .vtt file to not contain the synchronisation offset in the header. The default is assumed to be 0 if none is present.
I tried running the provided files and am able to get the .vtt file without problem
I have attached a screenshot of the same
I will admit I'm baffled how my PR could break so many tests. How adding 1 line in webvtt encoder file breaks everything else as well, is confusing me and after going through the tests and downloading the files locally and running them they seem to be working. Am I missing something here? Note: This is my first contribution to this repo
I will admit I'm baffled how my PR could break so many tests. How adding 1 line in webvtt encoder file breaks everything else as well, is confusing me and after going through the tests and downloading the files locally and running them they seem to be working. Am I missing something here? Note: This is my first contribution to this repo
The sample platform isn't bug-free either ;)
I've looked at the actual results page, and there there's fewer tests broken :)
Since it's a Windows report, the failing test for the dictionary (using a unix path) is expected. The failing test in Broken category can be ignored. The 3 failing ones in DVB worry me a bit more, but might be present in the main repo already, and the same goes for the Teletext one.
That's about it... So I think it's safe to say you can reopen that PR.
I see! Thanks. makes more sense. I have reopened the PR as suggested. Ill look at the DVB tests in a bit more detail just to be on the safer side.
Confirming that this is resolved in 0.89. Thank-you...
$ curl --silent --location --request GET "https://archive.org/download/cc_sample/cc_sample.scc" -o cc_sample.scc
$ dos2unix --newfile cc_sample.scc cc_sample_dos2unix.scc
$ perl /opt/mcpoodle/scc2raw.pl cc_sample_dos2unix.scc cc_sample_dos2unix.bin
$ ccextractor -debug -in=raw -out=webvtt-full -utf8 cc_sample_dos2unix.bin -o cc_sample_dos2unix.vtt
$ head -n10 "cc_sample_dos2unix.vtt"
WEBVTT
STYLE
/* default values */
::cue {
line-height: 5.33vh;
font-size: 4.1vh;
font-family: monospace;
$ tail -n10 "cc_sample_dos2unix.vtt"
00:01:03.531 --> 00:01:04.430 line:84.66%
Yes
00:01:04.465 --> 00:01:06.298 line:79.33%
I pray I never
00:01:04.465 --> 00:01:06.298 line:84.66%
have to deliver it
Thank-you to all who contributed to the investigation, development, testing and release.
ccextractor can now successfully convert scc-to-srt
, scc-to-vtt
(and probably other outputs) with the interim step of going via McPoodle raw format.
CCExtractor version: 0.88
In raising this issue, I confirm the following:
Necessary information
$ brew install ccextractor
Download cc sample from https://archive.org/download/cc_sample
Convert scc file to McPoodle's raw format using McPoodle's SCCTOOLS' scc2raw.pl, available from http://www.theneitherworld.com/mcpoodle/SCC_TOOLS/scc2raw.pl. Carlos uses McPoodle's SCCTOOLS as part of this project.
At this point, we have a McPoodle raw format file, which has been generated by McPoodle's own tool.
Now attempting to convert McPoodle's raw format to VTT, using ccextractor 0.88 (see Additional Information section for error message)
Additional information
Converting from McPoodle Raw format to SRT is successful. Converting from McPoodle Raw format to VTT fails with Segmentation Fault.
Check input using report
Converting to SRT is successful
Converting to webVTT causes Segmentation Fault 11
Full commands for replication and sample file are included in this ticket.