CCExtractor / ccextractor

CCExtractor - Official version maintained by the core team
https://www.ccextractor.org
GNU General Public License v2.0

Brazilian ISDB ignores levdistmincnt, levdistmaxpct, and unixts #739

Closed Liontooth closed 6 years ago

Liontooth commented 7 years ago


CCExtractor version: 0.85 (latest zip file from GitHub)


See related issues:

It's possible that the -pn and -unixts errors I'm seeing are regressions. It's also possible that the files are subtly different. The earlier test files were recorded with a PixelView Play TV USB SBTVD Full Seg stick. The attached file was recorded with the new Brazilian HDHomeRun device.


Necessary information

Video links http://vrnewsscape.ucla.edu/dropbox/2017-05-27_0045_BR_Record_Jornal_da_Record.mpg


Additional information


CCExtractor 0.85 does a great job with Brazilian ISDB captions, but there are four problems with switches:

  1. If the argument "-pn $PN" is used, ccextractor does not recognize the file as ISDB. This is not a serious bug, since everything works fine if the argument is removed, but it is not the expected behavior.

  2. The -unixts argument is ignored. For instance, using -unixts 1495845901 with -datets and -UCLA produces this output:

19700101000000.901|19700101000103.494|ISDB|>> POR FAVOR, POR
19700101000103.494|19700101000105.273|ISDB|>> POR FAVOR, POR AQUI,

The unix epoch offset is not working. This is the most serious bug.

  3. The argument "-levdistmincnt" is ignored. For instance, using -levdistmincnt 2 produces this output:

19700101000000.901|19700101000103.494|ISDB|>> POR FAVOR, POR
19700101000103.494|19700101000105.273|ISDB|>> POR FAVOR, POR AQUI,
19700101000105.273|19700101000107.002|ISDB|>> POR FAVOR, POR AQUI,
19700101000105.273|19700101000107.002|ISDB|RAINHA!
19700101000107.002|19700101000108.480|ISDB|RAINHA!
19700101000107.002|19700101000108.480|ISDB|>> QUE LUGAR HORRÍVEL, TEM
19700101000108.480|19700101000109.758|ISDB|>> QUE LUGAR HORRÍVEL, TEM
19700101000108.480|19700101000109.758|ISDB|CERTEZA QUE É AQUI?
19700101000109.758|19700101000111.336|ISDB|CERTEZA QUE É AQUI?

Most lines are duplicated; deduplication is not happening.

  4. The argument -levdistmaxpct to help deduplicate also appears to be ignored.
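To make the expected deduplication concrete, here is a rough post-processing sketch in Python. It is not CCExtractor's code: the functions `levenshtein`, `near_duplicate`, and `dedupe` and the parameters `min_cnt` and `max_pct` are hypothetical names, and combining the two thresholds as "the larger of a fixed count and a percentage of the shorter string" is only an assumption about how such switches might interact. A line is dropped when its caption text is within that edit distance of the previously kept line.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def caption_text(line: str) -> str:
    """Return the text field of a 'start|end|ISDB|text' line."""
    return line.split("|", 3)[3]


def near_duplicate(a: str, b: str, min_cnt: int = 2, max_pct: int = 10) -> bool:
    # Assumed rule (hypothetical): allow the larger of a fixed count and
    # a percentage of the shorter string's length.
    allowed = max(min_cnt, min(len(a), len(b)) * max_pct // 100)
    return levenshtein(a, b) <= allowed


def dedupe(lines, min_cnt: int = 2, max_pct: int = 10):
    """Drop each line whose text nearly matches the previously kept line."""
    kept = []
    for line in lines:
        if kept and near_duplicate(caption_text(line), caption_text(kept[-1]),
                                   min_cnt, max_pct):
            continue
        kept.append(line)
    return kept


# The nine lines reported in this issue.
SAMPLE = [
    "19700101000000.901|19700101000103.494|ISDB|>> POR FAVOR, POR",
    "19700101000103.494|19700101000105.273|ISDB|>> POR FAVOR, POR AQUI,",
    "19700101000105.273|19700101000107.002|ISDB|>> POR FAVOR, POR AQUI,",
    "19700101000105.273|19700101000107.002|ISDB|RAINHA!",
    "19700101000107.002|19700101000108.480|ISDB|RAINHA!",
    "19700101000107.002|19700101000108.480|ISDB|>> QUE LUGAR HORRÍVEL, TEM",
    "19700101000108.480|19700101000109.758|ISDB|>> QUE LUGAR HORRÍVEL, TEM",
    "19700101000108.480|19700101000109.758|ISDB|CERTEZA QUE É AQUI?",
    "19700101000109.758|19700101000111.336|ISDB|CERTEZA QUE É AQUI?",
]

if __name__ == "__main__":
    for line in dedupe(SAMPLE):
        print(line)
```

On the sample above this keeps five of the nine lines. The sketch keeps the timestamps of the first line in each duplicate pair and does not merge time ranges.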

Cheers, Dave

Abhinav95 commented 7 years ago

I will take a look at this around the end of this week. I presume it should not be a difficult fix but let us see.

Abhinav95 commented 7 years ago

For future reference:-

Abhinav95 commented 7 years ago

@Liontooth Can this issue be closed now? levdist is not needed in ISDB since other deduplication measures are in place. The required timestamps can be obtained by the workaround I mentioned.
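As an illustration only, and not necessarily the workaround referred to above, epoch-relative timestamps like those in the report can be shifted in post-processing. The Python sketch below assumes the `start|end|ISDB|text` layout and `YYYYMMDDhhmmss.mmm` timestamps shown in the report; `shift_stamp` and `shift_line` are hypothetical helper names, not part of CCExtractor.

```python
from datetime import datetime, timedelta

FMT = "%Y%m%d%H%M%S.%f"  # e.g. 19700101000000.901


def shift_stamp(stamp: str, unix_start: int) -> str:
    """Add unix_start seconds to a timestamp counted from 1970-01-01."""
    shifted = datetime.strptime(stamp, FMT) + timedelta(seconds=unix_start)
    # Emit milliseconds (three digits), matching the reported format.
    return shifted.strftime("%Y%m%d%H%M%S.") + f"{shifted.microsecond // 1000:03d}"


def shift_line(line: str, unix_start: int) -> str:
    """Shift the start and end fields of one 'start|end|rest' line."""
    start, end, rest = line.split("|", 2)
    return "|".join([shift_stamp(start, unix_start),
                     shift_stamp(end, unix_start),
                     rest])


if __name__ == "__main__":
    line = "19700101000000.901|19700101000103.494|ISDB|>> POR FAVOR, POR"
    print(shift_line(line, 1495845901))
    # 20170527004501.901|20170527004604.494|ISDB|>> POR FAVOR, POR
```

The value 1495845901 corresponds to 2017-05-27 00:45:01 UTC, which matches the recording time in the sample file's name.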

Liontooth commented 6 years ago

The issues appear to be resolved, so I'm closing. Thank you!