frasderp closed this issue 1 year ago.
I'll take a look at these. They seem to be a bit more aggressive than I'd like. It would be great if you could send me the subtitles.
Hey @KBlixt, happy to. Where can I send the files?
Just put them in here. 👍
But you don't need to, I know why they are deleted.
I'll see what I can do about these, but they seem to be just unfortunate subtitles that get caught. Some false positives are inevitable; I'm working on an easy-to-use review process for these edge cases that will let you restore false positives and delete false negatives.
@KBlixt ok thanks. Would you mind sharing which part of the regex is catching them? I'd like to review the conf also!
If you use the `--explain` option, you'll get a list of reasons why a block got deleted. But in these cases I believe they are:
First subtitle: both blocks get two warnings from the regex, global warnings 2 and 3, specifically this part: `\b\d+\Wx\W\d+\b`, for the "3x21" at the start.
Then, since both are close to a block with two warnings, each gets an additional warning.
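A quick way to see what the quoted pattern catches is to run it through Python's `re` module on a few sample strings (the samples are made up for illustration, and the pattern is tested on its own rather than through the tool's config). As written, the pattern needs exactly one non-word character, such as a space or a dot, on each side of the `x`, so it matches spaced forms like "3 x 21" but not a bare "3x21"; the actual subtitle line may have been spaced differently from how it is quoted here.

```python
import re

# The pattern quoted above, compiled on its own (not loaded from the tool's config).
EPISODE_LIKE = re.compile(r"\b\d+\Wx\W\d+\b")

# Made-up sample lines for illustration.
samples = ["3 x 21", "3x21", "3.x.21", "1920 x 1080"]
for s in samples:
    print(f"{s!r}: {bool(EPISODE_LIKE.search(s))}")
```

This prints `True` for "3 x 21", "3.x.21" and "1920 x 1080", and `False` for "3x21", which also shows how resolution-like text in dialogue could pick up a warning.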
Second subtitle: both blocks get one warning for having content identical to another block, one for containing the word "fixing", and a final one for being close to another block with two warnings (within 15 blocks, so barely in range).
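The way these warnings stack up can be sketched as three passes over the blocks. This is a minimal model assuming only the rules described above (one warning per matching regex, one for duplicated text, one for being within 15 blocks of a block that already has two warnings); it is not the tool's actual code, and `Block`, `score`, `PROXIMITY` and the two sample patterns are hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Assumed window for the proximity warning ("within 15 blocks").
PROXIMITY = 15


@dataclass
class Block:
    index: int  # subtitle block number
    text: str
    warnings: int = 0


def score(blocks: list[Block], patterns: list[re.Pattern]) -> list[Block]:
    # Pass 1: one warning per warning regex that matches the block text.
    for b in blocks:
        b.warnings += sum(1 for p in patterns if p.search(b.text))

    # Pass 2: one warning if another block has identical text.
    counts = Counter(b.text for b in blocks)
    for b in blocks:
        if counts[b.text] > 1:
            b.warnings += 1

    # Pass 3: one warning for being within PROXIMITY blocks of another block
    # that already has two or more warnings (snapshot taken before this pass).
    flagged = [b.index for b in blocks if b.warnings >= 2]
    for b in blocks:
        if any(i != b.index and abs(i - b.index) <= PROXIMITY for i in flagged):
            b.warnings += 1
    return blocks


# Hypothetical warning patterns, not the real global/English profiles.
patterns = [re.compile(r"\bfixing\b", re.I), re.compile(r"\bsubtitles?\b", re.I)]
blocks = score([
    Block(1, "Subtitle fixing by someone"),
    Block(5, "Let's get this fixed right now."),
    Block(10, "Stop fixing it."),
    Block(40, "Stop fixing it."),
], patterns)
print([(b.index, b.warnings) for b in blocks])
# [(1, 3), (5, 1), (10, 3), (40, 2)]
```

Block 40 ends up with one warning fewer than block 10 despite identical text, because it is more than 15 blocks away from any other block that already had two warnings.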
A few from my use:
| [---------Warning Blocks----------]
| 90
| 00:05:14,633 --> 00:05:16,347
| <i>Created
| by doing some tillage</i>
| reasons: (en_warn1, en_warn2)
| [---------------------------------]
| [---------Warning Blocks----------]
| 124
| 00:08:04,613 --> 00:08:06,824
| were virtually created by the ABA.
| reasons: (en_warn1, en_warn2)
| [---------------------------------]
| [---------Warning Blocks----------]
| 453
| 00:39:02,177 --> 00:39:04,470
| - Let's get this fixed right now.
| - It's fixed.
| reasons: (en_warn1, en_warn1)
| [---------------------------------]
| [---------Warning Blocks----------]
| 342
| 00:24:08,282 --> 00:24:10,708
| Had a case before
| O'Dwyer... uh, copyright.
| reasons: (en_warn7, global_warn4)
| [---------------------------------]
| [---------Warning Blocks----------]
| 575
| 00:39:53,374 --> 00:39:54,801
| That's copyright infringement.
| reasons: (en_warn7, global_warn4)
|
| 596
| 00:41:11,850 --> 00:41:14,295
| Meanwhile, we have to counter
| the copyright injunction.
| reasons: (en_warn7, global_warn4)
|
| 603
| 00:41:34,825 --> 00:41:37,453
| Okay, now, with the
| copyright infringement, I think we...
| reasons: (en_warn7, global_warn4)
| [---------------------------------]
Is there any way to remedy this? I'd like to use this tool, but removing lines like these would make it unusable for me. The copyright mentions are there because it's a "law show".
@JackBailey
I'm sorry, but from what I understand these subtitle blocks aren't removed; they are only warnings, which means they will not be removed.
Warnings are a way to draw attention to blocks that only just escaped removal, making it easier to spot content that came close to being removed.
Or am I misunderstanding something here?
I have, however, removed the copyright regex from the English profile, so a single mention of the word "copyright" no longer gives a block two warnings. That regex was left behind in the English profile by mistake.
But none of the blocks you provided would have been removed previously, and going forward they won't even appear as warnings.
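The warning/removal distinction comes down to how many warnings a block collects. A minimal sketch, assuming thresholds inferred from this thread rather than read from the tool's config (every block in the "Warning Blocks" reports above lists exactly two reasons, while the deleted blocks discussed earlier had three); `verdict`, `WARN_AT` and `REMOVE_AT` are hypothetical names. It also shows the effect of dropping `en_warn7`, assumed here to be the duplicated English copyright regex because every copyright block above lists it.

```python
# Assumed thresholds inferred from this thread, not from the tool's config:
# two warnings puts a block in the "Warning Blocks" report, three removes it.
WARN_AT = 2
REMOVE_AT = 3


def verdict(reasons: tuple[str, ...]) -> str:
    # Repeated reasons count separately, e.g. (en_warn1, en_warn1) is two warnings.
    n = len(reasons)
    if n >= REMOVE_AT:
        return "removed"
    if n >= WARN_AT:
        return "warning"
    return "kept"


# Reasons reported above for block 575 ("That's copyright infringement.").
before = ("en_warn7", "global_warn4")
# Same block once the duplicate English copyright regex (assumed en_warn7) is gone.
after = tuple(r for r in before if r != "en_warn7")

print(verdict(before))  # warning
print(verdict(after))   # kept
```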
Oh right, okay, thank you. It was my misunderstanding.
No prob 🙂 And thanks for helping me find the duplicate regex 👍
Just wanted to share a few false positives that have come up, and how they might be captured in the regex (using only the global and English regex configs).
This one came from Nightcrawler; I believe these are radio call signs.
And this one is from Wreck-It Ralph.