DrKain / subclean

A cross-platform CLI tool and node module to remove advertising from subtitles. Supports Bazarr and bulk cleaning!
MIT License

[Bug] Can't get custom language filter to work #36

Closed Znuff closed 5 months ago

Znuff commented 6 months ago

**Describe the bug**
I'm trying to add Romanian to the supported languages, but I can't get it working.

**To Reproduce**
Steps to reproduce the behavior:

  1. Created the file ~/subclean/filters/ro-main.json
  2. I've added some basic rules to start with:
    [
    "regie-live.ro",
    "regielive.ro",
    "www.subtitrari-noi.ro"
    ]
  3. Verified that my .ro.srt has the strings:
    # egrep 'subtitrari-noi|Regie' Stargate\ SG-1\ -\ S10E20\ -\ Unending\ Bluray-1080p.ro.srt
    <font color="#8000ff">Subtitrări-Noi Team - www.subtitrari-noi.ro</font>
    <font color="#8000ff">Subtitrări-Noi Team - www.subtitrari-noi.ro</font>
    www.RegieLive.ro
  4. Run subclean with --debug:
    
    # subclean -w -i Stargate\ SG-1\ -\ S10E20\ -\ Unending\ Bluray-1080p.ro.srt --lang=ro --debug
    {
    _: [],
    w: true,
    i: 'Stargate SG-1 - S10E20 - Unending Bluray-1080p.ro.srt',
    lang: 'ro',
    debug: true
    }
    [debug] readFile: [utf-8] /root/subclean/filters/main.json
    [debug] readFile: [utf-8] /root/subclean/filters/users.json
    [Info] Language codes matched: .ro.srt,ro
    [Filter] [app] Added 140 items from filter 'main'
    [Filter] [app] Added 54 items from filter 'users'
    [Info] Encoding: utf-8, Language: romanian
    [debug] readFile: [utf-8] Stargate SG-1 - S10E20 - Unending Bluray-1080p.ro.srt
    [Info] Attempting to load language filters: ro
    [debug] readFile: [utf-8] /root/subclean/filters/ro-main.json
    [Info] Save file: Stargate SG-1 - S10E20 - Unending Bluray-1080p.ro.srt
    [Done] No advertising found

    [Debug] 93,120 checks
    [Debug] 194 filters applied
    [Debug] 480 text nodes
    [Info] Save file: /root/subclean/logs/latest.txt
    [Filter] [app] Added 3 items from filter 'ro-main.json'

  5. Notice "No advertising found" in the output.

**Expected behavior**
I expect the rules to be picked up and used. 

It seems that the `ro-main.json` file is parsed *after* the actual subtitle is done?
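
The suspected ordering problem can be illustrated with a minimal Python sketch. This is not subclean's code (the tool itself is a Node module): `clean`, `base_filters`, and `ro_filters` are hypothetical names, and case-insensitive substring matching is an assumption about how rules are applied. It only shows why a filter list merged after the cleaning pass has no effect on the output.

```python
import json

# Hypothetical stand-in for the cleaning pass; not subclean's real API.
def clean(nodes, filters):
    """Drop every text node that contains any filter string.

    Matching is case-insensitive substring matching (an assumption).
    """
    lowered = [f.lower() for f in filters]
    return [n for n in nodes if not any(f in n.lower() for f in lowered)]

# Two of the lines found by egrep in step 3, plus one ordinary line.
nodes = [
    "Subtitrări-Noi Team - www.subtitrari-noi.ro",
    "www.RegieLive.ro",
    "An ordinary line of dialogue",
]

base_filters = []  # stands in for main.json and users.json
ro_filters = json.loads(
    '["regie-live.ro", "regielive.ro", "www.subtitrari-noi.ro"]'
)

# Suspected order: the subtitle is cleaned first, and the language
# filter is merged into the list only afterwards, so it is never used.
late = clean(nodes, base_filters)
merged_too_late = base_filters + ro_filters

# Expected order: merge the language filter, then clean.
early = clean(nodes, base_filters + ro_filters)

print(len(late))   # all three nodes survive
print(len(early))  # only the dialogue line survives
```

If the rules are merged first, both advertising lines are removed, which matches what happens when the same rules are placed in `main.json`.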

**Version:**

    # subclean --version
    You are using subclean@1.8.0


**Additional context**
It seems that any "custom" language file rules are loaded after the subtitle is processed. If I move the rules to `main.json`, they are applied properly:

    {
    _: [],
    debug: true,
    lang: 'ro',
    w: true,
    i: 'Stargate SG-1 - S10E20 - Unending Bluray-1080p.ro.srt'
    }
    [debug] readFile: [utf-8] /root/subclean/filters/main.json
    [debug] readFile: [utf-8] /root/subclean/filters/users.json
    [Info] Language codes matched: .ro.srt,ro
    [Filter] [app] Added 141 items from filter 'main'
    [Filter] [app] Added 54 items from filter 'users'
    [Info] Encoding: utf-8, Language: romanian
    [debug] readFile: [utf-8] Stargate SG-1 - S10E20 - Unending Bluray-1080p.ro.srt
    [Info] Attempting to load language filters: ro
    [debug] readFile: [utf-8] /root/subclean/filters/ro-main.json
    [Match] Advertising found in node 44 (subtitrari-noi.ro)
    [Line] ReSincronizare: Agentuoo7 Subtitrări-Noi Team - www.subtitrari-noi.ro
    [Match] Advertising found in node 479 (subtitrari-noi.ro)
    [Line] ReSincronizare: Agentuoo7 Subtitrări-Noi Team - www.subtitrari-noi.ro
    [Info] Removed empty nodes: 44, 479
    [Info] Save file: Stargate SG-1 - S10E20 - Unending Bluray-1080p.ro.srt
    [Done] Removed 2 node(s) and wrote to Stargate SG-1 - S10E20 - Unending Bluray-1080p.ro.srt

    [Debug] 93,600 checks
    [Debug] 195 filters applied
    [Debug] 480 text nodes
    [Info] Save file: /root/subclean/logs/latest.txt
    [Filter] [app] Added 3 items from filter 'ro-main.json'
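
The debug counters in both runs are consistent with the language rules never being applied. The short Python check below assumes (this is inferred from the numbers, not confirmed by the source) that the "checks" counter equals the number of filters applied times the number of text nodes; `expected_checks` is a hypothetical helper used only for the arithmetic.

```python
def expected_checks(filters: int, text_nodes: int) -> int:
    """Assumed relationship: every filter is tested against every text node."""
    return filters * text_nodes

TEXT_NODES = 480

# First run: 140 rules from 'main' + 54 from 'users' = 194 filters.
first_run = expected_checks(140 + 54, TEXT_NODES)

# Second run: 141 rules from 'main' + 54 from 'users' = 195 filters.
second_run = expected_checks(141 + 54, TEXT_NODES)

# What the first run would report if the 3 ro-main.json rules were included.
with_ro_rules = expected_checks(140 + 54 + 3, TEXT_NODES)

print(first_run)      # 93120, matches "93,120 checks"
print(second_run)     # 93600, matches "93,600 checks"
print(with_ro_rules)  # 94560, not seen in either log
```

Under that assumption, neither run counts the 3 items from `ro-main.json` toward the applied filters, even though the file is read and reported as loaded at the very end of the output.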

DrKain commented 6 months ago

Thank you for reporting this error. Could you please upload one of the subtitles that did not work? At a glance, it seems to be an error with another package that parses the file, but I will take a look when I can. I'm very sick at the moment, so I can't provide a fix right away. Sorry for the inconvenience.

Znuff commented 6 months ago

Sure, here's my ro-main.json attempt (very basic):

ro-main.json

And here's a subtitle file:

Stargate SG-1 - S10E09 - Company of Thieves Bluray-1080p.ro.zip

DrKain commented 5 months ago

This should be fixed in the latest update. Thank you for reporting, and sorry for the slow reply.