KBlixt / subcleaner

Removes ads from subtitle files cleanly.

Issue with regex (I think) #22

ghost closed this issue 1 year ago

ghost commented 1 year ago

Hello, I have a problem using regex properly; hopefully you can help me out.

First, I tried everything on my Windows machine, and during a couple of dry runs it removed lots of Dutch ads (I added regex8 myself).

Then I transferred the same files to my Synology NAS (Linux), gave them all the necessary permissions, and added the script to Bazarr.

Then I watched Bazarr's logs in Portainer. They said the script ran successfully but removed 0 blocks. When I opened some of the files, the ads were still there. I then copied those .srt files back to my Windows machine, ran the same script against them, and it removed those lines.

I have been away, so I can't reproduce this exactly anymore. I was also messing around with the logs, because they still contained old entries that didn't come from the Linux machine, so I deleted them.

I think the best thing I can do is show my global.conf file:

[PURGE_REGEX]

regex1: ([^Ã]|^)©|™|tvsubtitle|\b(YTS|YIFY)\b|opensub|sub(scene|rip)|podnapisi|addic7ed|Camikaze
regex2: bozxphd|sazu489|psagmeno|normita|anoxmous|9unshofl|BLACKdoor|titlovi|Danishbits|hound\.org|hunddawgs
regex3: jodix|LESAIGNEUR|HighCode|explosiveskull|GoldenBeard|nessundorma|Fingal61|dawaith|MoSub|srjanapala
regex4: FilthyRichFutures|celebritysex|shareuniversity|AmericasCardroom|saveanilluminati|MCH2022|ALLIN1BOX
regex5: admitme|argenteam
regex6: \.(tv|tk|xyz|io|sex|porn|xxx|link)\b|https?[:\.\/\\ ]
regex7: (Someone(\b.\b)?needs(\b.\b)?to(\b.\b)?stop(\b.\b)?Clearway(\b.\b)?Law)|(Public(\b.\b)?shouldn't(\b.\b)?leave(\b.\b)?reviews(\b.\b)?for(\b.\b)?lawyers) 
regex8: bierdopje|nlsubs|subtitles searcher|ondertiteling:|verbetering:|gedownload:|vertaling & sync:|vertaling:|== sync|==sync|ondertiteld door:|sync:|synced:|sync and corrections by|sync &|aangeboden door:|ondertiteling swell|captioning sponsored by|rip en sync
regex9: BluRay Rip:|Bdzzld
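Since the section above is plain INI-style `key: value` pairs, one quick way to rule out a syntax mistake in any of the patterns is to load the `[PURGE_REGEX]` section with Python's `configparser` and compile each value with the `re` module. This is a minimal sketch, assuming subcleaner reads global.conf as ordinary INI and compiles the patterns with Python's `re`; I haven't checked how subcleaner actually parses the file, and the `re.IGNORECASE` flag is an assumption too.

```python
import configparser
import re

# The [PURGE_REGEX] section from the global.conf above, pasted verbatim.
CONF_TEXT = r"""
[PURGE_REGEX]
regex1: ([^Ã]|^)©|™|tvsubtitle|\b(YTS|YIFY)\b|opensub|sub(scene|rip)|podnapisi|addic7ed|Camikaze
regex2: bozxphd|sazu489|psagmeno|normita|anoxmous|9unshofl|BLACKdoor|titlovi|Danishbits|hound\.org|hunddawgs
regex3: jodix|LESAIGNEUR|HighCode|explosiveskull|GoldenBeard|nessundorma|Fingal61|dawaith|MoSub|srjanapala
regex4: FilthyRichFutures|celebritysex|shareuniversity|AmericasCardroom|saveanilluminati|MCH2022|ALLIN1BOX
regex5: admitme|argenteam
regex6: \.(tv|tk|xyz|io|sex|porn|xxx|link)\b|https?[:\.\/\\ ]
regex7: (Someone(\b.\b)?needs(\b.\b)?to(\b.\b)?stop(\b.\b)?Clearway(\b.\b)?Law)|(Public(\b.\b)?shouldn't(\b.\b)?leave(\b.\b)?reviews(\b.\b)?for(\b.\b)?lawyers)
regex8: bierdopje|nlsubs|subtitles searcher|ondertiteling:|verbetering:|gedownload:|vertaling & sync:|vertaling:|== sync|==sync|ondertiteld door:|sync:|synced:|sync and corrections by|sync &|aangeboden door:|ondertiteling swell|captioning sponsored by|rip en sync
regex9: BluRay Rip:|Bdzzld
"""


def compile_purge_regexes(conf_text: str) -> dict[str, re.Pattern]:
    """Compile every pattern in [PURGE_REGEX] and report any that fail."""
    # interpolation=None: treat '%' in a pattern literally instead of as
    # configparser's variable-substitution syntax.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    compiled = {}
    for key, pattern in parser["PURGE_REGEX"].items():
        try:
            # Assumption: case-insensitive matching; subcleaner's flags may differ.
            compiled[key] = re.compile(pattern, re.IGNORECASE)
        except re.error as exc:
            print(f"{key} is not a valid regex: {exc}")
    return compiled


patterns = compile_purge_regexes(CONF_TEXT)
print(f"{len(patterns)} of 9 patterns compiled")
```

`configparser` splits each line at the first `:`, so the colons inside regex8 (such as `vertaling:`) stay part of the pattern. If every key compiles, a plain syntax error can be ruled out, and the Windows/Linux difference more likely comes from how the script is set up or invoked on the NAS.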

I might have made some mistakes here; I've only just begun 'learning' how to use regex, and I noticed it also wanted to delete a normal Dutch line:

Een week voor de noodtoestand in Californië

That is just a normal line of dialogue in the .srt (it means "One week before the state of emergency in California"), but I suspect the problem comes from my own regex8.
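To check whether regex8 really is what flags that line, the pattern can be tested on its own against the Dutch sentence. This is a minimal sketch using Python's `re`; the two extra sample lines are hypothetical ad-style lines made up for contrast, and case-insensitive matching is an assumption about how subcleaner applies the pattern.

```python
import re
from typing import Optional

# regex8 from the global.conf above, split across lines only for readability;
# the concatenated string is identical to the configured pattern.
REGEX8 = (
    r"bierdopje|nlsubs|subtitles searcher|ondertiteling:|verbetering:|gedownload:"
    r"|vertaling & sync:|vertaling:|== sync|==sync|ondertiteld door:|sync:"
    r"|synced:|sync and corrections by|sync &|aangeboden door:"
    r"|ondertiteling swell|captioning sponsored by|rip en sync"
)

# Assumption: case-insensitive matching; subcleaner's actual flags may differ.
PATTERN = re.compile(REGEX8, re.IGNORECASE)


def regex8_hit(line: str) -> Optional[str]:
    """Return the part of `line` that regex8 matches, or None if nothing matches."""
    match = PATTERN.search(line)
    return match.group(0) if match else None


# The line from this issue, plus two hypothetical ad-style lines for contrast.
samples = [
    "Een week voor de noodtoestand in Californië",
    "Vertaling: Jan",
    "Ondertiteld door: Piet",
]
for line in samples:
    print(f"{line!r} -> {regex8_hit(line)!r}")
```

None of regex8's alternatives occur in the Dutch sentence, so it comes back as `None`; that suggests the deletion suggestion for it comes from something other than regex8, which this sketch does not model. Separately, alternatives like `sync:` and `sync &` have no `\b` word boundary, so they can also fire inside longer words or in ordinary dialogue.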

So, in short: I started testing on Windows with good results, then added the script to Bazarr on Linux, where it seems to run but does not remove the blocks my manually added regexes should catch. Then I was away, and I haven't had a chance to find a good sample to post.

I think the best course of action is to fix the regex line first and go from there.

I know this isn't a very well-written issue report, and I would have liked it to be cleaner, but it's at least a start. I'll provide anything you need.