DandelionSprout / adfilt

The place where I, DandelionSprout, store my web filter lists for countless topics, including my Nordic adblock list. As simple as that, really.

Invalid domains detected on pihole: #866

Closed pallebone closed 11 months ago

pallebone commented 11 months ago

Hi there,

Pihole reports invalid domains:

    [i] Target: https://raw.githubusercontent.com/DandelionSprout/adfilt/master/Alternate%20versions%20Anti-Malware%20List/AntiMalwareABP.txt
    [✓] Status: Retrieval successful
    [✓] Parsed 0 exact domains and 13421 ABP-style domains (ignored 13635 non-domain entries)
        Sample of non-domain entries:

Is this possibly an error in the list?

Kind regards Peter

DandelionSprout commented 11 months ago

Pi-hole users should use the https://raw.githubusercontent.com/DandelionSprout/adfilt/master/Alternate%20versions%20Anti-Malware%20List/AntiMalwareDomains.txt list version. The "target" list version in the OP is specifically for Adblock Plus and AdBlock.

pallebone commented 11 months ago

Pi-hole supports ABP-style lists now. Are you saying this is not a parsing error and that Pi-hole has a bug?

I can open a bug with Pi-hole if this format is valid.

How is this parsing supposed to work? It's not clear to me.

DandelionSprout commented 11 months ago

Based on this info (I'd say it's good news that Pi-hole has begun to support ||-type lists again, at least), it seems like Pi-hole hasn't accounted for $domain stuff yet.

In the meantime, you can try out https://raw.githubusercontent.com/DandelionSprout/adfilt/master/Alternate%20versions%20Anti-Malware%20List/AntiMalwareAdGuardHome.txt, which is a ||-type list version without a whole lot of $ values.
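To see how much of a ||-type list depends on $ options, a rough sketch like the one below can split the entries into bare `||domain^` lines and lines that carry a `$` modifier. The function name and the sample entries are hypothetical (the `||ga^` entry is a shortened form of one quoted later in this thread), and this is not a description of Pi-hole's own parser.

```python
def split_abp_entries(lines):
    """Sort ABP-style lines into bare ||domain^ entries, entries with $ options, and the rest."""
    plain, with_modifiers, other = [], [], []
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, "!" comments and the "[Adblock Plus ...]" header.
        if not line or line.startswith(("!", "[")):
            continue
        if line.startswith("||") and "$" in line:
            with_modifiers.append(line)
        elif line.startswith("||") and line.endswith("^"):
            plain.append(line)
        else:
            other.append(line)
    return plain, with_modifiers, other


# Illustrative sample only; not taken from the real list file.
sample = [
    "! Title: example list",
    "||example.com^",
    "||ga^$domain=~google.ga|~my.ga",
]
plain, with_modifiers, other = split_abp_entries(sample)
print(len(plain), "bare entries;", len(with_modifiers), "entries with $ options")
```

Running this over a downloaded copy of a list gives a quick idea of how many entries a domain-only consumer would likely skip.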

pallebone commented 11 months ago

Thank you. Your time is appreciated.

I will take this information to the Pi-hole team, open a bug report, see what they say, and report back.

Kind regards Pete

iam-py-test commented 11 months ago

"In the meantime, you can try out https://raw.githubusercontent.com/DandelionSprout/adfilt/master/Alternate%20versions%20Anti-Malware%20List/AntiMalwareAdGuardHome.txt, which is a ||-type list version without a whole lot of $ values."

Off topic, but ||google-analytics.com^$sdi-tool.org in https://raw.githubusercontent.com/DandelionSprout/adfilt/master/Alternate%20versions%20Anti-Malware%20List/AntiMalwareAdGuardHome.txt looks like a mistake. Given that the domain option isn't supported at the DNS level, I don't think GA should be included at all. A few other weird conversions:

    /\.(xyz|pics)/[a-zA-Z0-9]{130,}/script,subdocument,image
    /\.(cloudfront\.net|xyz|pics)/[a-zA-Z0-9]{20,}/[a-zA-Z0-9]{25,}\+[a-zA-Z0-9+]{90,}$/script,subdocument,image
    ||cbox.ws^$chicks.cam
    /beep.mp3^$xyz|club|pro

DandelionSprout commented 11 months ago

(To iam-py-test) I am looking into other odd list entry conversions as we speak, and hope to be able to handle most of them within 20 minutes.

pallebone commented 11 months ago

I'm being asked this question by the Pi-hole team/mod when asking about these entries:

"jfb (Moderator):

Sample of non-domain entries:

    "||ga^$domain=~google.ga|~filtri-dns.ga|~dgdi.ga|~9191.ga|~animevsub.ga|~my.ga|~shorturl.ga|~mmo.ga"

How should this entry resolve to a domain or wildcard block in Pi-hole?"

Can you help me understand how this entry is supposed to be parsed? I'm not clear on it either.

Kind regards Peter

iam-py-test commented 11 months ago

If I understand it correctly, it should block anything with the top-level domain ga, but will not block certain legitimate domains (e.g. google.ga).
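That reading can be sketched in a few lines of Python. This is only an illustration of the interpretation given in the comment above, not of how any particular blocker implements the option. The function names are hypothetical, and treating subdomains of an excepted domain (such as www.google.ga) as excepted too is an assumption.

```python
def parse_tld_rule(rule):
    """Split '||ga^$domain=~a.ga|~b.ga' into ('ga', {'a.ga', 'b.ga'})."""
    pattern, _, options = rule.partition("$")
    suffix = pattern.lstrip("|").rstrip("^")
    exceptions = set()
    if options.startswith("domain="):
        for item in options[len("domain="):].split("|"):
            # Only the negated (~) entries are treated as exceptions here.
            if item.startswith("~"):
                exceptions.add(item[1:])
    return suffix, exceptions


def is_blocked(hostname, suffix, exceptions):
    """Block hostnames under the suffix unless they match an exception."""
    if not (hostname == suffix or hostname.endswith("." + suffix)):
        return False
    for exc in exceptions:
        # Assumption: subdomains of an excepted domain are excepted as well.
        if hostname == exc or hostname.endswith("." + exc):
            return False
    return True


# Shortened form of the entry quoted by the Pi-hole moderator.
suffix, exceptions = parse_tld_rule("||ga^$domain=~google.ga|~my.ga")
for host in ("google.ga", "www.google.ga", "some-other-site.ga", "example.com"):
    print(host, is_blocked(host, suffix, exceptions))
```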

pallebone commented 11 months ago

You are the best, thank you.

pallebone commented 11 months ago

Well, I logged 2 reports or "feature requests" with the Pi-hole team, and had a disappointing interaction. It seems their lead developer, Dan Schaper, isn't really interested in the ABP format even though it was added to Pi-hole.

His comment was simply: "The ABP functionality of Pi-hole is extended as far as it will be extended. No new features or changes will be added.

https://pi-hole.net/blog/2023/03/22/pi-hole-ftl-v5-22-web-v5-19-and-core-v5-16-1-released/ explains the sole ABP entry format that we will support."

That blog post basically just states that Pi-hole reads an ABP list and converts each entry to a single domain anyway, then blocks that domain and any subdomains. Nothing more. I wasn't aware of this, and now I am confused as to why Pi-hole even said it supported ABP when it just uses the list as a domain-only list anyway.

On another request I opened, he also basically said they are uninterested in changes to their product, so now I'm wondering if Pi-hole is the best product to use. I'm wondering if I should just google for a different product instead.

Since this is the state of things and the developer has stated there will be no changes to the decision, I can't really ask you to look further into this issue I raised, so I will close it.

Thank you for your help, it was appreciated. I apologize for not being able to make any progress and essentially wasting your time.

Kind regards Peter
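For readers following along, the behaviour described in the comment above (each `||domain^` entry reduced to a plain domain that is blocked together with its subdomains) can be sketched roughly as follows. This is a minimal illustration under that description, not Pi-hole's actual code; the function names are hypothetical and no domain-syntax validation is attempted.

```python
def extract_domain(entry):
    """Return the domain from a bare '||domain^' entry, or None for anything else."""
    if entry.startswith("||") and entry.endswith("^") and "$" not in entry:
        return entry[2:-1].lower()
    return None


def build_blockset(entries):
    """Collect the domains of all bare ||domain^ entries; other entries are ignored."""
    return {d for d in map(extract_domain, entries) if d}


def matches_blockset(hostname, blockset):
    """True if the hostname itself or any of its parent domains is in the blockset."""
    labels = hostname.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in blockset for i in range(len(labels)))


# Illustrative entries; the second one carries $ options and is therefore ignored.
blockset = build_blockset(["||example.com^", "||ga^$domain=~google.ga"])
print(matches_blockset("ads.example.com", blockset))
print(matches_blockset("notexample.com", blockset))
```

Under this model, an entry such as `||ga^$domain=~google.ga` has no equivalent, which would be consistent with it being reported as a non-domain entry.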

pallebone commented 11 months ago

Closing, cannot fix.

DandelionSprout commented 11 months ago

"(…) so now I'm wondering if Pi-hole is the best product to use. I'm wondering if I should just google for a different product instead."

https://github.com/AdguardTeam/AdGuardHome was the king of the hill of DNS blockers from 2020 to 2022, but hasn't seen a whole lot of development recently either.

I've admittedly had some scepticism about Pi-hole's devs in the sense that they have never ever communicated with the wider adblocker community.

pallebone commented 11 months ago

Interesting. What do you use?

pallebone commented 11 months ago

Also, FYI, this list:

"In the meantime, you can try out https://raw.githubusercontent.com/DandelionSprout/adfilt/master/Alternate%20versions%20Anti-Malware%20List/AntiMalwareAdGuardHome.txt, which is a ||-type list version without a whole lot of $ values."

was not suitable either, as it had a few values that did not work. I have had to revert to https://raw.githubusercontent.com/DandelionSprout/adfilt/master/Alternate%20versions%20Anti-Malware%20List/AntiMalwareHosts.txt instead :)

P

DandelionSprout commented 11 months ago

If Pi-hole discards entire lists if even 1 (one) entry is unsupported by it, then that would be absolute lunacy on the Pi-hole devs' part.

I'd recommend switching to AdGuard Home nevertheless.

pallebone commented 11 months ago

It doesn't discard the entire list, to be fair. But it only shows a sample of 5 invalid entries, which makes it impossible to log a correction if more than 5 exist and the 5 shown are valid for another project.

I'm conflicted about what to do. Pi-hole has served me well, but at the same time its direction is unclear to me, and I was surprised by the reception I received.

AdGuard Home is, I guess, the most popular other option from what I have read. It would be a lot of work to change over, so I'm in some ways stuck. Not sure what to do. Your recommendation seems reasonable if it has been a positive choice for you.

Do the same lists still work, i.e. domain-only lists, or do they have to be in AGH format only? I'm not sure how I would transition if I lost all the lists I have set up.

In addition, 2 features I use with Pi-hole are the ability to view invalid entries when it updates (I'm unsure if AGH has a similar output) and the ability to have 2 Pi-holes and sync them for redundancy. Again, I'm unsure if this is supported by AGH.

Very confused rn.

Kind regards Peter

pallebone commented 11 months ago

Can I also ask if IP addresses should be on that domain/hosts list?

    [i] Target: https://raw.githubusercontent.com/DandelionSprout/adfilt/master/Alternate%20versions%20Anti-Malware%20List/AntiMalwareHosts.txt
    [✓] Status: Retrieval successful
    [✓] Parsed 13542 exact domains and 0 ABP-style domains (ignored 7 non-domain entries)
        Sample of non-domain entries:

iam-py-test commented 11 months ago

Imre, you could add this to prepare_hosts:

        # Blank out any line that is nothing but an IPv4 address (requires `import re`).
        # Pattern source: https://www.geeksforgeeks.org/how-to-validate-an-ip-address-using-regex/
        line = re.sub(
            r"^((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])$",
            r"",
            line
        )

Though this only works for IPv4.
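If IPv6 coverage were wanted as well, one possible alternative to the regex is Python's standard `ipaddress` module, which parses both address families. The sketch below is an assumption about how such a check could look. The helper names are hypothetical, it filters a list of lines rather than blanking a single `line` as the snippet above does, and how it would fit into prepare_hosts is not shown in this thread.

```python
import ipaddress


def is_ip_address(text):
    """True if the text is nothing but an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text.strip())
    except ValueError:
        return False
    return True


def drop_ip_entries(lines):
    """Keep only the lines that are not bare IP addresses."""
    return [line for line in lines if not is_ip_address(line)]


# Illustrative input using documentation-reserved addresses.
print(drop_ip_entries(["example.com", "192.0.2.1", "2001:db8::1"]))
```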

DandelionSprout commented 11 months ago

I remember that I left such entries as-is for the past 2 years, in case a fringe scenario occurred in which users tried the Hosts list version in a tool that supported IP blocking. Over the following years, such tools gradually turned out to be very few in number, and one of the few that existed (AdGuard Home) had a much superior || list version included in the tool as an opt-in.

I'll need some time to consider how to handle this.

"I'm conflicted about what to do. Pi-hole has served me well, but at the same time its direction is unclear to me, and I was surprised by the reception I received."

I find AdGuard Home to retain a slight lead, due to its better UI, more included lists, and support for Windows and macOS. Just note that it can have some glitches from time to time, especially with the settings for encrypted DNS.

pallebone commented 11 months ago

I will look into AdGuard Home. It will take me quite a lot of effort and time to change, but I will try it :)