AdguardTeam / cname-trackers

This repository contains a list of popular CNAME trackers
https://adguard.com/
MIT License
387 stars 37 forks

Please provide this list in RPZ format #13

Closed ppaeps closed 3 years ago

ppaeps commented 3 years ago

This list is very useful; many thanks!

It would be even more useful if it were in RPZ format. This (horrible) awk hack does exactly that:

awk '{ if ($0 !~ /^#/) print $0 " CNAME ." ; else { sub(/#/, ";"); print $0 }}' combined_disguised_trackers_justdomains.txt

Please consider providing an RPZ version of this list in addition to the AdBlock formatted version.
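
For readers who want to try the conversion, here is a minimal sketch of the same idea spread over several lines. It assumes the input has one domain per line with `#` comments; the function name `to_rpz` and the inline sample data are made up for illustration. Unlike the one-liner, it also skips blank lines so they do not turn into stray ` CNAME .` records.

```shell
#!/bin/sh
# Sketch: convert a plain domain list (one domain per line, "#" comments)
# into RPZ records. Reads stdin, writes stdout.
# to_rpz is a hypothetical helper name, not part of the repository.
to_rpz() {
    awk '
        /^[[:space:]]*$/ { next }                       # skip blank lines
        /^#/             { sub(/^#/, ";"); print; next } # "#" comment becomes ";" comment
                         { print $1 " CNAME ." }         # domain becomes an NXDOMAIN policy record
    '
}

# Example run on a tiny inline sample (made-up input, not the real list):
printf '%s\n' '# Company name: A8.net' 'a8.01cloud.jp' '' 'a8.eonet.jp' | to_rpz
```

Because awk terminates every printed record with a newline, the output of this sketch always ends with one.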

dnmTX commented 3 years ago

@ppaeps sed will do this in an instant on your end. No need to bother the @AdguardTeam fellas for that 😉 :

sed -i '/^[^#]/s/$/ CNAME \./g' combined_disguised_trackers_justdomains.txt

Result (partial sample):

# Company name: A8.net
#
# a8.net disguised trackers
#
a8.01cloud.jp CNAME .
a8.cyclemarket.jp CNAME .
a8.denwa-hikari.com CNAME .
a8.eonet.jp CNAME .
a8.haptic.co.jp CNAME .

👍

ppaeps commented 3 years ago

Hah! You are not wrong. sed would be easier than awk. I put together that one-liner from a larger (more gruesome) awk script that converts and combines a bunch of blocklists into an RPZ for me. :)

Since the nice folks of @AdguardTeam are already providing a list in AdBlock format, it shouldn't be too much of a hardship for them to add the additional one-liner to produce RPZ too. More formats, more better. The easier we can make it for everyone to block these cretins, the less incentivised more cretins will be to join the party and the slower these lists grow.

dnmTX commented 3 years ago

@ppaeps as I'm really not familiar with the RPZ format: I see that your script substitutes # with ;. Is that how comments are marked there? I agree with your request here, more formats are always better 👍

TPS commented 3 years ago

An example in the RPZ Internet-Draft @ https://tools.ietf.org/id/draft-vixie-dnsop-dns-rpz-00.html#rfc.appendix.A does show such commenting.

dnmTX commented 3 years ago

Thanks @TPS 👍 For some reason I couldn't find it. Anyway, here is the reworked final sed command (in case it's needed):

sed -i '/^#$/d; s/^#/;/g; /^[^;]/s/$/ CNAME \./g' combined_disguised_trackers_justdomains.txt

Result(partial sample):

; Title: AdGuard CNAME disguised trackers list
; Description: The list of trackers that disguise the real trackers by using CNAME records.
; Homepage: https://github.com/AdguardTeam/cname-trackers
; Company name: A8.net
; a8.net disguised trackers
a8.01cloud.jp CNAME .
a8.cyclemarket.jp CNAME .
a8.denwa-hikari.com CNAME .
a8.eonet.jp CNAME .
a8.haptic.co.jp CNAME .
a8.lavie-official.jp CNAME .
a8.lens-labo.com CNAME .

ppaeps commented 3 years ago

@dnmTX You'll want to add a ;/^$/d;$G or something similar (;$a\, ;$s/$/\'$'\n'$'/, depending on your mood) to ensure there's a newline at the end of the output file. Some DNS servers (e.g. NSD) will cry about files that don't end with a newline.
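
As a rough sketch of that advice, the conversion can write to a separate file and then check the last byte, instead of relying on sed to add the newline. The function name `convert_list` and its two arguments are hypothetical; the sed expressions mirror the command posted above, plus removal of empty lines.

```shell
#!/bin/sh
# Sketch: same conversion as the sed command in this thread, but writing to a
# new file and making sure that file ends with a newline.
# convert_list and its arguments are hypothetical names for illustration.
convert_list() {
    src=$1
    dst=$2
    # 1) drop bare "#" lines and empty lines, 2) turn "#" comments into ";"
    # comments, 3) append " CNAME ." to every remaining non-comment line.
    sed -e '/^#$/d' -e '/^$/d' -e 's/^#/;/' -e '/^[^;]/s/$/ CNAME ./' "$src" > "$dst"
    # $(...) strips a trailing newline, so a non-empty result means the last
    # byte of the file is something other than a newline.
    if [ -n "$(tail -c 1 "$dst")" ]; then
        printf '\n' >> "$dst"
    fi
}
```

It could be called as `convert_list combined_disguised_trackers_justdomains.txt cname-trackers.rpz`, where the output file name is only an example.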

This is all academic though. I just submitted pull request #14 which generates the RPZ formatted files directly. It would be great if @AdGuardTeam could merge this. 🙏

dnmTX commented 3 years ago

Philip @ppaeps, your wish came true (congrats 👏 😄), so you can close the issue here if you want 👍

ppaeps commented 3 years ago

Hooray! Thanks for the merge @adguard!

alsyundawy commented 3 years ago

What is the difference between using IN and leaving it out in a CNAME record? For example: a.b.c.d CNAME . versus a.b.c.d IN CNAME .

alsyundawy commented 3 years ago

How about this command?

awk '{print $1" IN CNAME ."}' combined_disguised_trackers_justdomains.txt >> combined_disguised_trackers_justdomains.zone

ppaeps commented 3 years ago

No difference. Check section 5.1 of RFC 1035: "Omitted class [...] default to the last explicitly stated values". In practice, every DNS implementation I've encountered defaults the class to IN even when it doesn't appear anywhere in a zone file.
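
To make the class rule concrete, here is a small sketch that builds a complete zone file from the plain list. The function name `make_zone` and the TTL, SOA and NS values are placeholders chosen for illustration, not what the repository generates. The class is stated once in the header, so the policy records can omit it. The sketch also skips comment and blank lines, because printing `$1` for every line would turn a `#` comment into a record.

```shell
#!/bin/sh
# Sketch: build a complete zone file from the plain domain list.
# make_zone and the TTL/SOA/NS values are placeholders, not AdGuard's output.
make_zone() {
    # A zone file needs an SOA and an NS record before the policy records.
    # The class (IN) is stated explicitly here; later records may omit it.
    printf '%s\n' \
        '$TTL 300' \
        '@ IN SOA localhost. root.localhost. 1 3600 600 86400 300' \
        '@ IN NS localhost.'
    # Skip comments and blank lines so "#" never ends up as an owner name.
    awk '!/^#/ && NF { print $1 " CNAME ." }' "$1"
}
```

Running `make_zone combined_disguised_trackers_justdomains.txt > combined_disguised_trackers_justdomains.zone` would overwrite the target file rather than append to it, which avoids duplicated records on repeated runs.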