funilrys / PyFunceble

The tool to check the availability or syntax of domain, IP or URL.
https://pyfunceble.github.io
Apache License 2.0
289 stars, 44 forks

BUG: https://secure.fanboy.co.nz/fanboy-cookiemonster_ubo.txt unable to scan list #327

Closed. ryanbr closed this issue 1 year ago.

ryanbr commented 1 year ago

Description

Running the following command causes PyFunceble to exit after a few seconds with an incomplete list:

$ docker run -it pyfunceble/pyfunceble-dev --adblock -f https://secure.fanboy.co.nz/fanboy-cookiemonster_ubo.txt > elcookcheck.txt

Expected behavior

Screenshots

Versions

PyFunceble 4.2.0a7.dev (Blue Duckling: Ixora)

OS: Linux WSL2 (Ubuntu)

Python Version: 3.8.10

PyFunceble Version:

No errors are shown.

Additional context

funilrys commented 1 year ago

Hi @ryanbr ,

sorry for the wait. Is this still happening? If so, can you give me your Docker version?

I tested with Docker version 20.10.23, build 715524332f, and I didn't get any error. Is there anything useful for troubleshooting in the elcookcheck.txt file?

I wish you all the best in this New Year! Nissar

ryanbr commented 1 year ago

It just exits mid-parse. I'm using Docker Desktop 4.16.3 (Windows). Is there a switch I can add to get more debug details?

...
rp.pl                                                                                                ACTIVE      WHOIS
effekt.it                                                                                            ACTIVE      WHOIS
nowehoryzonty.pl                                                                                     ACTIVE      WHOIS
remixshop.com                                                                                        ACTIVE      DNSLOOKUP
4hifi.pl                                                                                             ACTIVE      WHOIS
sycow.pl                                                                                             ACTIVE      WHOIS
prideandglory.pl                                                                                     ACTIVE      WHOIS
tmgrup.com.tr                                                                                        ACTIVE      WHOIS
privacyportal.fatergroup.com                                                                         ACTIVE      DNSLOOKUP
skryptcookies.pl                                                                                     ACTIVE      WHOIS
kayak.                                                                                               INVALID     SYNTAX
vinted.                                                                                              INVALID     SYNTAX

Status      Percentage   Amount
----------- ------------ ------------
ACTIVE      98%          418
INACTIVE    2%           1
INVALID     2%           10

Thank you for using PyFunceble!

funilrys commented 1 year ago

@ryanbr

I think I know what's happening. I didn't look closely at the original link, so it seemed to be working as expected for me, but not for you.

Let me explain. If I run it like this:

$ docker run -it pyfunceble/pyfunceble-dev --adblock -f https://secure.fanboy.co.nz/fanboy-cookiemonster_ubo.txt > elcookcheck.txt

after a few minutes, I get the following output (last 20 lines):

ookies.forbes.pl                                                                                    ACTIVE      DNSLOOKUP
remixshop.com                                                                                        ACTIVE      DNSLOOKUP
efilli.com                                                                                           ACTIVE      DNSLOOKUP
matichon.co.th                                                                                       ACTIVE      DNSLOOKUP
consent.themeteocompany.com                                                                          ACTIVE      DNSLOOKUP
tmgrup.com.tr                                                                                        ACTIVE      WHOIS
wongnai.com                                                                                          ACTIVE      DNSLOOKUP
meteorete.it                                                                                         ACTIVE      WHOIS
cookies.unidadeditorial.es                                                                           ACTIVE      DNSLOOKUP
skryptcookies.pl                                                                                     ACTIVE      WHOIS

Status      Percentage   Amount
----------- ------------ ------------
ACTIVE      97%          416
INACTIVE    2%           3
INVALID     3%           10

Thank you for using PyFunceble!

That's the expected behavior. It seems that the original list uses some "aggressive" rules that PyFunceble doesn't decode by default.

In fact, I can instruct PyFunceble to use the aggressive decoding mode with the --aggressive argument, like this:

$ docker run -it pyfunceble/pyfunceble-dev --adblock --aggressive -f https://secure.fanboy.co.nz/fanboy-cookiemonster_ubo.txt > elcookcheck.aggressive.txt

It runs and runs, and I'm still waiting for the result.

That's probably what you expected. Can you confirm?

Thank you for your patience.

Have a nice day/night! Nissar

funilrys commented 1 year ago

Note: here is what I get when I run with aggressive decoding (last 20 lines):

festool.ru                                                                                           ACTIVE      DNSLOOKUP
festool.ua                                                                                           ACTIVE      WHOIS
festool.sk                                                                                           ACTIVE      DNSLOOKUP
pepper.it                                                                                            ACTIVE      DNSLOOKUP
pepper.it                                                                                            ACTIVE      DNSLOOKUP
festoolcanada.com                                                                                    ACTIVE      DNSLOOKUP
pepper.it                                                                                            ACTIVE      DNSLOOKUP
pepper.it                                                                                            ACTIVE      DNSLOOKUP
festool.it                                                                                           ACTIVE      DNSLOOKUP
festoolusa.com                                                                                       ACTIVE      DNSLOOKUP

Status      Percentage   Amount
----------- ------------ ------------
ACTIVE      100%         27080
INACTIVE    0%           100
INVALID     0%           12

Thank you for using PyFunceble!
Do you have a feedback, an issue or an improvement idea? Let us know on GitHub!

ryanbr commented 1 year ago

Ah yeah, --aggressive works. Why isn't it the default? Exiting after x lines seems strange.

funilrys commented 1 year ago

It's for "historical" reasons. When I wrote that decoder, I wanted it to stay as close as possible to https://adblockplus.org/filter-cheatsheet: I wanted to extract what is actually blocked, not everything that is part of a selector. That's what the default mode is all about.

Everything in the aggressive mode came from requests by several people who wanted to run more extensive tests of their AdBlock filter lists.
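To make the distinction concrete, here is a toy Python sketch. It is not PyFunceble's decoder, and the split between the two modes is an assumption chosen for illustration: decode_default (a hypothetical name) only keeps hosts blocked by network rules such as ||tracker.example^, while decode_aggressive (also hypothetical) additionally collects the domains that cosmetic rules such as shop.example##.cookie-banner are scoped to. On a list dominated by scoped cosmetic rules, the second function would return far more subjects than the first.

```python
import re

# Toy illustration only: this is NOT PyFunceble's decoder.
# Assumption: the "default" function extracts only hosts that network rules
# block (e.g. "||example.com^"), while the "aggressive" function also collects
# the domains that cosmetic rules are scoped to (e.g. "a.com,b.org##.banner").

# "||host" followed by "^", "/", "$" (options) or the end of the line.
NETWORK_RULE = re.compile(r"^\|\|([a-z0-9.-]+)(?:\^|/|\$|$)")
# Everything before the first "##", "#@#" or "#?#" is the domain scope.
COSMETIC_RULE = re.compile(r"^([^#\s]*)#[@?]?#")


def decode_default(lines):
    """Return the hosts that network rules explicitly block."""
    found = []
    for line in lines:
        match = NETWORK_RULE.match(line.strip().lower())
        if match:
            found.append(match.group(1))
    return list(dict.fromkeys(found))  # de-duplicate, keep order


def decode_aggressive(lines):
    """Return network-rule hosts plus the domains scoping cosmetic rules."""
    found = decode_default(lines)
    for line in lines:
        line = line.strip().lower()
        if line.startswith("!"):  # comment line
            continue
        match = COSMETIC_RULE.match(line)
        if not match:
            continue
        for domain in match.group(1).split(","):
            # Skip generic rules (empty scope) and negated domains ("~a.com").
            if domain and not domain.startswith("~"):
                found.append(domain)
    return list(dict.fromkeys(found))


rules = [
    "! Title: sample list",
    "||tracker.example^",
    "shop.example,news.example##.cookie-banner",
    "~blog.example,forum.example##.consent",
    "##.generic-cookie-notice",
]
print(decode_default(rules))     # ['tracker.example']
print(decode_aggressive(rules))  # ['tracker.example', 'shop.example', 'news.example', 'forum.example']
```

Whether a decoder should follow the Adblock Plus notion of "what is blocked" or extract every domain a rule mentions is the design trade-off described in the comment above.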

Maybe it's time to rethink the UX, or the decoder itself.

ryanbr commented 1 year ago

While I'm here, @funilrys:

oixohmve.com INACTIVE STDLOOKUP

According to https://www.whois.com/whois/oixohmve.com, its registration seems current?

funilrys commented 1 year ago

Hi @ryanbr, it should be working now. It seems that I underestimated GitHub's caching.

GitHub was caching a wrong version of the https://github.com/PyFunceble/iana/blob/master/iana-domains-db.json file, which left PyFunceble not knowing which server to contact for any .com domain.
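As a rough illustration of why a bad copy of that file can break every .com lookup at once, here is a minimal Python sketch. It assumes (this is not taken from PyFunceble's code) that iana-domains-db.json maps each TLD to the hostname of its WHOIS server; SAMPLE_DB and whois_server_for are hypothetical names, and the two entries are only a sample.

```python
import json

# Hypothetical excerpt. Assumption: the database maps each TLD to the
# hostname of its WHOIS server. This shape is an illustration, not a
# documented PyFunceble format.
SAMPLE_DB = json.loads("""
{
    "com": "whois.verisign-grs.com",
    "pl": "whois.dns.pl"
}
""")


def whois_server_for(domain, db):
    """Return the WHOIS server for the domain's TLD, or None if unknown."""
    # Drop a trailing root dot, then take the last label as the TLD.
    tld = domain.rstrip(".").rsplit(".", 1)[-1].lower()
    return db.get(tld)


print(whois_server_for("oixohmve.com", SAMPLE_DB))  # whois.verisign-grs.com
print(whois_server_for("example.xyz", SAMPLE_DB))   # None
```

If the cached copy lacks the "com" entry, this lookup returns None for every .com domain, so no WHOIS record can be fetched; how PyFunceble then classifies the subject depends on its own fallback logic.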

It is fixed now. As soon as I get a bit of time, I'll rent a dedicated server and move all critical files to it (while keeping GitHub as a last resort).

Thank you for your patience and support.

Stay safe.

P.S.: Sorry for the wait. The last few weeks have been hard. I'm also working on a project that should let us efficiently orchestrate tests of hosts files and blocklists based on PyFunceble's principles. It's probably one of my biggest projects so far, and I can't wait to finish it and present it to you (and other sponsors) as an early POC.