funilrys / PyFunceble

The tool to check the availability or syntax of domain, IP or URL.
https://pyfunceble.github.io
Apache License 2.0

unset autocontinue for syntax #188

Closed spirillen closed 3 years ago

spirillen commented 3 years ago

Bug description

Bug found by @ZeroDot1 in https://github.com/funilrys/PyFunceble/issues/134#issuecomment-758083832

@spirillen @funilrys When I set --auto-continue to false it runs like a rocket with the speed of light.

I can confirm that. It's actually a tremendous speed improvement, even against the DB.

This means

Setting autocontinue: False gives a significant speed-up on --syntax runs.

spirillen commented 3 years ago

Note to self

This has some relevance to https://github.com/spirillen/PyFunceble/issues/29

spirillen commented 3 years ago

Setting -c together with --syntax also more than halves the CPU load. Both tests include --database-type mariadb

Without -c image

With -c image

ZeroDot1 commented 3 years ago

Setting -c together with --syntax also more than halves the CPU load. Both tests include --database-type mariadb

Without -c image

With -c image

I got exactly the same results. :100:

spirillen commented 3 years ago

https://github.com/funilrys/PyFunceble/issues/134#issuecomment-758183890

Hmm, I haven't reached the end of my test.list yet.

spirillen commented 3 years ago

pyfunceble -f test.list --mining --complements --database-type mariadb --ex --wildcard --local -w 30

image

pyfunceble -f test.list --mining --complements --database-type mariadb --ex --wildcard --local -w 30 -c

image

The autocontinue seems rather hungry...
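For anyone who wants to reproduce this kind of comparison in numbers rather than screenshots, here is a minimal sketch of a timing helper. It is not part of PyFunceble; the `measure` function is a hypothetical name, and it only assumes that the command to compare can be started as a child process. It reports wall-clock time and the CPU time consumed by the child.

```python
import os
import subprocess
import time


def measure(command):
    """Run `command` (a list of arguments) and return (wall_seconds, cpu_seconds).

    cpu_seconds is the user + system CPU time of the child process,
    taken from the difference in os.times() before and after the run.
    """
    before = os.times()
    start = time.perf_counter()
    subprocess.run(command, check=True)
    wall = time.perf_counter() - start
    after = os.times()
    cpu = (after.children_user - before.children_user) + (
        after.children_system - before.children_system
    )
    return wall, cpu


# Example usage with the two calls from this comment (not executed here):
# base = ["pyfunceble", "-f", "test.list", "--mining", "--complements",
#         "--database-type", "mariadb", "--ex", "--wildcard", "--local",
#         "-w", "30"]
# print(measure(base))
# print(measure(base + ["-c"]))
```

Note that this only counts CPU time of the process tree started by the call, so load on a separate database server would not show up in the numbers.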

ZeroDot1 commented 3 years ago

I think an option should exist to do a syntax test without a database function. This will be useful for many if you just need an error check.

spirillen commented 3 years ago

If you run pyfunceble --syntax -(u)f $SOURCE_FILE, it should not touch your usual CSV (database), if I recall correctly. It's just that I use the CDB as the default backend for everything (keeping it warm).

Maybe @funilrys can follow up on that?

ZeroDot1 commented 3 years ago

That's interesting, when I use -uf all URLs are sorted as invalid.

pyfunceble -w 6 --plain -a --syntax -uf "FILE.txt"

funilrys commented 3 years ago

@ZeroDot1 Please define "URLs".


To your question: User has the choice. If you want to disable any database while using the --syntax argument, simply add the following to your call stack:


To this issue, what if we disable the auto continue subsystem if we are not under a supported engine and if it is not explicitly activated? cc @spirillen @ZeroDot1 @mitchellkrogza

ZeroDot1 commented 3 years ago

@ZeroDot1 Please define "URLs".

amba-node.masterpro.site
amba.abseits.ski
amba.coinlab.biz
amba.easyx.cc
amba.europool.me
amba.masterpro.site
ambalander.coinpoolit.webhop.me
ambassadors.btcprivate.org
amber.abseits.ski
amber.easyx.cc
amber.coin-miners.info
amber.node.coin-miners.info
amber.suprnova.cc
amberhq.evolution-project.go.ro
ambergardensvila6.evolution-project.go.ro
ambil-skinmu.coinpoolit.webhop.me
ambis.biz
amc.abseits.ski
amc.coinpoolit.webhop.me
amc.scryptmining.com
amc.slugmonkeypool.net
amc1.evolution-project.go.ro
amdsruqv.api.binance.com
amdsruqv.dev.api.binance.com
amdsruqv.us.api.binance.com

Example: these URLs were not separated from the invalid ones, and no extra file for invalid entries was created.

See: https://github.com/funilrys/PyFunceble/issues/134#issuecomment-758973622

@funilrys After the third try I can confirm that no file with invalid entries is written. (Screenshot from 2021-01-12 22-39-05)

ZeroDot1 commented 3 years ago

To this issue, what if we disable the auto continue subsystem if we are not under a supported engine and if it is not explicitly activated?

That's a good idea.

Every user should activate auto continue when it is needed.

spirillen commented 3 years ago

What if we disable the auto continue subsystem if we are not under a supported engine and if it is not explicitly activated?

That's actually a two-sided coin you've got there. It was my initial thought too, based on my answer (https://github.com/funilrys/PyFunceble/issues/188#issuecomment-758874458)

And the real answer must depend on...

  1. What is the most common usage of --syntax without specifying the --database-type?
  2. Should the --database-type actually be assigned by default?
    1. :thought_balloon: If I'm a user running the CLI from a shell, what is the expected outcome?
    2. stdout?
    3. stdout + csv?
    4. Just a validation of a current $source(-file)?
  3. Or as it is now, with the results as in 2.4 plus the output directory?

I would need time to think this through. But given how we (mypdns + dead-hosts) are using it and will be using it, we should rely on --database-type x being handled by default, whatever flags we assign to a given test, so that the results are stored in our DB for later extraction. That means keeping --inactive-db = False and --continue: True by default (in case the test process gets interrupted).

Question for @ZeroDot1 and everyone else: how do you usually use PyFunceble?

spirillen commented 3 years ago

On second thought, I must admit I had never considered that the local CSV (previously JSON) files were touched when running with --syntax without specifying the db-type. I was, and still am, only concerned about the output dir...

spirillen commented 3 years ago

pyfunceble --dns 192.168.1.1 --database-type mariadb -f https://raw.githubusercontent.com/Clefspeare13/pornhosts/master/submit_here/hosts.txt https://raw.githubusercontent.com/Clefspeare13/pornhosts/master/submit_here/mobile.txt https://raw.githubusercontent.com/Clefspeare13/pornhosts/master/submit_here/snuff.txt https://raw.githubusercontent.com/Clefspeare13/pornhosts/master/submit_here/strict_adult.txt --local -w 20 -h -ex -a -c

Execution Time: 00:00:10:7.123542

VS

pyfunceble --dns 192.168.1.1 --database-type mariadb -f https://raw.githubusercontent.com/Clefspeare13/pornhosts/master/submit_here/hosts.txt https://raw.githubusercontent.com/Clefspeare13/pornhosts/master/submit_here/mobile.txt https://raw.githubusercontent.com/Clefspeare13/pornhosts/master/submit_here/snuff.txt https://raw.githubusercontent.com/Clefspeare13/pornhosts/master/submit_here/strict_adult.txt --local -w 20 -h -ex -a

Hmm, I started this test hours ago. I will update if it reaches the end some day...

Speak of the devil... https://github.com/funilrys/PyFunceble/issues/189#issuecomment-760782984 The DB failed right after I hit "comment" :unamused:

with continue: true
adtgp.com                                                                                            ACTIVE      WHOIS      20-jan-2022       Unknown    AVAILABILITY 
ads.xhamster.com                                                                                     ACTIVE      DNSLOOKUP  Unknown           Unknown    AVAILABILITY 
ads.xhamster.com                                                                                     ACTIVE      DNSLOOKUP  Unknown           Unknown    AVAILABILITY
sqlalchemy.exc.IntegrityError: (pymysql.err.IntegrityError) (1062, "Duplicate entry '762352' for key 'PRIMARY'")

image

Based on @ZeroDot1's findings and my own tests on this issue, I must admit my opinion is getting closer to ditching this feature entirely in favor of --dbr https://github.com/funilrys/PyFunceble/discussions/168 The thing is, the DB (sql|csv) already keeps the time (datetime) of the last test, so let's use that to "continue from" any test that is more than x minutes old.
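To make the "continue from" idea concrete, here is a minimal sketch of the selection logic. It is only an illustration: `subjects_to_retest` and the `last_tested` mapping are hypothetical names, not PyFunceble's API, and it assumes the last-tested datetime for each subject has already been read from the DB (sql|csv).

```python
from datetime import datetime, timedelta


def subjects_to_retest(last_tested, subjects, max_age_minutes, now=None):
    """Return the subjects that were never tested or whose last test
    is more than `max_age_minutes` old.

    last_tested: dict mapping subject -> datetime of its last test
                 (hypothetical stand-in for what the DB stores).
    """
    now = now or datetime.now()
    cutoff = now - timedelta(minutes=max_age_minutes)
    return [
        subject
        for subject in subjects
        if subject not in last_tested or last_tested[subject] < cutoff
    ]


# Example: with a 60 minute limit, only stale or unknown subjects are kept.
now = datetime(2021, 1, 15, 12, 0, 0)
last_tested = {
    "example.org": now - timedelta(minutes=5),
    "example.net": now - timedelta(minutes=90),
}
todo = subjects_to_retest(
    last_tested, ["example.org", "example.net", "example.com"], 60, now=now
)
```

With such a filter, an interrupted run would resume simply by skipping everything tested within the last x minutes, without a separate auto-continue record.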

This issue touches https://github.com/funilrys/PyFunceble/issues/180 https://github.com/funilrys/PyFunceble/issues/189 https://github.com/funilrys/PyFunceble/discussions/168