Ultimate-Hosts-Blacklist / dev-center

The place to talk about our infrastructure or everything related to the Ultimate Hosts Blacklist project.

[suggestion] couple new lists for consideration #8

Closed by dnmTX 5 years ago

dnmTX commented 5 years ago

@funilrys consider adding anudeepND's lists. @anudeepND is a very active player on the hosts-blocking scene, and he could definitely use the help filtering the lists.

I saw you added the DShield Low list. There are two more if you want to take a look: Medium and High. Let me know what you think. Thanks.

dnmTX commented 5 years ago

@funilrys there are some inconsistencies in @anudeepND's lists. Many more domains are listed in hosts/ACTIVE (27358 lines) than in clean.list (23211 lines) after the filtering is already done. How is that possible?

There are missing domains too, at least one that I know of which is pretty much active: imasdk.googleapis.com, which is present in domains.list but not in clean.list or hosts/ACTIVE. Another thing I've noticed is that it says domains.list was last updated 7 days ago, and I can assure you that within that period @anudeepND made several changes to his original lists that didn't make it into domains.list. For example, s.aolcdn.com was removed on Oct 22, 2018, yet as of today it is still present in domains.list.
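A comparison like the one described above can be scripted rather than done by eye. The following is only a minimal sketch: it assumes that hosts/ACTIVE uses the usual hosts layout (`0.0.0.0 example.com`, with `#` comments) and that clean.list holds one domain per line. The helper names `read_domains` and `compare_lists` are made up for illustration and are not part of the project's tooling.

```python
from pathlib import Path


def read_domains(path):
    """Return the set of domains found in a hosts-style or plain list file."""
    domains = set()
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        # Assumption: hosts lines look like "0.0.0.0 example.com" and plain
        # lists hold the domain alone, so the last field is the domain.
        domains.add(line.split()[-1].lower())
    return domains


def compare_lists(hosts_path, clean_path):
    """Return (only_in_hosts, only_in_clean) as sorted lists."""
    hosts = read_domains(hosts_path)
    clean = read_domains(clean_path)
    return sorted(hosts - clean), sorted(clean - hosts)
```

Calling `compare_lists("hosts/ACTIVE", "clean.list")` on local copies lists the domains that appear in only one of the two files, which is more informative than a raw line count because header and comment lines are ignored.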

funilrys commented 5 years ago

I would really like to do better now, but it may take some time, as I have to study, work, live with family and girlfriend, and in my free time develop the different tools and systems that will make @dead-hosts and @Ultimate-Hosts-Blacklist better, with the help of a sort of intelligence for the maintenance and monitoring.

Again @dnmTX, you have to understand that we are dependent on Travis CI because we are not really ready to invest more than we get for this project. We have more than 50 input sources that have to be cleaned, and Travis CI can run 5 instances of 10 minutes at the same time. That makes some builds/tests long enough to run for day(s).

Despite the fact that Travis CI claims to be stable, we often have to step in to restart the build because Travis containers have issues for x or y reasons, which are obscure and do not depend on the system and scripts we wrote.

Back to the problem you mentioned: you have to understand that we test the file first and only then get a fresh copy. Indeed, when testing a bigger list (for example), if I restarted from the beginning every time the upstream link was updated, you would never get a clean.list. That also means that if Travis has an obscure bug, we wait 24h before continuing where we stopped, because cron jobs cannot be scheduled (under Travis CI) at an interval shorter than 24h. There is indeed a way to bypass that, but it has to be an external system that runs on one of our machines (which run 24/7). That's what I'm actually working on in my free time, after the development for the roadmap and objectives of the other open-source projects I maintain or contribute to.
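The "continue where we stopped" idea can be illustrated with a small checkpoint loop. This is only a sketch under stated assumptions, not the project's actual script: `run_batch`, the JSON state file, and the `test_domain` callback are hypothetical names, and the time budget stands in for whatever limit the CI job imposes. It needs Python 3.8 or later for `unlink(missing_ok=True)`.

```python
import json
import time
from pathlib import Path


def run_batch(domains, state_path, test_domain, budget_seconds):
    """Test domains until the time budget runs out, then save progress.

    Returns True once the whole list has been tested.
    """
    state_file = Path(state_path)
    # Resume from the saved index, or start at 0 on the first run.
    if state_file.exists():
        index = json.loads(state_file.read_text())["index"]
    else:
        index = 0
    deadline = time.monotonic() + budget_seconds

    while index < len(domains) and time.monotonic() < deadline:
        test_domain(domains[index])
        index += 1

    if index >= len(domains):
        # Finished: clear the checkpoint so the next run can begin with a
        # fresh upstream copy.
        state_file.unlink(missing_ok=True)
        return True

    # Out of time: remember where to pick up on the next run.
    state_file.write_text(json.dumps({"index": index}))
    return False
```

With this shape, a fresh upstream copy is only fetched after `run_batch` returns True, so an update upstream never throws away a partially tested list.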

I'll do my best to find out why there is so much inconsistency in the build history and will come back to you around the weekend.

I agree that we should be more proactive, but that would mean monitoring the whole system 24/7, which is quite impossible for us (@mitchellkrogza and I). That's why I'm working on a tool to do the monitoring and maintenance automatically. That tool will not only monitor and maintain @Ultimate-Hosts-Blacklist or @dead-hosts, but also all of the repositories we maintain (again, @mitchellkrogza and I) which run with PyFunceble and Travis CI. That's why you may have to wait a bit.

If you can't live with those limitations, I invite you to download PyFunceble, deactivate the usage of the WHOIS record if you use it for your personal usage, and test the list directly for your needs.

Otherwise, please be patient. It will get better over time; we are just at the beginning of one of our biggest adventures.

Thanks again for the issue reporting.

Cheers, Nissar

dnmTX commented 5 years ago

@funilrys just a heads up on the inconsistencies that I mentioned in anudeepND's lists. It looks like other lists are affected too, for example https://github.com/Ultimate-Hosts-Blacklist/justdomains_mirror1.malwaredomains.com. Might as well check them all when you have the time. Thank you!

funilrys commented 5 years ago

It was definitely a Travis CI issue, @dnmTX. Indeed, our script was working perfectly, but Travis CI was unable to push the final commit for an unknown reason, which caused an endless loop in most of our input sources: the test was finished but the commit wasn't pushed.

The system tried to push again and again until it worked. In the meantime some days passed, which caused the issues you mentioned.
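One common way to keep a failing push from looping for days is to cap the number of attempts and fail loudly. The sketch below is an illustration only, not the script used here: `push_with_retry` is a hypothetical helper, and `push` stands for any callable that raises an exception when the push fails.

```python
import time


def push_with_retry(push, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call push() until it succeeds or max_attempts is reached.

    Returns the number of attempts used; re-raises the last error on failure.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            push()
            return attempt
        except Exception:
            if attempt == max_attempts:
                raise  # give up loudly instead of retrying forever
            # Exponential backoff: base_delay, then 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Because the last error is re-raised, the build is marked as failed and can be noticed, instead of silently retrying while the published lists go stale.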

I'll generate a new GitHub token for each input source and restart all of them from the beginning (manually) asap and monitor them manually for some hours to confirm that the issue is now gone.

Cheers, Nissar

dnmTX commented 5 years ago

Thanks @funilrys. Let's hope it's an easy fix, as I was planning to load a couple more lists from here, but I'll hold off until you get it fixed.

funilrys commented 5 years ago

Restarting process :heavy_check_mark: Let's monitor now!

funilrys commented 5 years ago

@dnmTX FYI, after monitoring the first 30 input sources that finished, there is nothing to report :+1:

dnmTX commented 5 years ago

@funilrys great. I'll keep an eye on them during the week just in case, and if anything comes up I'll let you know.

dnmTX commented 5 years ago

@funilrys it's still happening in anudeepND's list. Filtering is done and there are 26836 lines in hosts/ACTIVE and 24127 lines in clean.list. Just letting you know as I promised, that's all.

funilrys commented 5 years ago

@dnmTX ==> #12 :smile_cat:

dnmTX commented 5 years ago

@funilrys I think justdomains is stuck on filtering. It shows that it's still under test, but the last commit was two days ago and the lists are not being updated. Just reporting, as I'm not sure whether you're aware of it.

funilrys commented 5 years ago

Hi @dnmTX, thanks for pointing that out. I traced the error, and I can tell that it was caused by the cross-repository configuration, which had not been updated.

It's now fixed and I added it to my internal workflow for checking and improvements.

Indeed, the system has a cross-repository configuration file which is located here https://github.com/Ultimate-Hosts-Blacklist/repository-structure/blob/master/.PyFunceble_cross_input_sources.yaml.

What we actually do is:

What did go wrong:

How can we ensure that it will not happen again:

Have a nice day/night.

Cheers, Nissar