Ultimate-Hosts-Blacklist / dev-center

The place to talk about our infrastructure or everything related to the Ultimate Hosts Blacklist project.
MIT License

[Feedback] Friday, February 21, 2020 - Today #44

Closed funilrys closed 4 years ago

funilrys commented 4 years ago

Hello, World!

Please report here any infrastructure/build/test issue (suspected or confirmed).

Note: Issues that imply big changes, or that are not related to infrastructure or builds/tests, are welcome as new issues!

Cheers!

dnmTX commented 4 years ago

Thanks for working on it (looking at the commits above). Looks like my lists will get updated on time, every time 😉

funilrys commented 4 years ago

@dnmTX It was in my plans :wink: It was just accelerated by the central repository not updating for a few days :smile_cat:

Also, note the last commit somewhere ... You may ask me in the future why the content changed. That commit explains it all 😊

funilrys commented 4 years ago

Central repository is back. Synchronization of the search engine was started manually (should be done within the next 30 minutes).

Note: It's a bit late, so do not expect the ip.list to be in the search engine. I will do that later in the coming weeks.

We are getting bigger and bigger, so a lot has to be done. But the most important part is done.

Coming up in the coming days/weeks:

Time for me to sleep.

Catch up later!

Cheers!

funilrys commented 4 years ago

Forgot to link: https://github.com/Ultimate-Hosts-Blacklist/dev-center/commit/49876fbd985b2bbc5646c637739b3a27e44bf4f2

I completely rewrote the input source updater.

funilrys commented 4 years ago

Canceled all builds

funilrys commented 4 years ago

Restarted all canceled builds.

funilrys commented 4 years ago

Hey @Ultimate-Hosts-Blacklist/whitelister @Ultimate-Hosts-Blacklist/blacklister,

please be aware that my last commit introduced a change in the synchronization of the distributed files.

Here is a picture of it. Scenario: upstream deletes/adds some subjects.

  1. When a CI instance (test) starts, we update domains.list.
  2. We get the list of removed subjects (previous upstream - domains.list).
  3. We remove the removed subjects from clean.list.
  4. We remove the removed subjects from whitelisted.list.
  5. We remove the removed subjects from volatile.list.
  6. Run test
  7. Test (part) finished ==> Push changes
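
Steps 2 to 5 above can be sketched roughly as follows. This is a minimal illustration only, not the project's actual updater code: the function names and the in-memory dictionary of lists are assumptions made for the example, and the real system works on files in each repository.

```python
# Hypothetical sketch of steps 2-5: find subjects removed upstream and
# drop them from the derived lists. Names are illustrative assumptions.

DERIVED_FILES = ("clean.list", "whitelisted.list", "volatile.list")


def removed_subjects(previous_upstream, current_upstream):
    """Step 2: subjects present in the previous upstream but gone from domains.list."""
    return set(previous_upstream) - set(current_upstream)


def prune(subjects, removed):
    """Drop removed subjects from one list while keeping the original order."""
    return [subject for subject in subjects if subject not in removed]


def sync_derived(derived, previous_upstream, current_upstream):
    """Steps 3-5: apply the pruning to clean, whitelisted and volatile lists."""
    removed = removed_subjects(previous_upstream, current_upstream)
    return {
        name: prune(derived.get(name, []), removed)
        for name in DERIVED_FILES
    }


if __name__ == "__main__":
    previous = ["a.example", "b.example"]
    current = ["a.example", "c.example"]  # upstream dropped b.example
    derived = {
        "clean.list": ["a.example", "b.example"],
        "whitelisted.list": ["b.example"],
        "volatile.list": ["a.example", "b.example"],
    }
    print(sync_derived(derived, previous, current))
```

Newly added subjects (here `c.example`) are not touched by this step; they only reach the derived lists after the test run in step 6.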

Stay safe and healthy! Nissar

Somebodyisnobody commented 4 years ago

I have a few questions; maybe the answers will help me understand the whole system:

dnmTX commented 4 years ago

@Somebodyisnobody let me try to answer as much as I can, and @funilrys will finish or correct what I got wrong. So...

domains.list: basically the raw, unfiltered upstream list; only the name is different. After domains.list is filtered, the result creates this 👇

clean.list: well... there are some active domains (yeah, @funilrys, I said active 😛) that for some reason show status 404 when they're filtered, so @funilrys decided not to include them.

whitelisted.list: the clean.list, additionally filtered against our whitelist.

volatile.list: well, that's the cream of the crop 😉. It contains ALL the ACTIVE domains from upstream, and it's also filtered against our whitelist.

P.S. @funilrys you got it from here 👍

funilrys commented 4 years ago

Hello and sorry for the delay.

I don't have anything to add @dnmTX :smile_cat:

I had some time to create a flow chart (also included in the README of this repository)!

Have fun:

[Flow chart: UHB Backend]

Let us know if something is not clear @Somebodyisnobody 😊

Somebodyisnobody commented 4 years ago

Uff, what a large construct :) I don't know what the arrows' colours stand for.

Let's assume I edit a domains.list in a repo, for example https://github.com/Ultimate-Hosts-Blacklist/blacklist:

domains.list diffs:

 bad-website
-i.am.a.good.site
+verybadwebsite

This domains.list will be checked for removals and additions in the brown field "Splitter". After that the i.am.a.good.site would be removed from

  1. clean.list
  2. whitelisted.list
  3. volatile.list if it's contained there.

At this point my brain begins to fume and the air doesn't smell very good: Why should I remove the domain i.am.a.good.site, which I removed from the blacklist, from the whitelist? Why should i.am.a.good.site be on the blacklist AND on the whitelist? Am I thinking about this completely wrong?

funilrys commented 4 years ago

Uff, what a large construct :) I don't know what the arrows' colors stand for.

White and purple are for data flow.

white ==> Outgoing (data storage to process/action) ==> e.g. A (to) B
purple ==> Incoming (process/action to data storage) ==> e.g. B (to) A

Where B is a process and A a data storage.

Black is for normal process flow (normally 1 after another or once everything in the group is done).

Let's assume I edit a domains.list in a repo for example https://github.com/Ultimate-Hosts-Blacklist/blacklist:

Strange that repo is not working behind CI :thinking: Added to my backlog.

Why should I remove the domain i.am.a.good.site, which I removed from the blacklist, from the whitelist?

In a working environment, the system will do it by itself. The file named whitelisted.list is not a whitelist. It's just a copy of clean.list (again, in a working CI environment) with our whitelisting tool applied to it.

In other words: whitelisted.list == clean.list - (Ultimate-Hosts-Blacklist/whitelist/domains.list)

If you remove a domain from the blacklist, the system should remove it from the other files automatically.

It might be confusing indeed; I made a mistake when naming that file in the past.

Let me know if what I explained is still not clear.
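
As a rough illustration of that relation, here is a minimal sketch that treats the whitelist as a plain set of exact domain names. That is an assumption made for the example: the real whitelisting tool may apply richer rules than exact matching, and `apply_whitelist` is a hypothetical name, not the tool's API.

```python
# Hypothetical sketch: whitelisted.list as clean.list minus the whitelist.
# Exact-match filtering only; the real tool may support more rule types.


def apply_whitelist(clean, whitelist):
    """Return the entries of clean.list that are not in the whitelist, in order."""
    blocked = set(whitelist)
    return [domain for domain in clean if domain not in blocked]


if __name__ == "__main__":
    clean = ["bad.example", "good.example", "worse.example"]
    whitelist = ["good.example"]
    print(apply_whitelist(clean, whitelist))
```

So a domain appearing in whitelisted.list is one that survived the whitelist, not one that is on it.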

Stay safe and healthy! Nissar

Somebodyisnobody commented 4 years ago

Okay, I understood that thing with the whitelisted.list, but it's a large deployment you've set up. I have not yet fully understood all of this, but let's just leave it at that for now.

funilrys commented 4 years ago

@Somebodyisnobody that's because the other part is missing: the central repository deployment 😁

dnmTX commented 4 years ago

@funilrys this looks like a serious issue (see https://github.com/mitchellkrogza/Phishing.Database/issues/29#issuecomment-633290431). I just noticed that it's present in other repo lists too. Domains that were already removed by upstream and are no longer present in domains.list still exist, for some reason, in the other lists (volatile.list etc.). Please inspect this and fix it ASAP. Thank you 👍

funilrys commented 4 years ago

Hi @dnmTX, thanks for pinging me :-) I'm sorry, I've been very busy lately between private and professional life. I will look at this ASAP.

Thanks again for everything!

funilrys commented 4 years ago

Service Note: Removed all distributed files (except domains.list) from https://github.com/Ultimate-Hosts-Blacklist/Phishing.Database.

funilrys commented 4 years ago

@dnmTX, I wrote the code that also cleans all files (from output/) which are used to generate what we distribute.

For any other similar issue, simply force the regeneration (commit message with Launch test, case sensitive) if it is a short one (number of builds from previous_stats in info.json).

Otherwise, delete {clean,ips,volatile,whitelisted}.list and create a new commit with Launch test in it (case sensitive). It will force the regeneration of everything.

And in the meantime, if the {clean,ips,volatile,whitelisted}.list files do not exist, the central repository will fetch domains.list.
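
The full-regeneration path described above could be prepared locally with something like the sketch below. It is only an illustration under stated assumptions: the helper names are hypothetical, the file names come from expanding {clean,ips,volatile,whitelisted}.list, and committing and pushing are left to your usual git workflow.

```python
# Hypothetical helper: delete the distributed files before committing with
# "Launch test" in the message. Not part of the project's tooling.
from pathlib import Path

LAUNCH_MARKER = "Launch test"  # matched case-sensitively, as described above
DISTRIBUTED_FILES = ("clean.list", "ips.list", "volatile.list", "whitelisted.list")


def requests_launch(commit_message):
    """True if the commit message contains the case-sensitive marker."""
    return LAUNCH_MARKER in commit_message


def delete_distributed_files(repo_dir):
    """Delete the distributed files that exist; domains.list is never touched."""
    deleted = []
    for name in DISTRIBUTED_FILES:
        path = Path(repo_dir) / name
        if path.exists():
            path.unlink()
            deleted.append(name)
    return deleted


if __name__ == "__main__":
    print(delete_distributed_files("."))
    print(requests_launch("Launch test"))
```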


Side note: As whois_db.json is now well implemented, I increased the execution time of an instance from 10 to 15 minutes. If it works well, we might be able to increase it to 20 minutes.

This will let the system test more, as it gets 5 more minutes.

Have a nice day/night. Stay safe and healthy.

Nissar

dnmTX commented 4 years ago

I wrote the code that also cleans all files (from output/) which are used to generate what we distribute.

This is great, and how it should've been from the get-go if you ask me. Thanks for fixing this @funilrys, much appreciated. I'll keep an eye on it for now and will let you know if anything comes up. Hopefully those files will get generated without any hiccups. Stay safe 👍

dnmTX commented 4 years ago

Closing this and opening a new one for FEEDBACK