Ultimate-Hosts-Blacklist / dev-center

The place to talk about our infrastructure or everything related to the Ultimate Hosts Blacklist project.

[Feedback] New infrastructure #32

Closed funilrys closed 4 years ago

funilrys commented 5 years ago

Hello, World!

Please report here every infrastructure/build/test issues (suspected or confirmed) from now.

Note: This issue is only about anything abnormal with our new deployment logic. Everything else should be reported as a new issue.

Cheers, Nissar

funilrys commented 5 years ago

First Issue:

Exception: Unable to get the content of 'https://raw.githubusercontent.com/Ultimate-Hosts-Blacklist/repository-structure/master/.travis.yml'.

It might be GitHub blocking the machine, as I'm able to access https://raw.githubusercontent.com/Ultimate-Hosts-Blacklist/repository-structure/master/.travis.yml from my machine :thinking:

I'm restarting those boxes to see what happens.

dnmTX commented 5 years ago

🤔

funilrys commented 5 years ago

Turns out it was an issue on my side: I forgot a return statement. The error did not show up in the test phase because repository-structure is the central repository, so it never needs to update its own .travis.yml.

Published to pip. Manually restarting all boxes that hit the issue.

funilrys commented 5 years ago

Next one:

error: unknown option `procelain'

Also related to the update of .travis.yml. I might have used the wrong function. I'm looking for the fix.

funilrys commented 5 years ago

Turns out it was a typo. I used --procelain instead of --porcelain.

Published to pip. Manually restarting all boxes that hit the issue.
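For readers unfamiliar with the flag: `git status --porcelain` prints one stable, script-friendly line per changed path. Below is a minimal sketch of how an updater could use it to decide whether a file such as .travis.yml needs to be committed. The function names `changed_paths` and `file_has_changes` are hypothetical and are not taken from the updater's code.

```python
import subprocess
from typing import List


def changed_paths(porcelain_output: str) -> List[str]:
    """Extract paths from `git status --porcelain` (format v1) output.

    Each line looks like "XY <path>": two status characters, one space,
    then the path. Renames show up as "old -> new" and are returned as-is.
    """
    paths = []
    for line in porcelain_output.splitlines():
        if len(line) > 3:
            paths.append(line[3:])
    return paths


def file_has_changes(path: str, repo_dir: str = ".") -> bool:
    """Return True if `path` is modified, staged or untracked in `repo_dir`.

    Hypothetical helper: it shells out to git, so git must be installed
    and `repo_dir` must be inside a work tree.
    """
    result = subprocess.run(
        ["git", "status", "--porcelain", "--", path],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # a misspelled option makes git exit non-zero and raises here
    )
    return bool(changed_paths(result.stdout))
```

With `check=True`, a misspelled option surfaces as a `subprocess.CalledProcessError` instead of being silently ignored.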

funilrys commented 5 years ago

Next one:

Traceback (most recent call last):
  File "/home/travis/virtualenv/python3.7.1/bin/ultimate-hosts-blacklist-input-repo-updater", line 10, in <module>
    sys.exit(_command_line())
  File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/ultimate_hosts_blacklist/input_repo_updater/__init__.py", line 93, in _command_line
    logging_level=logging_level, multiprocessing=arguments.multiprocess
  File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/ultimate_hosts_blacklist/input_repo_updater/core.py", line 840, in process
    self.__process_multiprocess(to_test, end_time)
  File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/ultimate_hosts_blacklist/input_repo_updater/core.py", line 604, in __process_multiprocess
    continue_data = List(continue_data).merge(data, strict=False)
  File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/ultimate_hosts_blacklist/helpers/list.py", line 144, in merge
    result.append(element)
AttributeError: 'dict' object has no attribute 'append'

This time it's just before pushing everything to the repository. It is related to the new continue subsystem which is specially implemented for our infrastructure. I'm looking at the code.

funilrys commented 5 years ago

Turns out it was a naming issue. I used List(xx).merge(xxx) where Dict(xx).merge(xxx) was needed, because the data we were working with is a Python dict(), not a Python list(). Both the List() and Dict() classes are part of our helpers.

Published to pip. Manually restarting all boxes that hit the issue.
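To illustrate the class of bug, here is a small sketch of a merge function that dispatches on the actual type of its inputs and fails loudly on a mismatch. It is not the project's `List`/`Dict` helpers; the `merge` function and its exact semantics are assumptions made for the example.

```python
from typing import Any


def merge(main: Any, other: Any) -> Any:
    """Merge `other` into a copy of `main` (hypothetical semantics).

    - dict + dict: keys are merged recursively
    - list + list: elements missing from `main` are appended
    - container mixed with anything else: TypeError with a clear message
    - scalar + scalar: `other` wins
    """
    if isinstance(main, dict) and isinstance(other, dict):
        merged = dict(main)
        for key, value in other.items():
            merged[key] = merge(merged[key], value) if key in merged else value
        return merged

    if isinstance(main, list) and isinstance(other, list):
        merged = list(main)
        for element in other:
            if element not in merged:
                merged.append(element)
        return merged

    if isinstance(main, (dict, list)) or isinstance(other, (dict, list)):
        raise TypeError(
            f"cannot merge {type(other).__name__} into {type(main).__name__}"
        )

    return other
```

Checking the types up front turns an `AttributeError` deep inside the merge loop into an immediate, readable `TypeError`.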

funilrys commented 5 years ago

@dnmTX I'm going to sleep, but a bunch of repositories have already finished their tests and generation.

Catch you later. Bye

dnmTX commented 5 years ago

@funilrys looks like https://github.com/Ultimate-Hosts-Blacklist/Spotify-Ad-free doesn't exist anymore and the build failed because of it (some kind of DMCA takedown).

Also, I see some of the builds failed for other reasons, and on some that passed the volatile.list and whitelisted.list are not updated. Please check tomorrow when you have time. Good night.

dnmTX commented 5 years ago

Just checked the volatile.list in lightswitch05's repo, and domains (active for sure) from *.doubleclick.net were not added (some are there but many are missing). For example: 1044889.fls.doubleclick.net 1063127.fls.doubleclick.net 1092360.fls.doubleclick.net 1095311.fls.doubleclick.net 1106306.fls.doubleclick.net 1119706.fls.doubleclick.net 1181183.fls.doubleclick.net 1268437.fls.doubleclick.net 1272738.fls.doubleclick.net

Also the clean.list and the volatile.list contain the same number of domains which tells me that something is definitely wrong.

Not sure but by the look of it the www. was not applied to each domain in the domains.list before filtering.

funilrys commented 5 years ago

volatile is the priority. I'm checking that right now.

funilrys commented 5 years ago

@dnmTX @mitchellkrogza Just some feedback on the volatile issue: my last commit should fix it going forward.

But while looking at the logs I found out that the DNS server inside the container does not resolve some domains correctly, which caused PyFunceble to return INACTIVE.

For about 2 hours now I have been writing the code that will let us use a different DNS server than the one installed. The code is almost ready. It will be out in the coming hours (at least in the 2.x.x branch).

I need some fresh air so catch you later.
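As an illustration of what "using a different DNS server than the one installed" means in practice, here is a standard-library-only sketch that sends an A-record query straight to a chosen resolver instead of going through /etc/resolv.conf. This is not PyFunceble's implementation; the function names and the default server address are assumptions for the example.

```python
import socket
import struct
from typing import Tuple

QUERY_ID = 0x1234  # arbitrary fixed ID, fine for a sketch


def build_query(name: str, query_id: int = QUERY_ID) -> bytes:
    """Build a minimal DNS query for the A record of `name` (RFC 1035)."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed with its length, terminated by a zero byte.
    # ASCII only: internationalized names would need IDNA encoding first.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE = 1 (A), QCLASS = 1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def parse_header(response: bytes) -> Tuple[int, int, int]:
    """Return (id, rcode, answer_count) from a DNS response header."""
    if len(response) < 12:
        raise ValueError("DNS response shorter than the 12-byte header")
    query_id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    return query_id, flags & 0x000F, ancount


def resolves(name: str, server: str = "1.1.1.1", timeout: float = 3.0) -> bool:
    """Ask `server` directly whether `name` has at least one answer record."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        response, _ = sock.recvfrom(512)
    query_id, rcode, ancount = parse_header(response)
    # rcode 0 means "no error"; 3 would be NXDOMAIN.
    return query_id == QUERY_ID and rcode == 0 and ancount > 0
```

A real tool would also retry on timeout, fall back to TCP for truncated answers, and randomize the query ID. The sketch only shows that the resolver choice can live in the application rather than in the OS configuration.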

dnmTX commented 5 years ago

Great. Thanks @funilrys 👍 Just curious, which DNS server was implemented initially?

funilrys commented 5 years ago

The OS installed one so in our case under Travis CI (cf):

$ cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 169.254.169.254
search c.eco-emissary-99515.internal google.internal

dnmTX commented 5 years ago

Yeah... that IP... very suspicious. A basic check showed that some commercial domains are associated with it. Better change it. Even Google (8.8.8.8) is better than that.

funilrys commented 5 years ago

Yeah, that was also my conclusion when I saw that. Unfortunately, Travis CI doesn't allow any changes to that file (cf) ...

$ sudo printf "nameserver 1.1.1.1\nnameserver 1.0.0.1\n" > /etc/resolv.conf
/home/travis/.travis/functions: line 104: /etc/resolv.conf: Permission denied

That's why I had to edit PyFunceble for the job ... We lose a bit of performance, but that is balanced by the new multiprocess usage, which lets us test as many entries as possible simultaneously before the 15-minute limit is reached.
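The "test as many as possible before the time limit" idea can be sketched as a loop that stops at a deadline and hands back whatever is left for the next run. This is a single-process simplification with hypothetical names (`run_until_deadline`, `check`), not the updater's actual multiprocess code.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple


def run_until_deadline(
    check: Callable[[str], str],
    subjects: Iterable[str],
    budget_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Dict[str, str], List[str]]:
    """Test subjects in order until the time budget is used up.

    Returns (results, remaining): `results` maps each tested subject to the
    status returned by `check`; `remaining` holds the untested subjects so a
    later run can continue where this one stopped.
    """
    deadline = clock() + budget_seconds
    results: Dict[str, str] = {}
    pending = list(subjects)

    # The deadline is only checked between subjects, so a slow `check`
    # call can overshoot the budget by its own duration.
    while pending and clock() < deadline:
        subject = pending.pop(0)
        results[subject] = check(subject)

    return results, pending
```

For the limit mentioned above this would be called with `budget_seconds=15 * 60`. The `clock` parameter exists only so the behavior can be exercised without real waiting.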

dnmTX commented 5 years ago

I saw that, and I also think there is a typo: nnameserver 1.0.0.1\n" (double n).

@funilrys let's worry about performance later, how about making it stable first.

funilrys commented 5 years ago

No, no typo @dnmTX:

$ printf "nameserver 1.1.1.1\nnameserver 1.0.0.1\n"
nameserver 1.1.1.1
nameserver 1.0.0.1

2.x.x is getting stable thanks to what we both do here, so when we are done with all the issues, 2.x.x will be the most stable version of PyFunceble ever :wink:

dnmTX commented 5 years ago

I'll keep monitoring and will report after next filtering is done.

funilrys commented 5 years ago

@dnmTX The following are fixed. Test results at https://github.com/Ultimate-Hosts-Blacklist/repository-structure/commit/16752ff83a86a5524a6977af58d0159bab337b49

Now let's focus on the ones that are still red :smile_cat:

dnmTX commented 5 years ago

Ok, let me summarize:

  1. Missing *.doubleclick.net domains - FIXED (DNS resolver's fault)
  2. volatile.list and whitelisted.list not updating after filtering - FIXED
  3. ~www. not applied in domains.list before initial filtering - if this was the case, I'm still waiting on status for this one~

funilrys commented 5 years ago

Manually restarted lightswitch05_hosts_ads-and-tracking-extended

funilrys commented 5 years ago

@dnmTX www. and vice-versa are now tested and generated on the fly. No more pre-initialization saved into domains.list.

Which means (this is an example for one line, but we do that for as many lines as possible, simultaneously):

dnmTX commented 5 years ago

Oh, I see. Well... that's a great improvement (I can scratch that from the list then). Just... again... to make sure... subdomains are skipped or...?

funilrys commented 5 years ago

Yes subdomains are skipped :)
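A rough sketch of the rule as described in this exchange: the www. counterpart of a domain is generated on the fly, and subdomains get no counterpart. The `variants` function is hypothetical, and the "two labels means registrable domain" check is a loudly naive assumption; a real implementation would need the Public Suffix List to handle suffixes such as co.uk.

```python
from typing import List


def variants(domain: str) -> List[str]:
    """Return the domain plus its www./non-www. counterpart, if any.

    ASSUMPTION: a name is treated as a subdomain (and skipped) when its
    non-www part has more than two labels. This is wrong for multi-label
    public suffixes and is only meant to illustrate the idea.
    """
    domain = domain.strip().lower().rstrip(".")
    has_www = domain.startswith("www.")
    bare = domain[4:] if has_www else domain

    if bare.count(".") != 1:
        # Subdomain (or single-label name): test it as-is, no counterpart.
        return [domain]

    counterpart = bare if has_www else "www." + bare
    return [domain, counterpart]
```

Because the counterpart is computed when the entry is tested, nothing extra has to be stored in domains.list.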

dnmTX commented 5 years ago

Hehehe... just saw this in my router's log. It's from when I was checking the domains that were associated with Travis CI's DNS resolver:
May 11 12:25:23 DD-WRT daemon.warn dnsmasq[1315]: possible DNS-rebind attack detected: maisonmargielaparfums.com

funilrys commented 5 years ago

Another phishing domain 🤔

funilrys commented 5 years ago

Nope just some bad boyz :disappointed:

dnmTX commented 5 years ago

@funilrys how's it going with the repairs? I just checked lightswitch05's repo and it looks good as far as I can tell. Just checking on the progress. 😃

funilrys commented 5 years ago

Good @dnmTX :smile_cat: I restarted every red-flagged build this morning and they all passed successfully 🙂 There were some exceptions regarding the new DNS lookup logic of PyFunceble, but they were quickly fixed. Unfortunately, those exceptions were not flagged because they were not passed to the parent process :smile:

So right now I'm locally testing a way to completely stop PyFunceble when an issue happens with the new --multiprocess argument. Once I find a proper solution, I will implement it in our updater 🙂

Why do the job twice? Because 2.0.0 will have the same problem as our updater once distributed :joy_cat:

Indeed, we (here) include the PyFunceble 2.0.0 API and do our business logic around its output, while everybody else will use the --multiprocess argument from the CLI once released 🙂 So it's the same problem at a different scale and for a different target :joy_cat:
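One common way to make worker failures visible to the parent is to collect each worker's result through a future: calling `result()` re-raises in the parent whatever exception the worker raised. The sketch below uses `concurrent.futures` from the standard library; it is an assumption about how the problem could be approached, not a description of how PyFunceble or the updater solved it, and `run_checks` is a hypothetical name.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Type


def run_checks(
    check: Callable[[str], str],
    subjects: Iterable[str],
    executor_cls: Type[Executor] = ProcessPoolExecutor,
    max_workers: int = 4,
) -> Dict[str, str]:
    """Run `check` on every subject in parallel and stop on the first failure.

    With the default ProcessPoolExecutor, `check` must be a picklable,
    module-level function.
    """
    results: Dict[str, str] = {}
    with executor_cls(max_workers=max_workers) as pool:
        futures = {pool.submit(check, subject): subject for subject in subjects}
        try:
            for future in as_completed(futures):
                # result() re-raises the worker's exception in the parent.
                results[futures[future]] = future.result()
        except Exception:
            # Drop everything that has not started yet, then propagate.
            for pending in futures:
                pending.cancel()
            raise
    return results
```

Checks that are already running when the failure occurs still finish before the pool shuts down; only queued ones are cancelled. On platforms that spawn worker processes, the call has to sit under an `if __name__ == "__main__":` guard.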

funilrys commented 5 years ago

Oops my bad :joy_cat:

dnmTX commented 5 years ago

@funilrys a couple of builds have failed so far, most likely connected to the DNS resolving (as far as I can tell). Check it out when you have the time. Other than that, everything else looks good 👍

funilrys commented 5 years ago

Should be fixed. Restarting all flagged ones.

dnmTX commented 5 years ago

@funilrys there are still some builds that are failing: ZeroDot1, for example, among a few others. Can you please take a look? Thank you.

dnmTX commented 5 years ago

@funilrys builds are still failing after the most recent fixes, and there are some that say passing but have been filtering since yesterday (I'm guessing it shouldn't take that long).

funilrys commented 5 years ago

Hi @dnmTX, I did not forget you. I'm just developing a way to get rid of all the databases in order to save time and space (at least for the near future).

I'll restart everything in the next hours.

Cheers, Nissar

funilrys commented 5 years ago

Restart of all repositories started.

funilrys commented 5 years ago

Restart of all repositories finished.

funilrys commented 5 years ago

Another restart session will be run later as I forgot to first sync the repository-structure repository.

funilrys commented 5 years ago

For the record:

Due to changes in the PyFunceble API, I had to rewrite some sections of the updater.

funilrys commented 5 years ago

Restart of all repositories started.

funilrys commented 5 years ago

Everything stopped because I might have reached the GitHub API token limit.

funilrys commented 5 years ago

Restart of all repositories finished.

dnmTX commented 5 years ago

@funilrys the restart (from what I can see) didn't do much. Lately I've been busy, and I won't have as much time as before to monitor, report and, most importantly, push you to do the fixes. It's been almost 3 weeks without any updates to the lists I use from here, and I'd really like to spend whatever free time I have at my PC with at least some updated protection. I know you're busy too, and this is not me blaming you. I just need to know what your plans are to bring the filtering back to where it was (super stable), because if it's not going to be soon I'd rather switch to the original lists and not worry about it, at least for a while.

funilrys commented 5 years ago

Hi @dnmTX, sorry for my bad behavior over the last 3 weeks. I was just too focused on improving, testing, reviewing and fixing PyFunceble so that in the future it will be easier for my peers to maintain and understand. Fortunately, @mitchellkrogza and I did a great job this week toward that objective, and no major issue has been found in the last 48 hours.

About the infrastructure: a patch that reworks the way we launch multiple tests at a time has been pushed. I'll monitor everything from now until 01:30 (Berlin time). My objective is to collect a list of tracebacks with the new patch so that I can spend a good part of my Sunday finding solutions.

Everything was restarted about 10 minutes ago; it should be finished in the coming minutes.

Everything should be back up and running next week. If multiprocessing is not fixed, I'll drop it so that everybody can get new datasets.

Thanks for your understanding, your support and time.

Cheers, Nissar

funilrys commented 5 years ago

Just some feedback before going to bed: no infrastructure issues since the last comment.

There are still 26 of the 60+ repositories under test, but since my patch I have not seen any failed or errored tests.

I took the time to search for tracebacks in the logs of all the ones that finished, and I wasn't able to find any.

I will review the rest later in the morning, but for me, the patch was the right one!

Cheers, Nissar

dnmTX commented 5 years ago

@funilrys great 👍 Looking forward to turning my cron jobs back on. When I have time (during the week) I will try to monitor it and will report back (if anything comes up) next weekend. Thank you for the hard work you're doing. Crossing my fingers that tomorrow's tests go smoothly. Good night.

dnmTX commented 5 years ago

@funilrys how's it going with the fixes? Can you please give an update on the progress from your end? Just to note that even though it says the builds are passing, none of the lists has been updated. It's like it's going into a loop all the time.

funilrys commented 5 years ago

Hi @dnmTX,

On my side, everything works again! If there are no changes in clean.list or volatile.list, it's because nothing changed. And no, after checking, there is no loop!

From now on I'm going to include the times in info.json so you can work with them, but on my side everything works.

The only difference: because we test as many as possible, and some networks might not like it, test results could become inaccurate over time. To avoid that, I reduced the execution time of one build from ~15 minutes to ~5 minutes. That also allows us to test more input sources in the same hour 😊

Cheers, Nissar

dnmTX commented 5 years ago

Ok @funilrys, I'll give you a couple of examples:

  1. lightswitch05: it says build error. The domains.list was last updated 8 days ago, but as you can see from the original lists there was a commit 4 days ago. None of volatile.list, clean.list or whitelisted.list has been updated for 25 days so far, which, I'd assume, is when everything stopped working.

  2. ZeroDot1: domains.list last updated 10 days ago, last commit 3 days ago. None of the other lists has been updated for 28 days.

There are others. Sorry, but seeing this I can't convince myself that the filtering is anywhere near working properly. Also, the mere fact that there was a commit in any of the original lists means that domains were either added or removed, and for that reason alone all of the lists here should have been updated.