Closed bigjohns97 closed 3 years ago
Restarting only wpa_supplicant caused the WAN to go offline not much later; running the whole pfatt.sh script produced a stable connection.
It is very strange that you are now experiencing issues with 2.5. You were the one who tested it in #22 a while back and it was working fine. And we also had someone more recently test it in #37 and two people there reported that it was fine.
This could be the new PfSense and corresponding issue with the new version of unbound.
I think it's more likely the new version of PfBlockerNG 3.0, which takes advantage of the unbound 1.12 Python integration; unbound on 2.5 has not changed for a month or two now. There are definitely some issues with the new PfBlockerNG 3.0 DNSBL Unbound Python integration and with unbound handling it gracefully. A few people are reporting isolated web outages and unbound going dark.
Yeah, I have been following that thread and that isn't what I see; I have checked and /var/unbound is correctly owned.
I will do some more research and will report back with any findings.
Did some more testing and this does seem to be an issue with the script just on boot.
On boot everything seems to come online fine, ngeth gets the previous ip address and even unbound starts up (or at least says it does) but it doesn't function properly as I can't resolve DNS names.
If I shell in and run top I can see wpa_supp taking up a full core, re-running the script solves this and DNS starts working again.
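The symptom above can also be checked from a script instead of watching top. This is a minimal sketch, not part of pfatt: the helper name `cpu_pegged` and the 90% default threshold are assumptions chosen for illustration.

```sh
#!/bin/sh
# cpu_pegged PCPU [THRESHOLD]
# Succeeds (exit 0) when PCPU, a percentage such as "99.8", is at or
# above THRESHOLD. The default of 90 is an arbitrary illustrative cut-off.
cpu_pegged() {
    awk -v cpu="$1" -v max="${2:-90}" 'BEGIN { exit !(cpu + 0 >= max + 0) }'
}

# Possible use on the firewall (left commented out so the sketch has no side effects):
# for pid in $(pgrep wpa_supplicant); do
#     cpu_pegged "$(ps -o %cpu= -p "$pid")" && echo "wpa_supplicant ($pid) is pegging a core"
# done
```

The comparison is done in awk because plain sh arithmetic cannot compare fractional percentages.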
Going through the logs I am not seeing what is failing in the boot process, this has to be something in the script.
I did notice that there was a lot of stuff added to config.xml so I pulled the earlycmd entry and put it back where it should be as the last thing before system closes. Issue still persists.
I tried adding another entry to the config.xml using shellcmd which is supposed to run a little later in the boot process (I didn't remove the original one) and that keeps wpa_supp from eating CPU but unbound still will not resolve dns names.
What I don't understand is why when I run it manually does it work?
Not sure if there is any other option I have to schedule this to run on boot any ideas?
Okay I was able to get it to work without any manual commands by adding the pfatt.sh script into the /usr/local/etc/rc.d/ folder.
I left the original in its place along with the config.xml entry just in case it was needed for boot for some reason.
Scratch this, the script kept running and resetting the WAN IP which in turn restarted all of the services, I don't know what else to try.
It makes no sense that script just keeps running and resetting the IP. There are no loops in the script and once it's done running once it should just stop. What do you see in logs regarding the script? In supplicant mode the logger should be outputting to syslogs and it adds some info to the log almost every step of the way.
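One way to answer the question about the logs, and to see whether the script really ran more than once, is to count its syslog entries. A rough sketch: the helper name `count_tagged` is hypothetical, and both the `pfatt` tag and the `/var/log/system.log` path in the usage comment are assumptions about a particular install.

```sh
#!/bin/sh
# count_tagged PATTERN
# Reads syslog-style lines on stdin and prints how many of them contain PATTERN.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the exit status clean.
count_tagged() {
    grep -c -- "$1" || true
}

# Possible use (path and tag are assumptions, adjust to your system):
# count_tagged pfatt < /var/log/system.log
```

A count that keeps growing between reboots would support the "script keeps re-running" theory; a stable count would point elsewhere.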
I am seeing this in the logs as well
Dec 2 16:44:40 | kernel | | ---<
It does this a couple of more times until I reboot it.
Maybe I need to do something else besides just dropping that script into the rc.d folder?
That is very strange. According to documentation it does not mention anything about running the script multiple times. That folder is a default from FreeBSD and I have not heard of scripts continuously restarting before.
Also are there any log entries from pfatt script itself? Current supplicant script has a bunch of logging embedded that should be going to syslog.
If you look at the documentation above, you can also try moving the execution of pfatt.sh from earlyshellcmd to shellcmd. Earlyshellcmd runs commands at the beginning of the boot process. Shellcmd runs commands towards the end of the boot process.
I already tried the shellcmd option, mentioned above.
Also yes it still shows all of the entries I just didn't post them here to keep it simple and easy to read.
Final update for now. I've been tshooting this on and off for days.
If you kill the wpa_supplicant process, pfSense will not re-authenticate when it needs to (<24h), and the Internet will simply die as a result. The workaround for this is to 'pgrep -f wpa_supplicant' then 'kill ' and re-run the script. This will re-authenticate you properly without the CPU usage bug. Annoying, but at least it's working.
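The manual workaround (find the wpa_supplicant PID, kill it, re-run the script) could be wrapped in a small helper. This is only a sketch: the function name `restart_pfatt` is hypothetical, and the default `PFATT_SCRIPT` path is an assumption that has to be adjusted to wherever pfatt.sh lives on your box.

```sh
#!/bin/sh
# Location of the pfatt script. This default is an assumption for the sketch.
PFATT_SCRIPT=${PFATT_SCRIPT:-/root/bin/pfatt.sh}

# restart_pfatt
# Kills every running wpa_supplicant process, then runs pfatt.sh once so that
# it starts a fresh supplicant and re-authenticates.
restart_pfatt() {
    for pid in $(pgrep -f wpa_supplicant); do
        kill "$pid"
    done
    sh "$PFATT_SCRIPT"
}
```

Calling `restart_pfatt` by hand after boot mirrors the steps described above; it does not address why the supplicant pegs the CPU in the first place.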
But there's another problem I've experienced on my Protectli Vault 6P. Whenever I start pulling down a file "quickly", say <20MB/s, the WAN (ngeth0) flakes out and drops connection after some seconds/minutes. After a few seconds of wetting itself, it will re-establish a connection. It does this until you stop downloading. I ruled out my cabling and devices. I reintroduce the AT&T RG (IP Passthru) into the mix and reset my pfSense settings to "normal", and everything is fine. No issues.
There's not much information about the supplicant bypass and 2.5.DEVEL, but there are some major issues. I suspect once 2.5 is final and people start trying to upgrade, they will notice the massive breakages. At least at this point, this script is pretty much DOA. I think it's a netgraph issue with FreeBSD 12.x. Anyhow, I hope this information helps somebody.
@Aerowinder try just rerunning the script manually once the system fully boots, for me this resolves the issue and it is a bit tedious to have to do this manually every time the system reboots but since my pfsense is very stable this doesn't happen often.
Yes, I was doing that for a few days, and it worked. I pretty much give up for now, now that I found the other issue of ngeth0 dropping my WAN connection.
I had to give up on pfatt for the time being, as something in the overall process broke in the upgrade from 2.4.5 to 2.5.0.
I upgraded pfsense from 2.4.5 to 2.5.0 last night and lost internet this morning. Not sure why everything was working last night and stopped this morning.
After some debugging, I noticed when pfsense boots, I get an error message about ng_etf.ko.
KLD ng_etf.ko: depends on kernel - not available or version mismatch
linker_load_file: /boot/kernel/ng_etf.ko - unsupported file type
kldload: an error occurred while loading module ng_etf. Please check dmesg(8) for more details
Updating configuration...done.
Warning: Configuration references interfaces that do not exist: ngeth0
Network interface mismatch -- Running interface assignment option.
I'm guessing that the original ng_etf.ko that came in the repository was compiled for freebsd 11 and not freebsd 12. I could be wrong though.
@justinhamlett that is correct; in fact, since 2.4.5 the extra kernel module step hasn't been needed, as the module was included from there on out. You should be able to get a FreeBSD 12 version and overwrite the old file to fix your issue.
There is another issue with 2.5, however, that requires you to run the pfatt script manually after reboot and possibly restart some other services such as ntopng.
So I'm guessing I need to manually build/compile a new ng_etf.ko on a freebsd 12 system and then just replace the previous version at /boot/kernel/ng_etf.ko?
Or downgrade pfsense back to 2.4.5 haha
Everything is back up and running now after building ng_etf.ko on a freebsd 12 box and then replacing the file on the pfsense 2.5 box.
Any chance you can help a newb with instructions on how to do this?
@ColonelCobra14
I downloaded a virtualbox image of freebsd (https://www.osboxes.org/freebsd/) and followed these instructions:
From a FreeBSD machine (not pfSense!):
    fetch ftp://ftp.freebsd.org/pub/FreeBSD/releases/amd64/amd64/12.1-RELEASE/src.txz
    tar -C / -zxvf src.txz
    cd /usr/src/sys/modules/netgraph
    make
    scp etf/ng_etf.ko root@pfsense:/boot/kernel/
    ssh root@pfsense chmod 555 /boot/kernel/ng_etf.ko
I got these instructions from Archerious pfatt repo - https://github.com/Archerious/pfatt/tree/master
I updated the fetch URL in the code above to use 12.1-RELEASE instead of 11.2-RELEASE. Instead of using scp to transfer the file over to pfsense, I used a flash drive to copy ng_etf.ko to my pfsense box.
If that is too much trouble for you, I can share my freebsd 12.1 build of ng_etf.ko
I tried this and it didn't seem to fix my issue. I think ng_etf.ko may have already been in the base 2.5 build.
Same issue with the wpa_supplicant method. Upgraded to 2.5 and CPU pegs to 100%. I've switched to Bridge mode for now but I hope we can figure this one out!
@grevelle
I just checked dmesg and found this error:
module_register: cannot register ng_etf from ng_etf.ko; already loaded from kernel
Module ng_etf failed to register: 17
I'm guessing that during the upgrade the old ng_etf.ko remained instead of being upgraded to the newest version. Not sure though, just wanted to share the error message. I'm going to debug some more tomorrow.
Also, I'm not using the wpa_supplicant method.
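The "already loaded from kernel" message suggests the module is present without loading the .ko file. A guarded load can be sketched as follows; the function name `ensure_ng_etf` is hypothetical, and the sketch assumes the FreeBSD `kldstat -q -m` check behaves as documented on your pfSense version.

```sh
#!/bin/sh
# ensure_ng_etf
# Loads ng_etf only when the kernel does not already provide it, which avoids
# the "cannot register ... already loaded from kernel" error.
ensure_ng_etf() {
    if kldstat -q -m ng_etf; then
        echo "ng_etf already present, skipping kldload"
    else
        kldload ng_etf
    fi
}
```

If the check reports the module as present, copying a separately built ng_etf.ko into /boot/kernel is probably unnecessary on that system.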
I tried following the instructions on compiling and copying the new version of ng_etf.ko from a fresh install of FreeBSD 12.2. It did not resolve the issue, still getting the high CPU load. Maybe I did it wrong? Curious to see if anyone who has done this successfully can confirm that it works.
This isn't a fix to resolve CPU load on boot, it was just a resolution on Justin's problem of using an old kernel module after upgrade.
I've been using 21.1 for 20 hours so far and have no issues. I'm using my opnatt.sh script.
Just tried your script and it failed to load the netgraph module.
Actually, I'm wondering if loading these modules is needed; I don't see these commands being run in the pfatt version of the script.
I can't start testing now but will in a couple of hours.
I was able to use the opnatt script by removing the calls to the netgraph modules, however the wpa_supplicant cpu usage remained, and re-running the script did nothing to get rid of the CPU usage.
I really think this is a bug with the latest version of pfsense at this moment and I am thinking of opening a bug with their tracker.
It's either that or something with FreeBSD 12.2... my guess is pfSense as well...
See reddit link here for ResidentEffect4816 who submitted a bug report.
I've been able to work around this by putting the script in /usr/local/etc/rc.syshook.d/monitor (OPNsense, in addition to early) so it's triggered when connection is lost. This way there is only a brief period of downtime.
Still an issue after 2.5.1 upgrade, updating redmine tracker
Just to tie this all together:
- The Redmine issue is here: https://redmine.pfsense.org/issues/11453
- Recently, Milo Medin has posted a potential fix to that issue. Specifically, he references this bug regarding wpa_supplicant: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252844
- That bug in wpa_supplicant was patched with the following commit (not present in pfSense 2.5.0 or 2.5.1): https://cgit.freebsd.org/src/commit/?id=d70886d063166786ded0007af8cdcbf57b7b4827
Following an excellently written guide on building pfSense 2.5.0 here (https://github.com/Augustin-FL/building-pfsense-iso-from-source), I was able to build pfSense 2.5.1 from source as libreSense. My forks are here:
- https://github.com/romracer/pfsense
- https://github.com/romracer/FreeBSD-ports
- https://github.com/romracer/FreeBSD-src (this repo has the wpa_supplicant fix)
I'm still on pfSense 2.4.5 so I'm not able to test the resulting wpa_supplicant binary. I've posted it as a release here if someone else wants to try it out. In theory, you just need to replace the existing wpa_supplicant in /usr/sbin, match permissions and ownership, and reboot.
https://github.com/romracer/FreeBSD-src/releases/tag/patched_wpa_supplicant
Of course, just like the previously used ng_etf.ko module, you should not technically trust my binary build and you should build it on your own. But that process is time consuming, so for testing purposes only, I've published my binary. If anyone tries this out, I would appreciate feedback. If it helps you out, it may be worth commenting on the Redmine issue as well so this fix can be applied upstream to pfSense.
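The "replace the binary, match permissions and ownership" step could look roughly like this. It is a sketch under assumptions: the helper names (`file_mode`, `file_owner`, `swap_binary`) are hypothetical, the `.orig` and `.new` suffixes are arbitrary, and nothing here reboots the box or verifies that the new binary is a working build.

```sh
#!/bin/sh
# Print the octal mode of a file (tries GNU stat syntax first, then BSD stat).
file_mode() {
    stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1"
}

# Print "uid:gid" of a file (tries GNU stat syntax first, then BSD stat).
file_owner() {
    stat -c '%u:%g' "$1" 2>/dev/null || stat -f '%u:%g' "$1"
}

# swap_binary TARGET NEW
# Keeps a copy of TARGET as TARGET.orig, then puts NEW in its place with the
# same mode and ownership TARGET had before the swap.
swap_binary() {
    sb_target=$1
    sb_new=$2
    sb_mode=$(file_mode "$sb_target") || return 1
    sb_owner=$(file_owner "$sb_target") || return 1
    cp -p "$sb_target" "$sb_target.orig" || return 1
    cp "$sb_new" "$sb_target.new" || return 1
    chmod "$sb_mode" "$sb_target.new" || return 1
    chown "$sb_owner" "$sb_target.new" || return 1
    # Rename into place instead of overwriting a binary that may be running.
    mv -f "$sb_target.new" "$sb_target"
}

# Possible use (the source path is an example only):
# swap_binary /usr/sbin/wpa_supplicant /root/wpa_supplicant.patched
```

Keeping the `.orig` copy makes it easy to swap back if the WAN does not come up after the reboot.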
THANK YOU!!! This works perfectly. I'll keep an eye on it to see if something changes over time. But right out of the gate it looks like it's working well and CPU usage is back to normal.
I tried it too... and it works for me as well!!!
Wow what a great find. Would you mind updating the https://redmine.pfsense.org/issues/11453 telling them how to update the release to fix this? I couldn't figure out which freebsd release this patch was linked to (if any).
Thank you for figuring this one out. Awesome work!
Nice work! This resolved my issues of CPU usage on boot, BTW is anyone else running pfblockerNG?
I am noticing that after this fix I still have DNS resolution issues until unbound is started and was wondering if anyone else is having similar issues?
Did some more troubleshooting on DNS resolution on first boot.
It seems like there aren't any issues with unbound: level 2 logging doesn't show any errors, and queries made from the console don't show any issues.
However any clients on the LAN cannot resolve domain names.
I wonder if this has to do with my LAN being a 3 port LAGG.
Is anyone else seeing DNS issues on boot and restarting unbound resolves the issue?
You probably don't need to build the entire pfsense iso. freebsd ports have already included that patch into wpa_supplicant port(https://www.freshports.org/security/wpa_supplicant).
summary:
Edit: Added missing steps.
@git-nerd I am just an infosec nerd, so my dabbling in inner workings of compilation is admittedly light.
I had some trouble completing your instructions.
I started with a freebsd 12.0 from here: [https://www.osboxes.org/freebsd/#freebsd-12-1-info], then executed a freebsd-update install -r 12.2-RELEASE.
I received 12.2-RELEASE-p6 (src component not installed, skipped)
After that I continued ok with steps 2-6 - no problems.
At step 7, I found the 'work' directory did not exist. I executed a find for libssl.so.11 from / but nothing came up.
Can you possibly help me figure out what went wrong (and how to fix it)?
@git-nerd maybe I can re-phrase as - which freebsd 12.2 image should I start with in my VM?
I have a TrueNAS machine, which is a FreeBSD system that supports "jails", a concept very similar to containers. That jail is based on 12.2p6.
I think the machine that you setup is correct. What you can do is a grep when you run the build command:
cd /usr/ports/security/openssl/ && make install clean | grep libssl
That should print the location on the console.
Edit: Forgot to add: Nerds will prevail!!
@git-nerd - got it working by using the correct commands - I had missed a step. Apologies for that.
Now which wpa_supplicant is replaced on pfsense? /usr/sbin, or /usr/local/sbin?
Nerd prevalence will be completely independent of all variables!
--Edit-- from ps ax, I discovered /usr/sbin/wpa_supplicant is the culprit. Replacing that. Thanks!
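Picking the running binary's path out of `ps ax` can be scripted. A small sketch: the helper name `wpa_binary_from_ps` is hypothetical, and it only matches when the process list shows the command with its full path, as it did in the comment above.

```sh
#!/bin/sh
# wpa_binary_from_ps
# Reads `ps ax` output on stdin and prints the first field that is a path
# ending in /wpa_supplicant, i.e. the binary that is actually running.
wpa_binary_from_ps() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /\/wpa_supplicant$/) { print $i; exit }
    }'
}

# Possible use:
# ps ax | wpa_binary_from_ps
```

Requiring a slash in front of the name keeps a stray `grep wpa_supplicant` line in the process list from being reported as the binary.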
@git-nerd - no luck with updated binary from /usr/local/sbin/wpa_supplicant on my build system.
After a swap of wpa_supplicant and a reboot of pfsense, wpa_supplicant was not running, and I had no IP. I had to swap back to my old wpa_supplicant and reboot again. I am running again at 100% CPU on the default wpa_supplicant.
Any thoughts how to troubleshoot?
You can take a look at the creation date on the file to see which is the right file.
Another option would be to search the filesystem under /usr/ports/security/wpa_supplicant
This is what I did, and determined /usr/local/sbin/wpa_supplicant had a creation date of today. This is the one I used.
Pfsense logs show a dhcp of the correct IP address, so it seems like it worked, but for some reason the IP was not on the interface after reboot.
Weird.
--Edit-- found the problem but don't know how to solve it. My updated binary is a 64-bit version, the original is 32-bit. I am going to start over beginning with i386-12.2-RELEASE. I think that would solve this.
Are you sure? I don't think there is a 32-bit version of pfsense anymore (https://www.netgate.com/blog/pfsense-2-4-0-release-now-available.html)
You might have a different issue altogether. If the output of “wpa_cli status” shows authorized and you are not using 100% cpu then wpa_supplicant is doing its job.
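The `wpa_cli status` check can be turned into a yes/no test. This is a sketch only: the helper name `port_authorized` is hypothetical, and the exact `suppPortStatus=Authorized` line is my recollection of what wpa_supplicant prints for wired 802.1X, so compare it against your own `wpa_cli status` output first.

```sh
#!/bin/sh
# port_authorized
# Reads `wpa_cli status` output on stdin and succeeds when it contains a line
# reporting the supplicant port as authorized. The field name is an assumption.
port_authorized() {
    grep -q '^suppPortStatus=Authorized$'
}

# Possible use:
# wpa_cli status | port_authorized && echo "802.1X port authorized"
```

Combined with a CPU check, this covers both conditions mentioned above: authorized, and not burning a core.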
getconf LONG_BIT shows '32' so I guess I am on 32-bit on my Netgate SG-3100. I am using a vanilla build stock load. This looks like my root issue but wondering if there is a way I can get to 64-bit if possible.
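A 32-bit versus 64-bit mismatch can be caught before copying a binary over by reading the class byte of its ELF header. A minimal sketch, assuming the file is an ELF binary; the helper name `elf_bits` is hypothetical.

```sh
#!/bin/sh
# elf_bits FILE
# Prints 32 or 64 based on the EI_CLASS byte (offset 4) of the ELF header:
# 1 means a 32-bit object, 2 means a 64-bit object.
elf_bits() {
    case $(od -An -tu1 -j4 -N1 "$1" | tr -d '[:space:]') in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo "unknown"; return 1 ;;
    esac
}

# Possible use: compare the freshly built binary with the one on the firewall.
# elf_bits /usr/local/sbin/wpa_supplicant
# elf_bits /usr/sbin/wpa_supplicant
```

Word size is only part of the picture: the CPU architecture has to match as well, which this sketch does not check. The SG-3100 is, as far as I know, an ARM-based appliance, so an i386 build may not run there either.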
Oh you are using netgate appliance, ok that might be it then.
I recently tried out PfSense 2.5 to test the new version of unbound, which fixes some issues with PfBlockerNG, null blocking, and the python module that provides client IP addresses of DNS requests when using null blocking. I noticed that wpa_supplicant is using all of one core upon boot.
Killing the PID results in loss of WAN but starting it back up using just the wpa_supplicant command from the script seems to restore WAN connectivity and not eat up the CPU.
Not sure how to troubleshoot but this might be something to look at as the 2.5 branch seems to be getting close to release.