Closed QuintLeo closed 3 months ago
2024-05-07T12:34:05.855 full_node chia.server.start_service: ERROR fatal uncaught exception: OSError: [Errno 98] error while attempting to bind on address ('0.0.0.0', 8444): address already in use
When looking at the quoted line above it suggests to me the "full node" had trouble claiming port 8444 from the OS to use for itself, as if the system reserved it for something else, or something else is currently running on it?
Are you able to check whether other processes are running on the machine? Something that wasn't stopped previously? Something that starts up automatically? Does any other process hold port 8444? Perhaps another user account launched something on that port, and now you're trying the same from a different account? I'm new to Linux, so I also have to ask: does the user account you're using have permission to use that port?
As far as I understand the Chia services, the "farmer" can have its errors and issues without impacting the "full node" service from running, but in this case the "full node" service wasn't able to get going because of the OS port issue? I would expect the "farmer" service to fail if it was unable to resolve the domain name, not the "full node" service. (The "farmer" uses port 8447 and the "full node" uses 8444.)
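A quick way to answer the "is anything already on 8444?" question is to try connecting to the port locally. The snippet below is only a minimal sketch: the function name is_port_listening is made up for illustration, and it only tells you whether something accepts TCP connections on the port, not which process owns it (on Linux, something like ss -ltnp would be needed for that).

```python
import socket


def is_port_listening(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper for illustration; not part of Chia.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 8444 is the full node port mentioned in this thread
    print("port 8444 in use:", is_port_listening(8444))
```

If this reports the port as in use before the GUI is started, a full node (or something else) is already holding it.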
The only thing that has ever used port 8444 on that machine is the Chia full node. There are NO other users on the machine. There aren't any other accounts to sign in WITH. Nothing changed in the configuration of the machine itself to cause any issues when I started seeing this issue, and 2.1.4 WAS working before I started seeing this issue for many months to a year.
Additional info I just figured out. The CLI WILL start the full node. It's specifically the GUI that is hanging.
Try rebooting before running anything to remove the possibility of zombies.
You can go into config.yaml and see if there are any references to flexpool. It may be best to delete config.yaml and have chia recreate it by resyncing your wallet
I've rebooted and then started Chia before running anything else a few times now. No change. There are no references to flexpool in my config.yaml - I forgot to mention that in the original post, but I checked that a couple of weeks before I posted this bug report. It does not seem to be the wallet that's hanging, or I couldn't get into that via the CLI. There doesn't seem to be any way to start the full node independently via the CLI, though there is an "all" option in the CLI that seems to work.
2 weeks and hasn't even been assigned? Sad.
This is the interesting part: "The CLI WILL start the full node. It's specifically the GUI that is hanging."
We did change how the GUI launches services recently. Izumi, can you check on this?
Can we get a new log file after the reboot? That should clear the 8444 error.
Also, you can try running without the GUI (see https://docs.chia.net/installation/?_highlight=install#cli):
chia start node
Don't use "all", as that will start a timelord.
I started the client for the first time in perhaps 2 weeks today - it started up the GUI for the first time in nearly a month. I have NO IDEA why it wasn't opening for a while, as I've not changed anything.
2024-06-12T13:31:23.701 farmer chia.farmer.farmer : ERROR Exception in GET /pool_info https://xch-us.flexpool.io, Cannot connect to host xch-us.flexpool.io:443 ssl:<ssl.SSLContext object at 0x7fd8df9734c0> [Name or service not known]
2024-06-12T13:31:23.792 full_node chia.full_node.mempool_manager: WARNING updating the mempool using the slow-path. peak: c651a85ab23412fcbb7436d897a4a53d81beac0eea591c041228c81ab986dde0 new-peak-prev: 8cc7cd484a120d87a25ffd93ec32f0237d6ea69106c7be40b34ab9877cd3149e coins: not set
2024-06-12T13:33:26.201 farmer chia.farmer.farmer : ERROR Exception in GET /pool_info https://xch-us.flexpool.io, Cannot connect to host xch-us.flexpool.io:443 ssl:<ssl.SSLContext object at 0x7fd8dfadfc40> [Name or service not known]
I investigated the source code; here is a summary.
GET /pool_info ...
The line "farmer chia.farmer.farmer : ERROR Exception in GET /pool_info ..." doesn't shut down the fullNode or stop any code from executing. (Actually, this should be labeled as WARNING.) The farmer just retries the GET request periodically.
So this is not relevant to the issue where the fullNode is not launching.
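To see from the log itself which service is emitting which level, the lines can be tallied by service and level. This is a rough sketch that assumes the line layout seen in the log excerpts in this thread (timestamp, service, logger, colon, level, message); the regex and the function name tally_by_service are my own and are not part of Chia.

```python
import re
from collections import Counter

# Layout assumed from the excerpts in this thread:
# <timestamp> <service> <logger>[ ]: <LEVEL> <message>
LOG_LINE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<service>\S+)\s+(?P<logger>\S+?)\s*:\s+"
    r"(?P<level>[A-Z]+)\s+(?P<msg>.*)$"
)


def tally_by_service(lines):
    """Count log lines per (service, level); lines that don't match are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            counts[(match["service"], match["level"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2024-06-12T13:31:23.701 farmer chia.farmer.farmer : ERROR Exception in GET /pool_info ...",
        "2024-06-12T13:31:23.792 full_node chia.full_node.mempool_manager: WARNING updating the mempool using the slow-path.",
    ]
    for key, count in tally_by_service(sample).items():
        print(key, count)
```

Run against the real debug log, this makes it easy to confirm whether the flexpool errors come only from the farmer while the full_node lines are something else entirely.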
config.yaml
If you're sure that flexpool is not in your config.yaml, then it is being loaded from another config.yaml that you haven't checked yet. I suspect that an unexpected value was set in the $CHIA_ROOT env var, or that the env var is sometimes loaded and sometimes not.
Maybe your .bashrc, .profile, or a similar script sometimes failed before setting CHIA_ROOT and sometimes succeeded.
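One way to check which config.yaml is actually in effect is to resolve it from the environment and search it for flexpool. This is a sketch under stated assumptions: that the config lives at config/config.yaml under CHIA_ROOT, and that the root falls back to ~/.chia/mainnet when the variable is unset. The helper names chia_config_path and lines_mentioning are made up for illustration.

```python
import os
from pathlib import Path


def chia_config_path(env=None) -> Path:
    """Resolve the config.yaml path from CHIA_ROOT.

    Assumption: falls back to ~/.chia/mainnet when CHIA_ROOT is unset.
    """
    env = os.environ if env is None else env
    root = Path(env.get("CHIA_ROOT", "~/.chia/mainnet")).expanduser()
    return root / "config" / "config.yaml"


def lines_mentioning(path: Path, needle: str):
    """Return (line_number, text) pairs containing needle, case-insensitively."""
    if not path.is_file():
        return []
    hits = []
    with path.open(encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if needle.lower() in line.lower():
                hits.append((number, line.rstrip()))
    return hits


if __name__ == "__main__":
    config = chia_config_path()
    print("config in effect:", config)
    for number, text in lines_mentioning(config, "flexpool"):
        print(f"{number}: {text}")
```

Running it once from the shell that starts the CLI and once from the environment the GUI is launched in would show whether the two end up with different CHIA_ROOT values.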
"updating the mempool using the slow-path" indicates a fullNode was already running
This is only logged when new_peak (the latest block) is sent from other fullNodes and the local fullNode is accepting it. This means a fullNode was already running and had received a peak block from external fullNodes.
So the subsequent error (OSError: [Errno 98] error while attempting to bind on address ('::', 8444, 0, 0): address already in use) indicates that something attempted to launch another fullNode even though one was already running.
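The "address already in use" failure is easy to reproduce outside of Chia: bind a second socket to a port that a first socket is already listening on. The sketch below is a generic illustration, not Chia's startup code, and the function name second_bind_errno is hypothetical.

```python
import errno
import socket


def second_bind_errno(host: str = "127.0.0.1"):
    """Bind two sockets to the same port; return the errno from the second bind.

    Returns None if the second bind unexpectedly succeeds.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))  # port 0 lets the OS pick a free port
        first.listen()
        port = first.getsockname()[1]
        try:
            second.bind((host, port))  # same port, still held by `first`
        except OSError as exc:
            return exc.errno
        return None
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    code = second_bind_errno()
    # errno.EADDRINUSE is 98 on Linux, matching the [Errno 98] in the log
    print(code, code == errno.EADDRINUSE)
```

That is the same situation as a second full node trying to claim 8444 while the first one still holds it.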
As of 2.3.0, even if a fullNode is already running, the GUI just skips launching another fullNode. So I suspect that you tried to launch a GUI whose version is lower than 2.3.0 while a fullNode was already running.
I need more info to proceed with my investigation.
What does the "chia version" CLI command return, and what does the GUI's version dialog show?
Closing issue - feel free to add a comment with additional details if needed
What happened?
2-3 weeks ago, my GUI started sticking at the "starting services full node" point - the other 3 services seem to start up normally. I never used database v1 as far as I know of, my database path is already pointing at a v2 database - this was a suggested fix, but as I'm already ON a v2 database it's not my issue. The error log seems to be pointing at an issue with not finding flexpool.io even though there is NOTHING in my config.yaml that mentions flexpool at all.
The DNS records for flexpool seem to have expired around the time this issue started. At that time I had been running the GUI just long enough every morning to update its sync, which normally took only a few minutes, and the full node normally started up in well under 5 minutes to do the sync. I do NOT use flexpool, though I think I might have used it for a short while, many months to a couple of years ago.
I have NOT changed anything in my configuration for quite a long time - months to a couple years. I have no plots on this machine at all.
Version
2.1.4, 2.2.1, 2.3.0
What platform are you using?
Linux
What ui mode are you using?
GUI
Relevant log output