nikki-quant opened this issue 1 month ago
@nikki-quant, have you checked the connection from the container itself?
Thanks for your response. Do you mean the connection to stats.xinfin.network?
If so, yes, I get a response from netcat and curl:
Confirming that the stats server corresponds to the IP address in the logs:
ubuntu@node:~$ host stats.xinfin.network
stats.xinfin.network has address 45.82.64.150
Netcat uses TCP unless we pass the -u flag, and I don't see any issue with TCP access to that host on that port:
ubuntu@node:~$ nc stats.xinfin.network -zv 3000
Connection to stats.xinfin.network 3000 port [tcp/*] succeeded!
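If the application's failures are intermittent, a single probe may not be representative; a repeated check along these lines could be left running to rule that out (standard nc flags; -w sets a connect timeout):

# Probe the stats port once a minute for an hour and log any failures,
# to rule out intermittent connectivity; -z only checks the port, -w 5 is a 5s timeout.
for i in $(seq 1 60); do
  nc -z -w 5 stats.xinfin.network 3000 || echo "$(date -u) probe $i failed"
  sleep 60
done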
Hitting that endpoint with curl, we receive an HTML document:
ubuntu@node:~$ curl http://stats.xinfin.network:3000
<!DOCTYPE html><html ng-app="netStatsApp"><head><meta name="viewport" content="width=device-width, initial-scale=1.0,
The node's security group has outbound access open on all ports but some restrictions on inbound connections. AWS security groups will allow a response to an outgoing connection, and since TCP keeps the connection open rather than the server initiating a new connection, I would not expect this to be an issue with network security.
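For completeness, the egress rules can be double-checked from the AWS CLI; a minimal sketch, assuming the AWS CLI is configured and sg-0123456789abcdef0 stands in for the instance's actual security group ID:

# List the outbound (egress) rules of the security group to confirm all ports are open.
aws ec2 describe-security-groups \
  --group-ids sg-0123456789abcdef0 \
  --query 'SecurityGroups[0].IpPermissionsEgress'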
If you are running it in Docker, please enter the container environment. Run docker ps -a to find the container name (e.g. mainnet-xinfinnetwork-1) and enter the container:
docker exec -ti mainnet-xinfinnetwork-1 bash
curl 45.82.64.150:3000
What I see:
Yes, I'm running in docker. I receive the same response as on the underlying host.
ubuntu@node:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0cbe0caa505d xinfinorg/xdposchain:v2.2.4 "bash /work/entry.sh" 3 hours ago Up 3 hours 8555/tcp, 0.0.0.0:30303->30303/tcp, :::30303->30303/tcp, 0.0.0.0:8989->8545/tcp, [::]:8989->8545/tcp, 0.0.0.0:8888->8546/tcp, [::]:8888->8546/tcp mainnet-xinfinnetwork-1
ubuntu@node:~$ docker exec -it 0cbe0caa505d bash
0cbe0caa505d:/work# curl 45.82.64.150:3000 | head
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0<!DOCTYPE html><html ng-app="netStatsApp">
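As an aside, curl's progress meter can be suppressed while still surfacing errors, which keeps piped output like the above cleaner:

# -s silences the progress meter, -S still prints errors if the request fails.
curl -sS 45.82.64.150:3000 | head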
I'm using a recent copy of this repo:
commit 769d463c6dd272925b0ffad15db0c2341d8dfe81 (HEAD -> master, origin/master, origin/HEAD)
Author: Anil Chinchawale <anil24593@gmail.com>
Date: Tue Sep 17 20:03:54 2024 +0530
Update README.md
I have updated the docker-compose file to use the image xinfinorg/xdposchain:v2.2.4
but did not pull an updated copy of the master branch because I'm waiting for other people in my team to review the 2.2.5 changes and confirm that they'll work with our application stack.
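If it helps, the image the running container was actually created from can be confirmed with docker inspect (container name as in the docker ps output above):

# Print the image the running container was created from.
docker inspect -f '{{.Config.Image}}' mainnet-xinfinnetwork-1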
Have you enabled a firewall?
Looks like you need to allow connections between 172.18.0.2 and 45.82.64.150.
172.18.0.2 is an address on the Docker subnet on your server, and it needs to have access to 45.82.64.150:3000.
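You can check from the host whether anything is filtering that traffic; a rough sketch (assumes iptables is in use, and the network name mainnet_default is a guess based on the compose project name):

# Show the Docker networks and the subnet the container is attached to.
docker network ls
docker network inspect mainnet_default   # network name is a guess based on the compose project
# Check the host firewall's FORWARD chain for DROP/REJECT rules affecting the bridge.
sudo iptables -L FORWARD -n -v | head -n 20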
@pro100skm as the log above shows, when I exec into the Docker container I can connect to 45.82.64.150:3000 (in the output above I receive the HTML document starting <!DOCTYPE html><html ng-app="netStatsApp"> from that server).
Could you explain why you think there is currently a firewall issue? From what I can see, the container is successfully reaching that endpoint.
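For what it's worth, a plain GET does not exercise the WebSocket upgrade that the stats client presumably uses; below is a rough probe that could be run from inside the container (both the / path and the usefulness of this check are assumptions on my part):

# Ask the stats server to upgrade the connection to a WebSocket.
# A 101 Switching Protocols response would show the handshake is reachable;
# the key below is the RFC 6455 sample value and the path is a guess.
curl -i -s \
  -H "Connection: Upgrade" \
  -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Version: 13" \
  -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
  http://45.82.64.150:3000/ | head -n 5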
Yeah, now I see it. Let's wait for a response from the core team's devs.
Hi, this seems very slow to me too. Could you try restarting and see if you have the same issue?
the same after a while
the same
Restarting the process did not help, and I could not see any evidence that the node was resource-constrained using top, lsof, vmstat or similar tools, but our node eventually began syncing somewhat faster (~1k blocks/minute) and caught up without us making any changes.
I'm still not sure whether this was due to a lack of peers on the newer version or some other factor. No changes to the underlying infrastructure that I made while debugging seemed to have an effect. This gives me some concern about continuing to maintain a node, since it's not clear to me how to effectively debug and resolve issues with the XDC software stack.
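For reference, the sync rates mentioned above can be measured by polling the node's RPC; a minimal sketch, assuming the HTTP RPC is reachable on localhost:8989 as in the docker ps output earlier and that the standard eth_blockNumber JSON-RPC method is enabled:

# Report how many blocks the node imports per minute.
prev=""
while true; do
  hex=$(curl -s -X POST -H 'Content-Type: application/json' \
    --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
    http://localhost:8989 | sed -n 's/.*"result":"0x\([0-9a-fA-F]*\)".*/\1/p')
  [ -n "$hex" ] || { sleep 60; continue; }   # skip a round if the RPC call failed
  cur=$((16#$hex))
  [ -n "$prev" ] && echo "$(date -u +%T) block $cur (+$((cur - prev)) in the last minute)"
  prev=$cur
  sleep 60
done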
Dear XinFin team,
We run an XDC Mainnet node in-house, and around 3 weeks ago we provisioned a new v2.2.4 node on an AWS EC2 r5a.xlarge instance. We had used the same instance type previously and found the performance good enough for our purposes, syncing 1.5 million blocks in 15 minutes.
On this occasion our new node is syncing very slowly - an average of 300 blocks per minute. It has ~17 peers, and does not seem CPU, disk, or memory constrained according to top, vmstat, or CloudWatch metrics. In terms of application logs, the only errors I see are related to the stats endpoint, which seems to be incorrect:
I can reach the stats server with netcat or curl, so I'm unsure why the application is getting TCP errors:
Does the node name passed as $INSTANCE_NAME need to be unique?
Are there any configuration options we can change in the node settings, or recommendations that you would make for system configuration to increase performance?
Is there any documentation on troubleshooting this kind of issue out there we should take a look at?
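In case it is useful for reproducing, the stats-related errors can be pulled out of the container logs with something along these lines (container name as in the docker ps output above; the grep pattern is just a rough filter):

# Show the most recent log lines that mention the stats endpoint or TCP dial errors.
docker logs mainnet-xinfinnetwork-1 2>&1 | grep -i -E 'stats|dial' | tail -n 20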