Closed by reliveyy 3 years ago
Lnd blocks while syncing with status "Syncing 0.00% (0/665740)". Here are some relevant log lines:
Jan 12 15:10:07.000 [notice] We tried for 15 seconds to connect to '[scrubbed]' using exit $AFF2FC5C6F793B6E147EB93C1897D6DDA49E54FD~Wix at 95.211.230.211. Retrying on a new circuit.
2021-01-12 15:10:08.759 [WRN] BTCN: mismatch at height 230000, expected 63cdbfbded0a1e310192676d2c482767ca014fc89c09d830637faa746bd969d8 got 1308d5cfc6462f877a5587fd77d7c1ab029d45e58d5175aaf8c264cee9bde760
2021-01-12 15:10:08.759 [WRN] BTCN: got error attempting to determine correct cfheader checkpoints: got mismatched checkpoints, trying again
2021-01-12 15:10:12.714 [WRN] BTCN: mismatch at height 230000, expected 63cdbfbded0a1e310192676d2c482767ca014fc89c09d830637faa746bd969d8 got 1308d5cfc6462f877a5587fd77d7c1ab029d45e58d5175aaf8c264cee9bde760
2021-01-12 15:10:12.714 [WRN] BTCN: Detected mismatch at index=229 for checkpoints!!!
2021-01-12 15:10:14.212 [WRN] BTCN: mismatch at height 230000, expected 63cdbfbded0a1e310192676d2c482767ca014fc89c09d830637faa746bd969d8 got 1308d5cfc6462f877a5587fd77d7c1ab029d45e58d5175aaf8c264cee9bde760
2021-01-12 15:10:14.212 [WRN] BTCN: got error attempting to determine correct cfheader checkpoints: got mismatched checkpoints, trying again
2021-01-12 15:10:17.236 [INF] BTCN: Lost peer 195.201.95.119:8333 (outbound)
Jan 12 15:10:25.000 [notice] We tried for 15 seconds to connect to '[scrubbed]' using exit $6F4E9FD00D4251D98BE96FB1AA546FE34676A95B~CalyxInstitute06 at 162.247.74.206. Retrying on a new circuit.
Jan 12 15:10:26.000 [notice] We tried for 15 seconds to connect to '[scrubbed]' using exit $CF4872108C9F6EB9E485B79AB35D1881F9698732~libreexit06 at 209.141.33.53. Retrying on a new circuit.
2021-01-12 15:10:27.244 [WRN] BTCN: mismatch at height 230000, expected 63cdbfbded0a1e310192676d2c482767ca014fc89c09d830637faa746bd969d8 got 1308d5cfc6462f877a5587fd77d7c1ab029d45e58d5175aaf8c264cee9bde760
2021-01-12 15:10:27.245 [WRN] BTCN: Detected mismatch at index=229 for checkpoints!!!
2021-01-12 15:10:28.857 [WRN] BTCN: mismatch at height 230000, expected 63cdbfbded0a1e310192676d2c482767ca014fc89c09d830637faa746bd969d8 got 1308d5cfc6462f877a5587fd77d7c1ab029d45e58d5175aaf8c264cee9bde760
2021-01-12 15:10:28.857 [WRN] BTCN: got error attempting to determine correct cfheader checkpoints: got mismatched checkpoints, trying again
Jan 12 15:10:32.000 [notice] We tried for 15 seconds to connect to '[scrubbed]' using exit $A53C46F5B157DD83366D45A8E99A244934A14C46~csailmitexit at 128.31.0.13. Retrying on a new circuit.
2021-01-12 15:10:33.016 [WRN] BTCN: mismatch at height 230000, expected 63cdbfbded0a1e310192676d2c482767ca014fc89c09d830637faa746bd969d8 got 1308d5cfc6462f877a5587fd77d7c1ab029d45e58d5175aaf8c264cee9bde760
2021-01-12 15:10:33.016 [WRN] BTCN: Detected mismatch at index=229 for checkpoints!!!
2021-01-12 15:10:34.425 [WRN] BTCN: mismatch at height 230000, expected 63cdbfbded0a1e310192676d2c482767ca014fc89c09d830637faa746bd969d8 got 1308d5cfc6462f877a5587fd77d7c1ab029d45e58d5175aaf8c264cee9bde760
2021-01-12 15:10:34.425 [WRN] BTCN: got error attempting to determine correct cfheader checkpoints: got mismatched checkpoints, trying again
2021-01-12 15:10:38.405 [WRN] BTCN: mismatch at height 230000, expected 63cdbfbded0a1e310192676d2c482767ca014fc89c09d830637faa746bd969d8 got 1308d5cfc6462f877a5587fd77d7c1ab029d45e58d5175aaf8c264cee9bde760
2021-01-12 15:10:38.405 [WRN] BTCN: Detected mismatch at index=229 for checkpoints!!!
2021-01-12 15:10:39.531 [WRN] BTCN: mismatch at height 230000, expected 63cdbfbded0a1e310192676d2c482767ca014fc89c09d830637faa746bd969d8 got 1308d5cfc6462f877a5587fd77d7c1ab029d45e58d5175aaf8c264cee9bde760
2021-01-12 15:10:39.531 [WRN] BTCN: got error attempting to determine correct cfheader checkpoints: got mismatched checkpoints, trying again
Jan 12 15:10:41.000 [notice] We tried for 15 seconds to connect to '[scrubbed]' using exit $0FF233C8D78A17B8DB7C8257D2E05CD5AA7C6B88~politkovskaja at 77.247.181.165. Retrying on a new circuit.
I believe this is due to the broken Tor connection: lnd persists some wrong cfheaders, so restarting doesn't help. The good news is that the Docker engine on Windows hasn't crashed for a long time.
Current status (not including points from first message):
Actual result: Proxy crash
2021-01-13 12:46:38.769 [DEBUG] gin : [172.19.0.1] GET /static/js/2.5ad162db.chunk.js | 200 | 8ms
2021/01/13 12:46:39 http: TLS handshake error from 172.19.0.1:40628: remote error: tls: unknown certificate
2021-01-13 12:46:39.520 [DEBUG] : [SocketIO/2] CONNECT: RemoteAddr=172.19.0.1:40632
2021-01-13 12:46:40.035 [DEBUG] service.xud : Failed to get container status: container not found
2021-01-13 12:46:40.035 [DEBUG] gin : [172.19.0.1] GET /api/v1/status/xud | 200 | 1ms
2021-01-13 12:46:40.041 [DEBUG] ServiceManager : [Status] proxy: Ready
panic: interface conversion: interface is nil, not lnrpc.LightningClient
goroutine 349 [running]:
github.com/ExchangeUnion/xud-docker-api/service/lnd.(*RpcClient).getClient(...)
/src/service/lnd/rpc.go:55
github.com/ExchangeUnion/xud-docker-api/service/lnd.(*RpcClient).GetInfo(0xc0005535e0, 0xf91040, 0xc000406ae0, 0xc000170401, 0x11, 0x52658b)
/src/service/lnd/rpc.go:63 +0x45
github.com/ExchangeUnion/xud-docker-api/service/lnd.(*Service).GetStatus(0xc000400c00, 0xf91040, 0xc000406ae0, 0xf91040, 0xc000406ae0)
/src/service/lnd/lnd.go:149 +0xf8
github.com/ExchangeUnion/xud-docker-api/service.(*Manager).GetStatus.func1(0xc0003983c0, 0xf9bca0, 0xc000400c00, 0xc0003e6b40)
/src/service/manager.go:173 +0x119
created by github.com/ExchangeUnion/xud-docker-api/service.(*Manager).GetStatus
/src/service/manager.go:169 +0xd4
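The panic above is the classic Go failure mode of type-asserting a nil interface value without the comma-ok form: when lnd is down, the stored client is nil, and the bare assertion panics with exactly "interface conversion: interface is nil, not lnrpc.LightningClient". A minimal sketch of the bug and a guarded fix (the RpcClient/LightningClient shapes here are simplified stand-ins for illustration, not the actual xud-docker-api code):

```go
package main

import "fmt"

// LightningClient stands in for lnrpc.LightningClient (hypothetical minimal interface).
type LightningClient interface {
	GetInfo() string
}

// RpcClient mirrors the shape of the crashing service: client may still be nil
// while lnd is restarting or unreachable.
type RpcClient struct {
	client interface{}
}

// getClientUnsafe reproduces the bug: a bare type assertion on a nil
// interface value panics.
func (c *RpcClient) getClientUnsafe() LightningClient {
	return c.client.(LightningClient) // panics when c.client is nil
}

// getClientSafe uses the comma-ok form and returns an error instead of
// crashing the whole proxy goroutine.
func (c *RpcClient) getClientSafe() (LightningClient, error) {
	lc, ok := c.client.(LightningClient)
	if !ok {
		return nil, fmt.Errorf("lnd rpc client not ready yet")
	}
	return lc, nil
}

func main() {
	c := &RpcClient{} // client is nil, as when lnd has just been recreated
	if _, err := c.getClientSafe(); err != nil {
		fmt.Println("GetStatus would report:", err)
	}
}
```

With the comma-ok variant, GetStatus can report a "not ready" status instead of taking down the proxy.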
@raladev
LND syncing stuck at 0.00% ... LND syncing stuck at any percentage (not limited to Windows; I reproduced it on Linux too)
I think these two blocking cases are due to a broken Tor connection, and LND has no way to recover from it. This blocking issue seems unrelated to the binary launcher, so my draft idea is to leave it out of this PR and tackle this bug in another PR (we could try to separate Tor as a service).
Proxy crash after setting up the current env.
Yes, I know the proxy service is fragile while other services are recreated or restarted. I will try to fix this today.
Incorrect status of the connext container (socket hang up). xud contains connext connection errors, but the connext container is fine. Maybe we need to update the exchangeunion/xud:1.2.4 launcher image and the xud-docker PR.
(broken connext and boltz status) WIP
I think these two blocking cases are due to a broken Tor connection
The first one is definitely a Tor issue; I noticed it with the old utils flow. But IMO the second one is something connected with xud-launcher, because I did not see it before.
@raladev Yes. It's new to us and suspicious. I'll keep an eye on this.
The connext non-ready status is because xud connects to connext port 8000 instead of 5040.
The boltz "btc down; ltc down" status is because the Docker API ContainerExecAttach returns the error "unable to upgrade to tcp, received 200". However, docker exec mainnet_boltz_1 wrapper btc getinfo works.
There is another issue when I run boltz on Linux. The mapped .boltz data directory has an empty macaroons folder, but inside the container there are two macaroon files.
-rw------- 1 root root 110 Jan 15 09:38 admin.macaroon
-rw------- 1 root root 96 Jan 15 09:38 readonly.macaroon
I'm wondering why these two files cannot be mapped to the host filesystem.
The connext non-ready status is because xud connects to connext port 8000 instead of 5040.
Somehow mainnet arm64 images seem to be vector then (amd64 works fine):
Connext info:
┌────────┬────────────────────────────────────┐
│ Status │ connect ECONNREFUSED 10.0.3.3:8000 │
└────────┴────────────────────────────────────┘
Anyhow, can you take care of this? @erkarl
The mapped .boltz data directory has an empty macaroons folder.
It's because the files are only visible to root. You need to use sudo on the host system to see them. And if you are a normal user, you cannot share these files between two containers. That's a new problem!
And if you are a normal user you cannot share these files between two containers
We can use docker-compose named volumes to resolve this issue.
services:
  boltz:
    volumes:
      - boltz-data:/root/.boltz
  proxy:
    volumes:
      - boltz-data:/root/network/data/boltz
volumes:
  boltz-data:
    driver: local
    driver_opts:
      type: none
      device: ./data/boltz
      o: bind
I think it's the right decision to migrate from bind mounts to volumes. Here are the reasons from the official Docker docs:
- Volumes are easier to back up or migrate than bind mounts.
- You can manage volumes using Docker CLI commands or the Docker API.
- Volumes work on both Linux and Windows containers.
- Volumes can be more safely shared among multiple containers.
- Volume drivers let you store volumes on remote hosts or cloud providers, to encrypt the contents of volumes, or to add other functionality.
- New volumes can have their content pre-populated by a container.
- Volumes on Docker Desktop have much higher performance than bind mounts from Mac and Windows hosts.
The only problem with using a "local" driver volume is that it keeps two copies of the data: one in /var/lib/docker and another in your custom location. That's not acceptable for blockchain data. But there is a Docker volume driver plugin called "local-persist" that may fit our requirements.
We cannot use the "local-persist" plugin right now because it requires an extra daemon running on the host. So the realistic solution will be to fix the boltz data file permissions after they are created.
Lndbtc died quickly because of "unable to initialize neutrino backend: unable to create neutrino light client: tor host is unreachable"
lndbtc_1 | 2021-01-25 06:23:51,701 INFO exited: lnd (exit status 1; not expected)
lndbtc_1 | 2021-01-25 06:23:52,705 INFO spawned: 'lnd' with pid 2383
lndbtc_1 | [DEBUG] Enabling neutrino
lndbtc_1 | Waiting for lnd-bitcoin onion address...
lndbtc_1 | Onion address for lnd-bitcoin is iywbic3wi2woxqows7xsbym7l5dke7wm5qwixyvgt2pqwhjli2yjruad.onion
lndbtc_1 | 2021-01-25 06:23:52.803 [INF] LTND: Version: 0.11.1-beta commit=v0.11.1-beta, build=production, logging=default
lndbtc_1 | 2021-01-25 06:23:52.803 [INF] LTND: Active chain: Bitcoin (network=mainnet)
lndbtc_1 | 2021-01-25 06:23:52.804 [INF] LTND: Opening the main database, this might take a few minutes...
lndbtc_1 | 2021-01-25 06:23:52.806 [INF] LTND: Opening bbolt database, sync_freelist=false
lndbtc_1 | 2021-01-25 06:23:52.814 [INF] CHDB: Checking for schema update: latest_version=17, db_version=17
lndbtc_1 | 2021-01-25 06:23:52.817 [INF] LTND: Database now open (time_to_open=10.7891ms)!
lndbtc_1 | 2021-01-25 06:23:53.743 [ERR] LTND: unable to initialize neutrino backend: unable to create neutrino light client: tor host is unreachable
lndbtc_1 | 2021-01-25 06:23:53,744 INFO success: lnd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
lndbtc_1 | 2021-01-25 06:23:53.745 [INF] LTND: Shutdown complete
lndbtc_1 | unable to initialize neutrino backend: unable to create neutrino light client: tor host is unreachable
lndbtc_1 | 2021-01-25 06:23:53,747 CRIT uncaptured python exception, closing channel <POutputDispatcher at 140384854133872 for <Subprocess at 140384854336464 with name lnd in state RUNNING> (stderr)> (<class 'OSError'>:[Errno 29] Invalid seek [/usr/lib/python3.8/site-packages/supervisor/supervisord.py|runforever|220] [/usr/lib/python3.8/site-packages/supervisor/dispatchers.py|handle_read_event|270] [/usr/lib/python3.8/site-packages/supervisor/dispatchers.py|record_output|204] [/usr/lib/python3.8/site-packages/supervisor/dispatchers.py|_log|173] [/usr/lib/python3.8/site-packages/supervisor/loggers.py|info|327] [/usr/lib/python3.8/site-packages/supervisor/loggers.py|log|345] [/usr/lib/python3.8/site-packages/supervisor/loggers.py|emit|227] [/usr/lib/python3.8/site-packages/supervisor/loggers.py|doRollover|264])
If Tor doesn't start, it's usually a permission issue.
tor host is unreachable
It's not a Tor issue. I found that one of our Neutrino peers became invalid, and that fails lnd startup (although it shouldn't).
FYI, we are still getting boltz status "btc down; ltc down" because of the Golang Docker SDK error "unable to upgrade to tcp, received 200". But the boltz wrapper getinfo actually works.
unable to upgrade to tcp, received 200
This boltz status issue has been fixed.
Another compatibility issue: cannot bring up the proxy with an existing master mainnet.
This PR removes the utils container usage in setup.sh and replaces it with the binary launcher from the xud-launcher project.
Related to https://github.com/ExchangeUnion/xud-launcher/pull/9 Closes https://github.com/ExchangeUnion/xud-docker/issues/825
How to test?
Running the launcher branch on major platforms will bring up the familiar setup flow in your terminal.
On Linux and macOS:
On Windows:
Build and test locally:
N.B. We need to use the new proxy image because of the new "attach mode" and the endpoints (/api/v1/info, /api/v1/backup, /api/v1/xud/changepass) it introduces. We need a new 1.2.4 image for xud because of the backup location fix (removing /mnt/hostfs).
Todos
- /api/v1/info
- /api/v1/setup-status
- Implement xud.ps1
Bugs
- docker info got "Bad response from Docker engine"