Closed captonsnake closed 1 year ago
Can you run `lxc monitor --pretty --type=logging` on both source and target servers while doing another run of the transfer?
This should give a lot more information on what's going on.
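One way to keep that output for later comparison is to redirect it to a file on each server while the transfer is reproduced. The following is only a sketch: the `capture_monitor` helper and the log file name are illustrative assumptions, and the only LXD command used is the `lxc monitor` invocation quoted above.

```shell
#!/bin/sh
# Sketch: save `lxc monitor` log events to a file while the copy is reproduced.
# Run it once on the source server and once on the target server.
# capture_monitor is a hypothetical helper name, not part of LXD.

capture_monitor() {
    # $1: file that receives the log events.
    # Blocks until interrupted (Ctrl-C), so start it before the transfer.
    lxc monitor --pretty --type=logging > "$1" 2>&1
}

# Example (left commented out so nothing runs when the file is sourced):
# capture_monitor "monitor-$(uname -n).log"
```

Starting the capture on both sides before running the copy means the two files cover the same time window, which makes it easier to line up the last events before the connection drops.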
Also, please try the upcoming LXD 5.13 release (you're running 5.10, which isn't supported), as there have been a lot of changes to the migration subsystem in that release. Thanks
I'll close this for now, but if you are still experiencing the same issue on 5.13 and later please let us know and we will reopen. Thanks
Required information
Server A (Prod)
Server B (Backup)
Issue description
We recently moved to using LXD to run a few internal services. We created a container on Server A and configured daily snapshots for 15 days. Server B runs `lxc copy serverA:services services --mode="relay" --refresh` nightly via cron to copy the container to the backup server in case Server A goes down. There's a little more that happens in the cron script, but manually entering the above command reproduces the error every time.
This worked fine for about a week. Now we get this error when attempting to run `lxc copy`:

The copy begins and proceeds for approximately an hour, then fails with the above error message. We have not been able to find any other log messages (journalctl, dmesg, etc.) with related information about why the connection was dropped. Using `--debug --verbose` also gave no significant information.

We captured full tcpdump traffic on both servers and see `tcp window full` and `tcp zero windows` after the transfer hangs, but no data transfer after about an hour. Then the connection drops.

We are definitely not out of storage and definitely not out of memory. CPU utilization is high on Server A, but we attribute that to rsync compression.
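To see which host advertised the zero windows and when, the saved captures can be filtered for just those events. This is a sketch under assumptions: it presumes `tshark` (the Wireshark command-line tool) is installed and that the tcpdump capture was written to a file; the `list_window_events` helper name is hypothetical.

```shell
#!/bin/sh
# Sketch: list zero-window and window-full events from a saved capture.
# Assumes tshark is available; the capture path is supplied by the caller.

# Wireshark display filter matching both kinds of TCP analysis flags.
FILTER='tcp.analysis.zero_window || tcp.analysis.window_full'

list_window_events() {
    # $1: path to a pcap file (for example one written with tcpdump -w)
    # Prints relative time, source, destination and window size per match.
    tshark -r "$1" -Y "$FILTER" \
        -T fields -e frame.time_relative -e ip.src -e ip.dst -e tcp.window_size
}

# Only run when a capture file is given on the command line.
if [ "$#" -ge 1 ]; then
    list_window_events "$1"
fi
```

A zero window is advertised by the receiving side when its buffer is full, so the source address on those packets points at the host that stopped reading data, which may help narrow down where the transfer stalls.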
We have run several network performance tests using iperf and saw nothing significant, so the network is unlikely to be the cause, but it is always a possibility. It is difficult for us to get access to networking after equipment is installed. We also used scp and netcat to transfer several large test files with no issues.
We can copy the services instance with no snapshots using `--instance-only`, and that completes fine.

Steps to reproduce
lxc copy "servera:services" "services" --mode=relay --refresh
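Since the failure only shows up after roughly an hour, it may help to record how long each attempt runs and how it exits. The wrapper below is a sketch only: `timed_run` and the `LOG` path are illustrative names, and it relies on `date +%s`, which is widely available but not required by POSIX.

```shell
#!/bin/sh
# Sketch: run a command and log its exit status and duration.
# timed_run and LOG are hypothetical names chosen for this example.

LOG="${LOG:-./lxc-copy.log}"

timed_run() {
    # Runs "$@", appends one summary line to $LOG,
    # and returns the exit status of the wrapped command.
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "$(date) cmd='$*' status=$status seconds=$((end - start))" >> "$LOG"
    return "$status"
}

# Example with the command from the steps above (commented out on purpose):
# timed_run lxc copy "servera:services" "services" --mode=relay --refresh
```

Comparing the logged durations across several runs would show whether the drop always happens after about the same elapsed time.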
Information to attach
- `dmesg`
- `lxc info NAME --show-log`
- `lxc config show NAME --expanded`
- `lxc monitor` (while reproducing the issue)

serverA_services_info.txt serverA_services_config.txt