Closed: MaxRower closed this issue 1 year ago.
This is most likely due to the addition of the optimized transfer feature.
To achieve differential transfers, use lxc snapshot to take a snapshot; then, when transferring with --refresh,
only the differences between the last snapshot and the current disk contents are transferred.
Does this help speed things up?
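A rough sketch of that workflow (assuming a remote named backup and a container named c1; both names are illustrative):

# Take a snapshot on the source so later refreshes have a common base.
lxc snapshot c1 base
# The initial copy sends everything, including the snapshot.
lxc copy c1 backup:c1
# Subsequent refreshes should only send the differences since the common snapshot.
lxc copy c1 backup:c1 --refresh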
Well, I don't see what the "optimized transfer feature" is, if it transfers all data every time ;) Sadly, there is no real documentation on how lxc copy works, not even a manpage, just --help.

I'd like to use local snapshots only for local backups before upgrades or other critical changes inside a container, and keep the real longer-term backups and snapshots on the backup server only. I don't understand how a snapshot on the source would help. Should I copy that snapshot to the target and delete it afterwards? Since I am using lxc copy to back up to a dedicated backup server AND to replicate to another hot-standby server as well, at different times, I can't imagine how this should work.

In the meantime, I downgraded one backup server to 4.0.9; I just had to restore the 4.0.9 config from the last backup of it. Strangely, the lxc remote config was not included there.
Well, I don't see what the "optimized transfer feature" is, if it transfers all data every time ;)
For ZFS and BTRFS it uses the native (optimized) transfer mechanisms rather than rsync.
Without the --instance-only
option, if you had snapshots on the source, then when running lxc copy --refresh
it would copy those to the target (only if they were missing, and only using the differential from the previous snapshot) and then transfer the main volume (only as a differential from the latest snapshot).
But with the --instance-only
option I'm not sure what the expected behaviour should be here for a storage driver that supports optimized transfers.
@stgraber @monstermunchkin do you think that when doing an --instance-only refresh between optimized storage pools it should use rsync rather than transfer the whole volume every time?
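For reference, the two forms being compared here (remote and instance names are illustrative):

# Refresh including snapshots: missing snapshots are copied first, then the
# main volume is sent as a differential from the latest snapshot.
lxc copy c1 backup:c1 --refresh

# Refresh without snapshots: only the main volume is considered; this is the
# case whose behaviour is in question.
lxc copy c1 backup:c1 --refresh --instance-only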
Sadly, there is no real documentation on how lxc copy works, not even a manpage, just --help.
Here's a bit about it in the release announcement:
and here:
@ru-fu we could probably do with updating https://linuxcontainers.org/lxd/docs/master/reference/storage_drivers/#storage-optimized-instance-transfer with a section describing how this works for instance refreshes.
Yes, I did read those already. But that only makes sense if you want identical containers on all servers, including their snapshots? I wouldn't want all the daily snapshots on my regular servers, only on the backup server; no upgrade-related snapshots on the backup server; and no snapshots at all on the hot standby. Since an lxc copy --instance-only deletes all snapshots on the target, I do daily snapshotting on the backup server with btrfs snapshot to another directory not touched by lxd. It's only important that it stays deduplicated for as long as possible. Restoring will just involve moving those snapshots around.
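A rough sketch of that kind of out-of-band snapshotting on the backup server (the pool path and target directory are hypothetical and depend on the local setup):

# Read-only BTRFS snapshot of the container volume into a directory LXD does
# not manage, so a later lxc copy --refresh cannot remove it.
btrfs subvolume snapshot -r /var/lib/lxd/storage-pools/lxd/containers/container /backup-history/container-$(date +%F)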
@monstermunchkin aside from the question about whether LXD should use rsync when using --instance-only
with --refresh
, I can see there also appears to be a bug with normal pool -> pool optimized refresh for BTRFS.
First let's see what ZFS does:
lxc storage create zfs1 zfs
lxc storage create zfs2 zfs
lxc launch images:ubuntu/jammy c1 -s zfs1
# Perform initial full copy.
time lxc copy c1 c2 --refresh -s zfs2
real 0m0.790s
# Would expect this to (currently) perform a full copy again, as there are no snapshots.
time lxc copy c1 c2 --refresh -s zfs2
real 0m0.890s
# Now let's add a snapshot and try again.
# We would expect this to take the same time as a full copy too, as the missing snapshot needs to be transferred.
lxc snapshot c1
time lxc copy c1 c2 --refresh -s zfs2
real 0m0.895s
# Now let's run the refresh again.
# It should be quicker as it should transfer only the (minimal) differences between the snapshot and main volume.
time lxc copy c1 c2 --refresh -s zfs2
real 0m0.261s
We can see ZFS pool transfers are working correctly when used with snapshots.
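For context, ZFS optimized transfer boils down to incremental snapshot sends, roughly like this (a conceptual sketch, not LXD's exact invocation; dataset and snapshot names are illustrative):

# Full send of the first snapshot, then an incremental send containing only
# the blocks that changed between snap0 and snap1.
zfs send zfs1/containers/c1@snap0 | zfs receive zfs2/containers/c2
zfs send -i @snap0 zfs1/containers/c1@snap1 | zfs receive zfs2/containers/c2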
Let's try the same now with BTRFS:
lxc storage create btrfs1 btrfs
lxc storage create btrfs2 btrfs
lxc launch images:ubuntu/jammy c1 -s btrfs1
# Perform initial full copy.
time lxc copy c1 c2 --refresh -s btrfs2
real 0m3.070s
# Would expect this to (currently) perform a full copy again, as there are no snapshots.
time lxc copy c1 c2 --refresh -s btrfs2
real 0m3.041s
# Now let's add a snapshot and try again.
# We would expect this to take the same time as a full copy too, as the missing snapshot needs to be transferred.
lxc snapshot c1
time lxc copy c1 c2 --refresh -s btrfs2
real 0m3.273s
# Now let's run the refresh again.
# It should be quicker as it should transfer only the (minimal) differences between the snapshot and main volume.
time lxc copy c1 c2 --refresh -s btrfs2
real 0m3.124s
Oh dear, it's the same as doing a full copy again. So optimized refresh appears broken for BTRFS.
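For comparison, the BTRFS optimized path is expected to use incremental sends against a parent snapshot, roughly like this (a conceptual sketch only, not LXD's exact invocation; subvolume paths are illustrative and both sources must be read-only snapshots):

# Incremental send: only the extents that differ from the parent snapshot
# should be streamed to the receiving pool.
btrfs send -p /pool1/snapshots/c1/snap0 /pool1/snapshots/c1/snap1 | btrfs receive /pool2/containers/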
Related to https://github.com/lxc/lxd/issues/10186
Hmm, yeah, I think we'd be better off using rsync for containers when using --instance-only and --refresh. For VMs, we should still use the optimized driver, as in either case we're dealing with a full transfer and optimized will be smaller/faster.
OK thanks, so there are 3 parts to this issue:

1. Fix optimized refresh between pools so it doesn't re-transfer the whole volume (currently broken for BTRFS).
2. Switch to rsync when using --refresh with --instance-only (or if there are no snapshots), for containers only.
3. @ru-fu we could probably do with updating https://linuxcontainers.org/lxd/docs/master/reference/storage_drivers/#storage-optimized-instance-transfer with a section describing how this works for instance refreshes.
@tomp I think I need a bit more input here. ;)
From what I can gather, we're currently using the optimized image transfer (for the drivers that support it) both for the initial copy and a refresh. But the issue is that if we don't have snapshots (or don't want to transfer them), the refresh transfers everything and not only the diff. That sounds like a bug to me and nothing that needs to be documented?
If we now change it to use rsync if there are no snapshots, then we need a doc update that says that even if optimized image transfer is available, we won't use it if there are no snapshots to transfer (I guess because the optimized transfer is more efficient only if we're transferring big files). Is that correct?
But the issue is that if we don't have snapshots (or don't want to transfer them), the refresh transfers everything and not only the diff. That sounds like a bug to me and nothing that needs to be documented?
Yes indeed, that is a bug, and is the primary concern of this issue.
However I think it could be useful to tweak the docs that we have to explain in more detail what the optimized transfer means.
In particular, that it depends on having snapshots on the source and that they are transferred as part of the refresh (i.e. not using --instance-only
mode).
There have been a few examples in the forum of confusion around the --refresh
behaviour when going between pools that support optimized transfer. People didn't realise that optimized transfer depends on having snapshots.
If we now change it to use rsync if there are no snapshots, then we need a doc update that says that even if optimized image transfer is available, we won't use it if there are no snapshots to transfer (I guess because the optimized transfer is more efficient only if we're transferring big files). Is that correct?
Yes, that's exactly right. The optimized transfer mechanism depends on sending only the differences between snapshots and the main volume. If there are no snapshots (or the user isn't sending them because of --instance-only), then we can't rely on the driver-level differential approach. Instead, for containers and filesystem volumes we will fall back to a file-based differential using rsync, and for VMs and block volumes we will continue to transfer the full volume using raw block copies.
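Conceptually, the fallback being described is a file-level differential for filesystem volumes versus a full block stream for block volumes, along these lines (a sketch only, not LXD's actual implementation; hosts, device names and paths are illustrative):

# Containers / filesystem volumes: rsync transfers only files whose contents
# changed since the previous refresh.
rsync -a --delete /var/lib/lxd/storage-pools/pool1/containers/c1/ target:/var/lib/lxd/storage-pools/pool2/containers/c1/

# VMs / block volumes: the whole block device is streamed again each time.
dd if=/dev/vg/c1-vm-block bs=4M | ssh target "dd of=/dev/vg/c1-vm-block bs=4M"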
OK, I attempted to add something, but I'm still not sure I fully understand ...
https://github.com/lxc/lxd/pull/11323
Do the snapshots need to be part of the transfer? (But will we then really save much if a user creates a snapshot right before the transfer?) Or do they just need to exist on the source server? (But they would need to be copied to the target server as well or the diff won't make sense ...) Or do we just need one snapshot on the target server? (But how does the optimized transfer work for the first copy - not refresh - then?)
Answered on the PR
Required information
The output of "lxc info" or if that fails:

config:
  core.https_address: '[::]:8443'
  core.trust_password: true
  images.auto_update_interval: "0"
api_extensions:
storage_zfs_remove_snapshots
container_host_shutdown_timeout
container_stop_priority
container_syscall_filtering
auth_pki
container_last_used_at
etag
patch
usb_devices
https_allowed_credentials
image_compression_algorithm
directory_manipulation
container_cpu_time
storage_zfs_use_refquota
storage_lvm_mount_options
network
profile_usedby
container_push
container_exec_recording
certificate_update
container_exec_signal_handling
gpu_devices
container_image_properties
migration_progress
id_map
network_firewall_filtering
network_routes
storage
file_delete
file_append
network_dhcp_expiry
storage_lvm_vg_rename
storage_lvm_thinpool_rename
network_vlan
image_create_aliases
container_stateless_copy
container_only_migration
storage_zfs_clone_copy
unix_device_rename
storage_lvm_use_thinpool
storage_rsync_bwlimit
network_vxlan_interface
storage_btrfs_mount_options
entity_description
image_force_refresh
storage_lvm_lv_resizing
id_map_base
file_symlinks
container_push_target
network_vlan_physical
storage_images_delete
container_edit_metadata
container_snapshot_stateful_migration
storage_driver_ceph
storage_ceph_user_name
resource_limits
storage_volatile_initial_source
storage_ceph_force_osd_reuse
storage_block_filesystem_btrfs
resources
kernel_limits
storage_api_volume_rename
macaroon_authentication
network_sriov
console
restrict_devlxd
migration_pre_copy
infiniband
maas_network
devlxd_events
proxy
network_dhcp_gateway
file_get_symlink
network_leases
unix_device_hotplug
storage_api_local_volume_handling
operation_description
clustering
event_lifecycle
storage_api_remote_volume_handling
nvidia_runtime
container_mount_propagation
container_backup
devlxd_images
container_local_cross_pool_handling
proxy_unix
proxy_udp
clustering_join
proxy_tcp_udp_multi_port_handling
network_state
proxy_unix_dac_properties
container_protection_delete
unix_priv_drop
pprof_http
proxy_haproxy_protocol
network_hwaddr
proxy_nat
network_nat_order
container_full
candid_authentication
backup_compression
candid_config
nvidia_runtime_config
storage_api_volume_snapshots
storage_unmapped
projects
candid_config_key
network_vxlan_ttl
container_incremental_copy
usb_optional_vendorid
snapshot_scheduling
snapshot_schedule_aliases
container_copy_project
clustering_server_address
clustering_image_replication
container_protection_shift
snapshot_expiry
container_backup_override_pool
snapshot_expiry_creation
network_leases_location
resources_cpu_socket
resources_gpu
resources_numa
kernel_features
id_map_current
event_location
storage_api_remote_volume_snapshots
network_nat_address
container_nic_routes
rbac
cluster_internal_copy
seccomp_notify
lxc_features
container_nic_ipvlan
network_vlan_sriov
storage_cephfs
container_nic_ipfilter
resources_v2
container_exec_user_group_cwd
container_syscall_intercept
container_disk_shift
storage_shifted
resources_infiniband
daemon_storage
instances
image_types
resources_disk_sata
clustering_roles
images_expiry
resources_network_firmware
backup_compression_algorithm
ceph_data_pool_name
container_syscall_intercept_mount
compression_squashfs
container_raw_mount
container_nic_routed
container_syscall_intercept_mount_fuse
container_disk_ceph
virtual-machines
image_profiles
clustering_architecture
resources_disk_id
storage_lvm_stripes
vm_boot_priority
unix_hotplug_devices
api_filtering
instance_nic_network
clustering_sizing
firewall_driver
projects_limits
container_syscall_intercept_hugetlbfs
limits_hugepages
container_nic_routed_gateway
projects_restrictions
custom_volume_snapshot_expiry
volume_snapshot_scheduling
trust_ca_certificates
snapshot_disk_usage
clustering_edit_roles
container_nic_routed_host_address
container_nic_ipvlan_gateway
resources_usb_pci
resources_cpu_threads_numa
resources_cpu_core_die
api_os
container_nic_routed_host_table
container_nic_ipvlan_host_table
container_nic_ipvlan_mode
resources_system
images_push_relay
network_dns_search
container_nic_routed_limits
instance_nic_bridged_vlan
network_state_bond_bridge
usedby_consistency
custom_block_volumes
clustering_failure_domains
resources_gpu_mdev
console_vga_type
projects_limits_disk
network_type_macvlan
network_type_sriov
container_syscall_intercept_bpf_devices
network_type_ovn
projects_networks
projects_networks_restricted_uplinks
custom_volume_backup
backup_override_name
storage_rsync_compression
network_type_physical
network_ovn_external_subnets
network_ovn_nat
network_ovn_external_routes_remove
tpm_device_type
storage_zfs_clone_copy_rebase
gpu_mdev
resources_pci_iommu
resources_network_usb
resources_disk_address
network_physical_ovn_ingress_mode
network_ovn_dhcp
network_physical_routes_anycast
projects_limits_instances
network_state_vlan
instance_nic_bridged_port_isolation
instance_bulk_state_change
network_gvrp
instance_pool_move
gpu_sriov
pci_device_type
storage_volume_state
network_acl
migration_stateful
disk_state_quota
storage_ceph_features
projects_compression
projects_images_remote_cache_expiry
certificate_project
network_ovn_acl
projects_images_auto_update
projects_restricted_cluster_target
images_default_architecture
network_ovn_acl_defaults
gpu_mig
project_usage
network_bridge_acl
warnings
projects_restricted_backups_and_snapshots
clustering_join_token
clustering_description
server_trusted_proxy
clustering_update_cert
storage_api_project
server_instance_driver_operational
server_supported_storage_drivers
event_lifecycle_requestor_address
resources_gpu_usb
clustering_evacuation
network_ovn_nat_address
network_bgp
network_forward
custom_volume_refresh
network_counters_errors_dropped
metrics
image_source_project
clustering_config
network_peer
linux_sysctl
network_dns
ovn_nic_acceleration
certificate_self_renewal
instance_project_move
storage_volume_project_move
cloud_init
network_dns_nat
database_leader
instance_all_projects
clustering_groups
ceph_rbd_du
instance_get_full
qemu_metrics
gpu_mig_uuid
event_project
clustering_evacuation_live
instance_allow_inconsistent_copy
network_state_ovn
storage_volume_api_filtering
image_restrictions
storage_zfs_export
network_dns_records
storage_zfs_reserve_space
network_acl_log
storage_zfs_blocksize
metrics_cpu_seconds
instance_snapshot_never
certificate_token
instance_nic_routed_neighbor_probe
event_hub
agent_nic_config
projects_restricted_intercept
metrics_authentication
images_target_project
cluster_migration_inconsistent_copy
cluster_ovn_chassis
container_syscall_intercept_sched_setscheduler
storage_lvm_thinpool_metadata_size
storage_volume_state_total
instance_file_head
instances_nic_host_name
image_copy_profile
container_syscall_intercept_sysinfo
clustering_evacuation_mode
resources_pci_vpd
qemu_raw_conf
storage_cephfs_fscache
network_load_balancer
vsock_api
instance_ready_state
network_bgp_holdtime
storage_volumes_all_projects
metrics_memory_oom_total
storage_buckets
storage_buckets_create_credentials
metrics_cpu_effective_total
projects_networks_restricted_access
storage_buckets_local
loki
acme
internal_metrics
cluster_join_token_expiry
remote_token_expiry
init_preseed
storage_volumes_created_at
cpu_hotplug
projects_networks_zones
network_txqueuelen
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
  tls
environment:
  addresses:
    ****:8443
  architectures:
    x86_64
    i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    -----END CERTIFICATE-----
  certificate_fingerprint: ****
  driver: qemu | lxc
  driver_version: 7.1.0 | 5.0.2
  firewall: nftables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    idmapped_mounts: "true"
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    shiftfs: "false"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 5.15.0-58-generic
  lxc_features:
    cgroup2: "true"
    core_scheduling: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Ubuntu
  os_version: "22.04"
  project: default
  server: lxd
  server_clustered: false
  server_event_mode: full-mesh
  server_name: backup
  server_pid: 2606
  server_version: "5.10"
  storage: btrfs
  storage_version: 5.4.1
  storage_supported_drivers:
name: btrfs version: 5.4.1 remote: false
name: ceph version: 15.2.17 remote: true
name: cephfs version: 15.2.17 remote: true
name: cephobject version: 15.2.17 remote: true
name: dir version: "1" remote: false
name: lvm version: 2.03.07(2) (2019-11-30) / 1.02.167 (2019-11-30) / 4.45.0 remote: false
name: zfs version: 2.1.4-0ubuntu0.1 remote: false
Storage backend in use: btrfs
Issue description
After upgrading most of my servers from lxd 4.0.9 to 5.10, lxc copy --refresh does not transmit only the changes since the last copy, but always transmits all data. Both servers are using btrfs for storage. Data on the target is overwritten completely every time, so any existing deduped reflinks are duplicated again. On lxd 4.0.9 the copy was done via rsync; now it's replaced with btrfs send & receive. For me, this renders lxc copy unusable, since I use it to back up and replicate containers via VPN, and that would need to transfer ~400GB/day and duplicate any (manually created) btrfs snapshots of my backup history. Transfers between an old lxd 4.0.9 and the new 5.10 still use rsync. Is there a way to get back to the old behavior of lxd 4.0.9?
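One way to observe the effect on the backup server is to compare shared versus exclusive usage of the target subvolume before and after a refresh (the pool path is illustrative):

# Shows total, exclusive and shared space for the container subvolume; after a
# full re-send the exclusive figure grows because reflinks to the older backup
# snapshots are broken.
btrfs filesystem du -s /var/lib/lxd/storage-pools/lxd/containers/container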
Steps to reproduce
LAN copy of ~8.5GB:

root@backup:~# time lxc copy remote:container container --refresh --instance-only -c boot.autostart=false -q -s lxd

real 5m6.173s
user 0m0.126s
sys 0m0.462s

Subsequent repetitions only differ slightly.