mpontillo closed this issue 1 year ago.
Yes, this is the track I started to go down with https://github.com/lxc/lxd/pull/10216#issuecomment-1093320228
I like where you were going with this, @tomponline. A few thoughts on the commit you referenced:
The libvirt code I referenced never gives you up - which is arguably a bug, so hopefully it doesn't let you down. (Like libvirt, which starts counting from 3.)

Hello, I am writing to express my interest in working on the issue mentioned in the bug report #11508 for LXD. As a university student, I am eager to contribute to open source projects and gain valuable experience in software development.
I have experience in Linux systems and I am familiar with virtualization technologies. I believe that my skills and knowledge would be useful in resolving the issue mentioned in the bug report. I am willing to work with the LXD development team to find a solution to the problem and contribute to the project.
Thank you for considering my interest in this issue. I look forward to hearing back from you.
Thanks @Anis-cpu-13, assigned to you!
I think I have exactly the same issue: peer containers causing a vsock ID conflict when I try to launch a QEMU VM within each container.
What's the current status of the fix?
Is there a workaround/hack to address it for now?
@tomponline
@stgraber, this issue is one that stops our company from moving to LXD. Is there any workaround before the fix? Thought 5.14 would fix this one :/
@kochia7 what is the use case for running VMs inside containers? (It may be that there is a workaround in the short term until this is fixed.)
We have Android machines (Qemu/CrossVM) running one per container. Unfortunately we are not able to run VMs directly on the machine, so we are planning to use LXC containers as lightweight isolation for each Qemu VM.
Thanks. What is the reason for "we are not able to run VMs directly on the machine"?
And are you aware that by passing through non-namespaced devices like /dev/kvm, you are potentially exposing your host to attacks from the containers? I just want to check that you are aware that doing so reduces the isolation. I wonder what sort of isolation you are expecting from running VMs inside containers?
The VMM we are using creates artifacts that interfere with those of other machines; that is how the VMM is built, and it is unable to run several VMs in parallel. So LXC creates just enough isolation to make them work.
You can set volatile.vsock_id on the instance before starting it, so if you're able to set them to non-conflicting IDs, then you can work around the problem for now:

lxc config set <VM instance> volatile.vsock_id=n

I'm not sure how it will work with LXD's own vsock listener (as opposed to the lxd-agent's), though. It may be that vsock just won't work properly when being run inside containers.
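A rough illustration of that workaround (the container and VM names and the IDs 100/101 are arbitrary examples, not taken from the thread):

```
# Give each peer's VM a distinct guest CID before starting it.
# Guest CIDs must be >= 3 and must not collide across peers.
lxc config set bionic volatile.vsock_id=100   # inside container c1
lxc start bionic
lxc config set bionic volatile.vsock_id=101   # inside container c2
lxc start bionic
```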
I will give it a try, seems promising! Thanks!
I used lxc config set <VM instance> volatile.vsock_id=n for the LXC containers from the host, and guest-cid / vsock_guest_cid for Qemu within the containers for the nested VMs, and it worked!
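For reference, a minimal sketch of the Qemu side of that (the CID 42 and the machine/memory flags are arbitrary; guest-cid is the standard property of Qemu's vhost-vsock-pci device):

```
# Give this nested VM a guest CID (>= 3) that no peer is using.
qemu-system-x86_64 -enable-kvm -M q35 -m 1024 -nographic \
  -device vhost-vsock-pci,guest-cid=42
```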
Only the last one would have done anything, as setting volatile.vsock_id on a container doesn't do anything.
That volatile.vsock_id workaround didn't really work for LXD: https://github.com/lxc/lxd/issues/11739#issuecomment-1567389272
Required information

Distribution: Ubuntu
Distribution version: 22.04 "Jammy"
Kernel version: 5.19.0-35-generic
LXC version: 5.12
LXD version: 5.12
Storage backend in use: zfs
``` $ lxc info config: {} api_extensions: - storage_zfs_remove_snapshots - container_host_shutdown_timeout - container_stop_priority - container_syscall_filtering - auth_pki - container_last_used_at - etag - patch - usb_devices - https_allowed_credentials - image_compression_algorithm - directory_manipulation - container_cpu_time - storage_zfs_use_refquota - storage_lvm_mount_options - network - profile_usedby - container_push - container_exec_recording - certificate_update - container_exec_signal_handling - gpu_devices - container_image_properties - migration_progress - id_map - network_firewall_filtering - network_routes - storage - file_delete - file_append - network_dhcp_expiry - storage_lvm_vg_rename - storage_lvm_thinpool_rename - network_vlan - image_create_aliases - container_stateless_copy - container_only_migration - storage_zfs_clone_copy - unix_device_rename - storage_lvm_use_thinpool - storage_rsync_bwlimit - network_vxlan_interface - storage_btrfs_mount_options - entity_description - image_force_refresh - storage_lvm_lv_resizing - id_map_base - file_symlinks - container_push_target - network_vlan_physical - storage_images_delete - container_edit_metadata - container_snapshot_stateful_migration - storage_driver_ceph - storage_ceph_user_name - resource_limits - storage_volatile_initial_source - storage_ceph_force_osd_reuse - storage_block_filesystem_btrfs - resources - kernel_limits - storage_api_volume_rename - macaroon_authentication - network_sriov - console - restrict_devlxd - migration_pre_copy - infiniband - maas_network - devlxd_events - proxy - network_dhcp_gateway - file_get_symlink - network_leases - unix_device_hotplug - storage_api_local_volume_handling - operation_description - clustering - event_lifecycle - storage_api_remote_volume_handling - nvidia_runtime - container_mount_propagation - container_backup - devlxd_images - container_local_cross_pool_handling - proxy_unix - proxy_udp - clustering_join - proxy_tcp_udp_multi_port_handling - network_state - proxy_unix_dac_properties - container_protection_delete - unix_priv_drop - pprof_http - proxy_haproxy_protocol - network_hwaddr - proxy_nat - network_nat_order - container_full - candid_authentication - backup_compression - candid_config - nvidia_runtime_config - storage_api_volume_snapshots - storage_unmapped - projects - candid_config_key - network_vxlan_ttl - container_incremental_copy - usb_optional_vendorid - snapshot_scheduling - snapshot_schedule_aliases - container_copy_project - clustering_server_address - clustering_image_replication - container_protection_shift - snapshot_expiry - container_backup_override_pool - snapshot_expiry_creation - network_leases_location - resources_cpu_socket - resources_gpu - resources_numa - kernel_features - id_map_current - event_location - storage_api_remote_volume_snapshots - network_nat_address - container_nic_routes - rbac - cluster_internal_copy - seccomp_notify - lxc_features - container_nic_ipvlan - network_vlan_sriov - storage_cephfs - container_nic_ipfilter - resources_v2 - container_exec_user_group_cwd - container_syscall_intercept - container_disk_shift - storage_shifted - resources_infiniband - daemon_storage - instances - image_types - resources_disk_sata - clustering_roles - images_expiry - resources_network_firmware - backup_compression_algorithm - ceph_data_pool_name - container_syscall_intercept_mount - compression_squashfs - container_raw_mount - container_nic_routed - container_syscall_intercept_mount_fuse - container_disk_ceph - virtual-machines - 
image_profiles - clustering_architecture - resources_disk_id - storage_lvm_stripes - vm_boot_priority - unix_hotplug_devices - api_filtering - instance_nic_network - clustering_sizing - firewall_driver - projects_limits - container_syscall_intercept_hugetlbfs - limits_hugepages - container_nic_routed_gateway - projects_restrictions - custom_volume_snapshot_expiry - volume_snapshot_scheduling - trust_ca_certificates - snapshot_disk_usage - clustering_edit_roles - container_nic_routed_host_address - container_nic_ipvlan_gateway - resources_usb_pci - resources_cpu_threads_numa - resources_cpu_core_die - api_os - container_nic_routed_host_table - container_nic_ipvlan_host_table - container_nic_ipvlan_mode - resources_system - images_push_relay - network_dns_search - container_nic_routed_limits - instance_nic_bridged_vlan - network_state_bond_bridge - usedby_consistency - custom_block_volumes - clustering_failure_domains - resources_gpu_mdev - console_vga_type - projects_limits_disk - network_type_macvlan - network_type_sriov - container_syscall_intercept_bpf_devices - network_type_ovn - projects_networks - projects_networks_restricted_uplinks - custom_volume_backup - backup_override_name - storage_rsync_compression - network_type_physical - network_ovn_external_subnets - network_ovn_nat - network_ovn_external_routes_remove - tpm_device_type - storage_zfs_clone_copy_rebase - gpu_mdev - resources_pci_iommu - resources_network_usb - resources_disk_address - network_physical_ovn_ingress_mode - network_ovn_dhcp - network_physical_routes_anycast - projects_limits_instances - network_state_vlan - instance_nic_bridged_port_isolation - instance_bulk_state_change - network_gvrp - instance_pool_move - gpu_sriov - pci_device_type - storage_volume_state - network_acl - migration_stateful - disk_state_quota - storage_ceph_features - projects_compression - projects_images_remote_cache_expiry - certificate_project - network_ovn_acl - projects_images_auto_update - projects_restricted_cluster_target - images_default_architecture - network_ovn_acl_defaults - gpu_mig - project_usage - network_bridge_acl - warnings - projects_restricted_backups_and_snapshots - clustering_join_token - clustering_description - server_trusted_proxy - clustering_update_cert - storage_api_project - server_instance_driver_operational - server_supported_storage_drivers - event_lifecycle_requestor_address - resources_gpu_usb - clustering_evacuation - network_ovn_nat_address - network_bgp - network_forward - custom_volume_refresh - network_counters_errors_dropped - metrics - image_source_project - clustering_config - network_peer - linux_sysctl - network_dns - ovn_nic_acceleration - certificate_self_renewal - instance_project_move - storage_volume_project_move - cloud_init - network_dns_nat - database_leader - instance_all_projects - clustering_groups - ceph_rbd_du - instance_get_full - qemu_metrics - gpu_mig_uuid - event_project - clustering_evacuation_live - instance_allow_inconsistent_copy - network_state_ovn - storage_volume_api_filtering - image_restrictions - storage_zfs_export - network_dns_records - storage_zfs_reserve_space - network_acl_log - storage_zfs_blocksize - metrics_cpu_seconds - instance_snapshot_never - certificate_token - instance_nic_routed_neighbor_probe - event_hub - agent_nic_config - projects_restricted_intercept - metrics_authentication - images_target_project - cluster_migration_inconsistent_copy - cluster_ovn_chassis - container_syscall_intercept_sched_setscheduler - storage_lvm_thinpool_metadata_size - 
storage_volume_state_total - instance_file_head - instances_nic_host_name - image_copy_profile - container_syscall_intercept_sysinfo - clustering_evacuation_mode - resources_pci_vpd - qemu_raw_conf - storage_cephfs_fscache - network_load_balancer - vsock_api - instance_ready_state - network_bgp_holdtime - storage_volumes_all_projects - metrics_memory_oom_total - storage_buckets - storage_buckets_create_credentials - metrics_cpu_effective_total - projects_networks_restricted_access - storage_buckets_local - loki - acme - internal_metrics - cluster_join_token_expiry - remote_token_expiry - init_preseed - storage_volumes_created_at - cpu_hotplug - projects_networks_zones - network_txqueuelen - cluster_member_state - instances_placement_scriptlet - storage_pool_source_wipe - zfs_block_mode - instance_generation_id - disk_io_cache api_status: stable api_version: "1.0" auth: trusted public: false auth_methods: - tls environment: addresses: [] architectures: - x86_64 - i686 certificate: |
  -----BEGIN CERTIFICATE-----
```

Issue description
When attempting to use lxd to launch virtual machines in multiple peer containers, errors such as

vhost-vsock: unable to set guest cid: Address already in use

can be observed when launching VMs, causing the VMs to fail to start.

Steps to reproduce
Create a profile allowing nested virtualization, such as:
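A minimal sketch of such a profile, assuming passthrough of /dev/kvm, /dev/vhost-vsock and /dev/vhost-net plus security.nesting (the profile name and exact device list are assumptions, not the original report's profile):

```
lxc profile create nested-vm
lxc profile set nested-vm security.nesting true
lxc profile device add nested-vm kvm unix-char path=/dev/kvm
lxc profile device add nested-vm vhost-vsock unix-char path=/dev/vhost-vsock
lxc profile device add nested-vm vhost-net unix-char path=/dev/vhost-net
```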
Launch two or more containers with this profile, such as:
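A plausible sketch, assuming Ubuntu images, LXD installed inside each container, and the hypothetical nested-vm profile above (the container names are arbitrary; the VM name bionic is taken from the lxc info --show-log command below):

```
# On the host: start two peer containers with the nesting profile.
lxc launch ubuntu:22.04 c1 -p default -p nested-vm
lxc launch ubuntu:22.04 c2 -p default -p nested-vm

# Inside each container (lxc shell c1, then lxc shell c2):
lxd init --auto
lxc launch ubuntu:18.04 bionic --vm
```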
Use lxc shell to enter both containers and run the VM launch command, as in the sketch above.

Expected results
Both virtual machines should be created and started.
Actual results
The second VM to be created will fail to start.
lxc info --show-log local:bionic will display the vhost-vsock "unable to set guest cid" error shown above.

Additional Information
There was an attempt to address this in PR #10216. However, that fix seems to assume that vsock IDs are only shared with the parent, not with peer containers. libvirt seems to avoid this problem by iterating over usable IDs until a free one is found.
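A rough CLI-level sketch of that iterate-until-free idea (the instance name and the 3-63 range are arbitrary, and libvirt itself probes CIDs against /dev/vhost-vsock rather than retrying starts):

```
#!/bin/sh
# Try successive guest CIDs until the VM starts. Guest CIDs begin
# at 3 because 0-2 are reserved (hypervisor, loopback, host).
for id in $(seq 3 63); do
    lxc config set bionic volatile.vsock_id="$id"
    if lxc start bionic; then
        echo "started with vsock_id=$id"
        break
    fi
done
```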