canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

I didn't assign any GPUs to the container, but all GPUs are visible inside it #8277

Closed: ma3252788 closed this issue 3 years ago

ma3252788 commented 3 years ago

Required information

sipl@sipl-4Xp:~$ lxc info
config:
  images.auto_update_interval: "0"
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- candid_authentication
- candid_config
- candid_config_key
- usb_optional_vendorid
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses: []
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    MIIFajCCA1KgAwIBAgIQOq+mN5s5ojifgPe/CpFALDANBgkqhkiG9w0BAQsFADA2
    MRwwGgYDVQQKExNsaW51eGNvbnRhaW5lcnMub3JnMRYwFAYDVQQDDA1yb290QHNp
    cGwtNFhwMB4XDTE4MDQyNDE1MTQ1OFoXDTI4MDQyMTE1MTQ1OFowNjEcMBoGA1UE
    ChMTbGludXhjb250YWluZXJzLm9yZzEWMBQGA1UEAwwNcm9vdEBzaXBsLTRYcDCC
    AiIwDQYJKoZIhvcNAQEBBQADggIPADCCAgoCggIBAODIy12JEIgSwi6XvttM4AXT
    A+ZK9x7fgI6zC57Yj5TEaCIpSl8n+ZwvL6m4m2Mk82YLHWgQ0tJuYsythN5bXj5c
    MG/Clp9Cj0+Hfo2zRvYu0EON1YUiBSj+4lEpH5oh9nQA2z/b54eca+dQb10xbVV1
    7Igp+4vfdAyPsEkyJtX5Btdy22CpWCQbF5m3t3uMHT2eKXxiu94UjOxb6SBPSELm
    n1H0ozT6Qr1OMN1LpjMBhCpYKSklKVfb2+sV5S3YBQ6WJCuATJMwwtdSYTzaPmSa
    eKECmjIA0SDzMEXo2LYHmqX9n/xiBLWKljub1upBf8CuKSC/NRJJ47U1B4BsYC26
    3RGcLDoJoKH8BkBx/Vo72UlEM8IyfRs/uo6Q2Hrrkmg1x4wvBT26Nz3KsbTI4aA9
    Yf6fsb6wkJ76UPnpeX/TxhNCaXaeoNWuU2333uiPmyUIhpPT89kfhblNiq/+zO9u
    x+KxlY9NEVAi4gzTzvRz2Rct5dUio8wFr3KTKtyEdQNP1hzG1cpU3bY0p3bL0NtH
    Wz5NUz8+J7vfneSMvC0yRtAC1GsiLK7d7JtFOiOEbc/pdoQEry6OJ9tAkCMHpSZV
    hzNVWUiSsxLGSS+Q0oFhoOWr55y/RWs+onr1mZwN1xBTr6cpcbrEDGx+2+NwVCN0
    kutZKH8kTz27LWhbPUnBAgMBAAGjdDByMA4GA1UdDwEB/wQEAwIFoDATBgNVHSUE
    DDAKBggrBgEFBQcDATAMBgNVHRMBAf8EAjAAMD0GA1UdEQQ2MDSCCHNpcGwtNFhw
    hwSsGRGqhxAgAQ2oAhYlEcgOmG3bMMM4hxAgAQ2oAhYlEZJmtWN3DyLCMA0GCSqG
    SIb3DQEBCwUAA4ICAQDRMn1cpIWn8arDjsSiFp16+zaBunmJsnwYlhqcS3JUn+Lp
    wZbok4p9YgqKasUW34DSkEm9cHg6KBNWom0/bclTWJUbvejD8qTHWbti6IsCxfZO
    ziCEvIcz+KuyeRmsm57IINvUJXNmcmft75AEPOtiXjiReHoJUVIPQ5fZ7ml4q1mX
    jJZBxAfl9jfhRvFYetaoSKOjL6pZ2+u7p1pOW0TCO/+tohExgM+PwUBek9iYqtuE
    Xq2TmudFXbFUNZmOJN6xsw6yDNFwYA++QwwtGtiCP3Ml1saHkqMf8VNPGMgAGkvS
    JQqMHDkg8VoY3kopdshbeI7dv/H9FpU5xNmLIT7eun6urBK9BpiIFPdaioGRegjH
    WfI4eU1AUzSpTTGc9FYW6TFlpgQFv5/0aQNiJbrC0zvYKD9jnqDfVlF8NG8Qnyl0
    zA9cC1QkYrfaRJmTRDWpayIWwpGPtgS0pz9RA3KR0+mVnEZGgQhixToKLPMfO29m
    /9SRzica4ARELOUuFpocVSBtQZunej8ZmQBOoe+ZwWLoGacXfQVyBMAUbeCT+cxW
    Zxd9qXts/QZiGiVJtCBFzielMjQe8Yuhj1DMn4T/GhV4vxk4+BnzyE2enFpoqhvI
    AaDEDFf+NhnfVMXOMz2ab73i0wnyDcfTKMBJDjEh9GitWKNCwIpBcMJvzA1C7A==
    -----END CERTIFICATE-----
  certificate_fingerprint: 20fe4f60860b908ac17b2fc3560c13ef61199af8c3bb0b7cbb2db13b02a4db46
  driver: lxc
  driver_version: 3.0.3
  kernel: Linux
  kernel_architecture: x86_64
  kernel_version: 4.4.248-0404248-generic
  server: lxd
  server_pid: 4115
  server_version: 3.0.3
  storage: zfs
  storage_version: 0.6.5.6-0ubuntu28
  server_clustered: false
  server_name: sipl-4Xp
  project: ""

Issue description

I didn't assign any GPUs to the container, but all of the host's GPUs are visible inside it.

Steps to reproduce

  1. Step one: lxc exec xxx bash
  2. Step two: run nvidia-smi inside the container
  3. Step three: all of the host's GPUs are listed (see the command sketch after this list)
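
A minimal sketch of the reproduction, assuming the container name tianjimiao taken from the log and configuration below (substitute your own container name):

  # On the host: list the devices actually attached to the container
  lxc config device show tianjimiao

  # Inside the container: list the GPUs the NVIDIA driver can see
  lxc exec tianjimiao -- nvidia-smi -L

Even though no gpu-type device is attached, nvidia-smi reports every GPU on the host.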

Information to attach

Log:

lxc tianjimiao 20201221090504.400 WARN conf - conf.c:lxc_setup_devpts:1616 - Invalid argument - Failed to unmount old devpts instance
lxc tianjimiao 20201221090504.400 WARN apparmor - lsm/apparmor.c:apparmor_process_label_set:221 - Incomplete AppArmor support in your kernel


 - [ ] Container configuration (`lxc config show NAME --expanded`)

sipl@sipl-4Xp:~$ lxc config show tianjimiao --expanded
architecture: x86_64
config:
  image.architecture: x86_64
  image.description: Ubuntu 16.04 LTS server (20180424)
  image.os: ubuntu
  image.release: xenial
  raw.lxc: lxc.apparmor.allow_incomplete=1
  security.privileged: "true"
  volatile.base_image: 55c06c2c9b9e47fbb89537134395c12d221e1536d26788c07ee042d07b34dd07
  volatile.eth0.hwaddr: 00:16:3e:de:3b:86
  volatile.eth0.name: eth0
  volatile.idmap.base: "0"
  volatile.idmap.next: '[]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: RUNNING
devices:
  A-Pool:
    path: /A-pool
    source: /A-pool
    type: disk
  eth0:
    nictype: bridged
    parent: br0
    type: nic
  nvidia-uvm:
    path: /dev/nvidia-uvm
    type: unix-char
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
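
If the goal is to expose only a specific GPU, the usual route is LXD's gpu device type rather than a raw unix-char node for /dev/nvidia-uvm. A sketch, assuming the container name from the output above and GPU id 0 (adjust as needed):

  # Detach the raw /dev/nvidia-uvm node
  lxc config device remove tianjimiao nvidia-uvm

  # Attach only GPU 0 via LXD's gpu device type
  lxc config device add tianjimiao gpu0 gpu id=0

The container is also running with security.privileged: "true", which may be relevant to why extra device nodes show up.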

stgraber commented 3 years ago

Your config shows you're passing /dev/nvidia-uvm to the container; what happens if you remove that device?

Also, please show "find /dev" inside the container.
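
For reference, a minimal sketch of the commands being requested, assuming the container name tianjimiao from the configuration above:

  # Drop the raw unix-char device
  lxc config device remove tianjimiao nvidia-uvm

  # Optionally restart to start from a clean /dev
  lxc restart tianjimiao

  # Show the device nodes visible inside the container
  lxc exec tianjimiao -- find /dev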