Closed gribchenko closed 2 years ago
What is the storage pool type? Can you please show the output of `lxc storage show <pool>`?
Is it an LVM thinpool?
```
$ lxc storage show default
config:
  rsync.compression: "false"
  source: /var/lib/lxd/storage-pools/default
description: Default DIR storage backend
name: default
driver: dir
used_by:
```
Update: the problem exists when a node has more than 256 containers. With fewer, everything is OK.
Are there any warnings/errors in the logs at /var/snap/lxd/common/lxd/logs/lxd.log?
https://github.com/lxc/lxd/blob/master/lxd/db/instances.go#L335
There aren't any errors in lxd.log. Output of `lxc query -X GET --wait /1.0/containers?recursion=2` with <256 containers:
```json
{
  "architecture": "x86_64",
  "backups": null,
  "config": {
    "boot.autostart": "true",
    "image.architecture": "x86_64",
    "image.description": "Debian 11 (Bullseye) eVPS container",
    "image.os": "debian",
    "image.release": "bullseye",
    "security.idmap.base": "64648500",
    "user.fqdn": "xxx",
    "volatile.base_image": "845ba18a6d01676d020e5c6c15d52c75ab9e0ef87fa84f4d025ad508be6ab32b",
    "volatile.cloud-init.instance-id": "4cb26631-264b-4151-b066-0dc5a03d8e29",
    "volatile.eth0.host_name": "lxd2fcd84e6",
    "volatile.eth0.last_state.created": "false",
    "volatile.idmap.base": "64648500",
    "volatile.idmap.current": "[{\"Isuid\":true,\"Isgid\":false,\"Hostid\":64648500,\"Nsid\":0,\"Maprange\":65536},{\"Isuid\":false,\"Isgid\":true,\"Hostid\":64648500,\"Nsid\":0,\"Maprange\":65536}]",
    "volatile.idmap.next": "[{\"Isuid\":true,\"Isgid\":false,\"Hostid\":64648500,\"Nsid\":0,\"Maprange\":65536},{\"Isuid\":false,\"Isgid\":true,\"Hostid\":64648500,\"Nsid\":0,\"Maprange\":65536}]",
    "volatile.last_state.idmap": "[{\"Isuid\":true,\"Isgid\":false,\"Hostid\":64648500,\"Nsid\":0,\"Maprange\":65536},{\"Isuid\":false,\"Isgid\":true,\"Hostid\":64648500,\"Nsid\":0,\"Maprange\":65536}]",
    "volatile.last_state.power": "RUNNING",
    "volatile.uuid": "e8cdea3e-4894-4d47-8209-9dead7316aa4"
  },
  "created_at": "2022-05-09T12:31:32.758011665Z",
  "description": "",
  "devices": {
    "eth0": {
      "ipv4.address": "xxx",
      "ipv4.host_table": "4209",
      "ipv6.host_table": "6209",
      "mtu": "1500",
      "name": "eth0",
      "nictype": "ipvlan",
      "parent": "vlan209",
      "type": "nic"
    }
  },
```
Output with 256+ containers:

```json
},
{
  "architecture": "x86_64",
  "backups": null,
  "config": null,
  "created_at": "2022-05-04T14:56:08.115232574Z",
  "description": "",
  "devices": {},
  "ephemeral": false,
  "last_used_at": "2022-07-28T07:31:22.079843942Z",
  "location": "none",
  "name": "ct9924",
  "profiles": [],
  "project": "default",
  "snapshots": null,
```
I checked the query optimization (the instance list was very slow in 5.3); the SQL queries use `IN (instanceList)`...
The output of `lxc query -X GET --wait /1.0/containers/ct9924?recursion=2` is correct.
Interesting. I looked at the sqlite default limits for query placeholders:
https://www.sqlite.org/limits.html
See point 9:
the maximum value of a host parameter number is SQLITE_MAX_VARIABLE_NUMBER, which defaults to 999 for SQLite versions prior to 3.32.0 (2020-05-22) or 32766 for SQLite versions after 3.32.0.
So we should be OK, unless this is a dqlite limitation cc @MathieuBordere
I would have expected to get a query error too if we were exceeding the limit somehow.
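For context, here is a minimal sketch (not LXD code; the table and column names are hypothetical) of why the instance count maps directly to the host-parameter count: a parameterized `IN` clause needs one `?` per value, so querying config for 256+ instances emits 256+ placeholders in a single statement. That is well under stock SQLite's limit, but would be the first thing to break if a lower limit applied somewhere in the stack:

```go
package main

import (
	"fmt"
	"strings"
)

// buildPlaceholderIN builds a parameterized IN clause with one "?" host
// parameter per instance ID. With N instances the statement carries N
// placeholders, so a cap on parameters breaks exactly when N exceeds it.
// ("instances_config" is an illustrative table name, not LXD's schema.)
func buildPlaceholderIN(ids []int) (string, []any) {
	marks := make([]string, len(ids))
	args := make([]any, len(ids))
	for i, id := range ids {
		marks[i] = "?"
		args[i] = id
	}
	query := fmt.Sprintf(
		"SELECT * FROM instances_config WHERE instance_id IN (%s)",
		strings.Join(marks, ", "))
	return query, args
}

func main() {
	ids := make([]int, 256)
	for i := range ids {
		ids[i] = i + 1
	}
	_, args := buildPlaceholderIN(ids)
	fmt.Println(len(args), "placeholders") // prints "256 placeholders"
}
```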
We just generated 255 empty containers and then added and removed one container to reproduce the bug.
Thanks, I'll try to reproduce here, but initially this feels like it could be a dqlite bug. Let's see.
As a side effect, on reboot with 256+ containers `"boot.autostart": "true"` is invisible and we have to start the containers manually.
Yeah, it's not good. I think a workaround is to build the IN statement manually without using query placeholders. Working on this now.
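The workaround idea can be sketched as follows (an illustration under assumptions, not the actual patch in lxd/db/instances.go; the table name is hypothetical): because the instance IDs are integers produced by the database itself, they can be formatted straight into the SQL text, so the statement carries zero host parameters no matter how many instances exist. Inlining is only safe here because the values are integers, never untrusted strings:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// buildLiteralIN inlines integer IDs into the IN clause instead of using
// "?" placeholders, sidestepping any parameter-count limit. Safe only
// because the IDs are integers (no SQL-escaping concerns).
func buildLiteralIN(ids []int) string {
	parts := make([]string, len(ids))
	for i, id := range ids {
		parts[i] = strconv.Itoa(id)
	}
	return fmt.Sprintf(
		"SELECT * FROM instances_config WHERE instance_id IN (%s)",
		strings.Join(parts, ", "))
}

func main() {
	// prints: SELECT * FROM instances_config WHERE instance_id IN (1, 2, 300)
	fmt.Println(buildLiteralIN([]int{1, 2, 300}))
}
```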
I'm going to test up to 512 instances.
Got a potential fix ^
@MaximMonin I've tested my fix up to 512 instances and it seems to work fine.
we are going to rebuild now our new debian package with this hotfix, and test it
Thanks!
@MaximMonin you only need https://github.com/lxc/lxd/pull/10706/commits/a0a49a91a6a8702cf645675daaac9f3175fb9c8b the others are additional cleanup/improvements I noticed.
We rebuilt the Debian package with 5.4.tar.gz as the source and patched lxd/db/instances.go. `lxc list` and `lxc query` seem to be working OK now.
Thanks!
Excellent thanks for testing.
Required information
The output of `lxc info`, or if that fails:

```
driver: lxc
driver_version: 5.0.0
kernel_version: 5.4.0-100-generic
server_version: "5.4"
storage: dir
storage_version: "1"
storage_supported_drivers:
```
Issue description
After upgrading to 5.4, some containers are missing USER FQDN and DISK USAGE in the output of `lxc list -c p,user.fqdn,nD`:

```
+-------+--------------+--------+------------+
|  PID  |  USER FQDN   |  NAME  | DISK USAGE |
+-------+--------------+--------+------------+
| 1733  | es206.*.net  | ct3029 | 40.29GiB   |
| 4248  | es360.*.net  | ct3093 | 23.83GiB   |
| 5682  | vs2141.*.net | ct3095 | 1.80GiB    |
| 7414  | vs2142.*.net | ct3105 | 13.31GiB   |
| 12161 | vs282.*.net  | ct3141 | 7.50GiB    |
| 13615 | vs2144.*.net | ct3160 | 5.45GiB    |
| 15423 | vs2145.*.net | ct3163 | 2.63GiB    |
| 18865 |              | ct3179 |            |
| 20697 |              | ct3271 |            |
| 22448 |              | ct3286 |            |
| 23574 |              | ct3310 |            |
| 24547 |              | ct3360 |            |
| 27561 |              | ct3377 |            |
| 28742 |              | ct3381 |            |
```
Steps to reproduce
Information to attach

- [ ] dmesg
- [ ] Container log (`lxc info NAME --show-log`)

```
$ lxc info ct3179 --show-log
Name: ct3179
Status: RUNNING
Type: container
Architecture: x86_64
PID: 18865
Created: 2021/01/24 18:25 EET
Last Used: 2022/02/24 03:43 EET

Resources:
  Processes: 106
  Disk usage:
    root: 8.29GiB
  CPU usage:
    CPU usage (in seconds): 317023
  Memory usage:
    Memory (current): 893.89MiB
    Memory (peak): 2.11GiB
  Network usage:
    eth0:
      Type: broadcast
      State: UP
      Host interface: vlan210
      MAC address: 76:d8:b6:9b:67:c9
      MTU: 1500
      Bytes received: 45.78GB
      Bytes sent: 6.11GB
      Packets received: 38740154
      Packets sent: 39084301
      IP addresses:
        inet: ..*.129/32 (global)
    lo:
      Type: loopback
      State: UP
      MTU: 65536
      Bytes received: 2.20GB
      Bytes sent: 2.20GB
      Packets received: 9749391
      Packets sent: 9749391
      IP addresses:
        inet: 127.0.0.1/8 (local)

Log:
```
- [ ] Container configuration (`lxc config show NAME --expanded`)

```
$ lxc config show ct3179 --expanded
architecture: x86_64
config:
  boot.autostart: "true"
  limits.cpu: "6"
  limits.hugepages.1GB: 4GiB
  limits.hugepages.2MB: 100MiB
  limits.memory: 16GiB
  limits.memory.enforce: hard
  limits.memory.swap: "false"
  limits.processes: "800"
  raw.lxc: lxc.cgroup.memory.oom_control=1
  security.idmap.base: "20822450"
  security.idmap.isolated: "true"
  user.fqdn: vs1870.XXXXX.net
  volatile.eth0.host_name: lxdba9f5fa7
  volatile.eth0.last_state.created: "false"
  volatile.idmap.base: "20822450"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":20822450,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":20822450,"Nsid":0,"Maprange":65536}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":20822450,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":20822450,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":20822450,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":20822450,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.power: RUNNING
  volatile.uuid: 96db4629-ad89-4ec4-89cc-25b8ffc6d41e
devices:
  eth0:
    ipv4.address: ...
    ipv4.host_table: "4210"
    mtu: "1500"
    name: eth0
    nictype: ipvlan
    parent: vlan210
    type: nic
  resolvconf:
    path: /etc/resolv.conf
    readonly: "true"
    source: /etc/resolv.conf
    type: disk
  root:
    path: /
    pool: default
    size: 256GiB
    type: disk
  shared:
    path: /shared
    readonly: "true"
    source: /shared
    type: disk
ephemeral: false
profiles:
```
- [ ] Main daemon log (at /var/log/lxd/lxd.log or /var/snap/lxd/common/lxd/logs/lxd.log)
- [ ] Output of the client with --debug
- [ ] Output of the daemon with --debug (alternatively output of `lxc monitor` while reproducing the issue)