Please also post error report 5D962D62-E3A0D-000000, which contains details about when and where in the code the problem occurred.
Hi! I didn't find a way to look into this report. Is it stored somewhere on the host machine?
@raltnoeder
ERROR REPORT 5D978DC5-81996-000000
============================================================
Application: LINBIT® LINSTOR
Module: Satellite
Version: 1.1.1
Build ID: fec52903884eb73c4e9e96260ab315989d27f207
Build time: 2019-09-10T13:10:42+00:00
Error time: 2019-10-04 18:22:05
Node: dockerize-eu2
============================================================
Reported error:
===============
Description:
Expected 3 columns, but got 2
Cause:
Failed to parse line: hdd-pool;10000667770880;
Additional information:
External command: vgs -o lv_name,lv_size,data_percent --units b --separator ; --noheadings --nosuffix vg-hdd/hdd-pool
Category: LinStorException
Class name: StorageException
Class canonical name: com.linbit.linstor.storage.StorageException
Generated at: Method 'getThinFreeSize', Source file 'LvmUtils.java', Line #250
Error message: Unable to parse free thin sizes
Call backtrace:
Method Native Class:Line number
getThinFreeSize N com.linbit.linstor.storage.utils.LvmUtils:250
getPoolFreeSpace N com.linbit.linstor.storage.layer.provider.lvm.LvmThinProvider:286
getFreeSpace N com.linbit.linstor.storage.layer.provider.StorageLayer:256
getStoragePoolSpaceInfo N com.linbit.linstor.core.apicallhandler.satellite.StltApiCallHandlerUtils:269
applyChanges N com.linbit.linstor.core.apicallhandler.satellite.StltStorPoolApiCallHandler:222
applyFullSync N com.linbit.linstor.core.apicallhandler.satellite.StltApiCallHandler:317
execute N com.linbit.linstor.api.protobuf.satellite.FullSync:95
executeNonReactive N com.linbit.linstor.proto.CommonMessageProcessor:522
lambda$execute$13 N com.linbit.linstor.proto.CommonMessageProcessor:498
doInScope N com.linbit.linstor.core.apicallhandler.ScopeRunner:146
lambda$fluxInScope$0 N com.linbit.linstor.core.apicallhandler.ScopeRunner:75
call N reactor.core.publisher.MonoCallable:91
trySubscribeScalarMap N reactor.core.publisher.FluxFlatMap:126
subscribe N reactor.core.publisher.MonoFlatMapMany:46
subscribe N reactor.core.publisher.Flux:7799
onNext N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:184
request N reactor.core.publisher.Operators$ScalarSubscription:2066
onSubscribe N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:131
subscribe N reactor.core.publisher.MonoCurrentContext:35
subscribe N reactor.core.publisher.MonoFlatMapMany:49
subscribe N reactor.core.publisher.FluxOnAssembly:122
subscribe N reactor.core.publisher.FluxOnAssembly:122
subscribe N reactor.core.publisher.FluxOnErrorResume:47
subscribe N reactor.core.publisher.FluxOnErrorResume:47
subscribe N reactor.core.publisher.FluxOnErrorResume:47
subscribe N reactor.core.publisher.FluxOnErrorResume:47
subscribe N reactor.core.publisher.FluxContextStart:49
subscribe N reactor.core.publisher.Flux:7799
onNext N reactor.core.publisher.FluxFlatMap$FlatMapMain:389
slowPath N reactor.core.publisher.FluxArray$ArraySubscription:126
request N reactor.core.publisher.FluxArray$ArraySubscription:99
onSubscribe N reactor.core.publisher.FluxFlatMap$FlatMapMain:335
subscribe N reactor.core.publisher.FluxMerge:69
subscribe N reactor.core.publisher.FluxOnErrorResume:47
subscribe N reactor.core.publisher.Flux:7799
onComplete N reactor.core.publisher.FluxConcatArray$ConcatArraySubscriber:207
subscribe N reactor.core.publisher.FluxConcatArray:80
subscribe N reactor.core.publisher.FluxPeek:83
subscribe N reactor.core.publisher.FluxPeek:83
subscribe N reactor.core.publisher.FluxPeek:83
subscribe N reactor.core.publisher.FluxOnErrorResume:47
subscribe N reactor.core.publisher.FluxDefer:54
subscribe N reactor.core.publisher.Flux:7799
onNext N reactor.core.publisher.FluxFlatMap$FlatMapMain:389
drainAsync N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:395
drain N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:638
onNext N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:242
drainFused N reactor.core.publisher.UnicastProcessor:277
drain N reactor.core.publisher.UnicastProcessor:310
onNext N reactor.core.publisher.UnicastProcessor:386
next N reactor.core.publisher.FluxCreate$IgnoreSink:618
next N reactor.core.publisher.FluxCreate$SerializedSink:153
processInOrder N com.linbit.linstor.netcom.TcpConnectorPeer:373
doProcessMessage N com.linbit.linstor.proto.CommonMessageProcessor:216
lambda$processMessage$2 N com.linbit.linstor.proto.CommonMessageProcessor:162
onNext N reactor.core.publisher.FluxPeek$PeekSubscriber:177
runAsync N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:398
run N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:484
call N reactor.core.scheduler.WorkerTask:84
call N reactor.core.scheduler.WorkerTask:37
run N java.util.concurrent.FutureTask:264
run N java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
runWorker N java.util.concurrent.ThreadPoolExecutor:1128
run N java.util.concurrent.ThreadPoolExecutor$Worker:628
run N java.lang.Thread:834
END OF ERROR REPORT.
It happened right after a node reboot. P.S. Confirmed: the error occurs only after a reboot.
OK, so LVM 2.02.176 on Ubuntu shows an empty data_percent field. A temporary solution is to create a thin volume in every pool. A more permanent one would be to change the getThinFreeSize method so that it handles hdd-pool;10000667770880; as if it were hdd-pool;10000667770880;0.00 (see the sketch after the transcript below).
root@xxx-eu2 ~ # vgs -o lv_name,lv_size,data_percent --units b --separator ";" --noheadings --nosuffix vg-hdd/hdd-pool
hdd-pool;10000667770880;
root@xxx-eu2 ~ # lvcreate -V1G -T vg-hdd/hdd-pool -n testvol
Using default stripesize 64.00 KiB.
Logical volume "testvol" created.
root@xxx-eu2 ~ # vgs -o lv_name,lv_size,data_percent --units b --separator ";" --noheadings --nosuffix vg-hdd/hdd-pool
hdd-pool;10000667770880;0.00
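For illustration, a minimal sketch of that kind of defaulting in the parser (a hypothetical helper class, not the actual LvmUtils code; it only assumes the semicolon-separated vgs output shown above):

final class VgsLineParser
{
    static final class ThinPoolInfo
    {
        final String name;
        final long sizeBytes;
        final double dataPercent;

        ThinPoolInfo(String nameRef, long sizeRef, double percentRef)
        {
            name = nameRef;
            sizeBytes = sizeRef;
            dataPercent = percentRef;
        }
    }

    // Parses one line of "vgs -o lv_name,lv_size,data_percent --separator ;" output.
    // An empty or missing data_percent column is treated as 0.00 instead of failing.
    static ThinPoolInfo parse(String line)
    {
        // limit -1 keeps trailing empty strings, so "hdd-pool;10000667770880;"
        // still yields a third (empty) column instead of only two
        String[] cols = line.trim().split(";", -1);
        if (cols.length < 2)
        {
            throw new IllegalArgumentException("Expected at least 2 columns, got: " + line);
        }
        String percentStr = cols.length >= 3 ? cols[2].trim() : "";
        double percent = percentStr.isEmpty() ? 0.0 : Double.parseDouble(percentStr);
        return new ThinPoolInfo(cols[0].trim(), Long.parseLong(cols[1].trim()), percent);
    }
}

With that, both hdd-pool;10000667770880; and hdd-pool;10000667770880;0.00 parse to a data percentage of 0.00.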
Hello
Thanks for the report, as well as for researching the actual cause. Unfortunately, I am still unable to reproduce this on my test system, even after rebooting the machine.
root@bravo:~/linstor# uname -a
Linux bravo 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
root@bravo:~/linstor# lvm version
LVM version: 2.02.176(2) (2017-11-03)
Library version: 1.02.145 (2017-11-03)
Driver version: 4.37.0
Configuration: ./configure --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-silent-rules --libdir=${prefix}/lib/x86_64-linux-gnu --libexecdir=${prefix}/lib/x86_64-linux-gnu --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking --exec-prefix= --bindir=/bin --libdir=/lib/x86_64-linux-gnu --sbindir=/sbin --with-usrlibdir=/usr/lib/x86_64-linux-gnu --with-optimisation=-O2 --with-cache=internal --with-clvmd=corosync --with-cluster=internal --with-device-uid=0 --with-device-gid=6 --with-device-mode=0660 --with-default-pid-dir=/run --with-default-run-dir=/run/lvm --with-default-locking-dir=/run/lock/lvm --with-thin=internal --with-thin-check=/usr/sbin/thin_check --with-thin-dump=/usr/sbin/thin_dump --with-thin-repair=/usr/sbin/thin_repair --enable-applib --enable-blkid_wiping --enable-cmdlib --enable-cmirrord --enable-dmeventd --enable-dbus-service --enable-lvmetad --enable-lvmlockd-dlm --enable-lvmlockd-sanlock --enable-lvmpolld --enable-notify-dbus --enable-pkgconfig --enable-readline --enable-udev_rules --enable-udev_sync
root@bravo:~/linstor# lsb_release -d
Description: Ubuntu 18.04.3 LTS
Regardless of whether my thin pool has volumes in it or not, the data_percent column is never missing (even after rebooting).
root@bravo:~/linstor# vgs -o lv_name,lv_size,data_percent --units b --separator ";" --noheadings --nosuffix
thin;104857600;0.00
root@bravo:~/linstor# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
thin scratch twi-a-tz-- 100.00m 0.00 10.84
root@bravo:~/linstor# lvcreate -V52M -T scratch/thin -n testvlm
Using default stripesize 64.00 KiB.
Logical volume "testvlm" created.
root@bravo:~/linstor# vgs -o lv_name,lv_size,data_percent --units b --separator ";" --noheadings --nosuffix
thin;104857600;0.00
testvlm;54525952;0.00
root@bravo:~/linstor# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
testvlm scratch Vwi-a-tz-- 52.00m thin 0.00
thin scratch twi-aotz-- 100.00m 0.00 10.94
I am sure we could implement your suggested permanent fix, although I would add an additional safety check, something like "if data_percent is missing AND lvs does not show any volumes in that pool" (sketched below). However, I still would like to know what I am missing that prevents me from reproducing this issue. What does lvdisplay say when your vgs is missing the data_percent? Especially interesting is the Allocated pool data line (or its absence).
Another thought: maybe my 0.00 is actually something like 0.000001, but I only test with very small thin pools. Your thin pool is about 10 TB large, which due to rounding might actually end up as 0.0. That could be the mysterious difference if you imagine a check like if (data_percent){ ... } inside the vgs / lvs code.
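A rough sketch of that combined check, assuming the caller already has the list of thin volumes that lvs reports for the pool (hypothetical helper, not LINSTOR's actual API):

import java.util.List;

final class DataPercentFallback
{
    // Accept a missing data_percent as 0.00 only when lvs confirms the pool is empty;
    // otherwise keep failing, since silently guessing could hide a real problem.
    static double dataPercentOrDefault(String percentColumn, List<String> thinVolumesInPool)
    {
        String trimmed = percentColumn == null ? "" : percentColumn.trim();
        if (!trimmed.isEmpty())
        {
            return Double.parseDouble(trimmed);
        }
        if (thinVolumesInPool.isEmpty())
        {
            return 0.0;
        }
        throw new IllegalStateException(
            "Unable to parse free thin sizes: data_percent missing although the pool has volumes");
    }
}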
Hi,
I faced this same issue. It seems there was a problem related to a missing package dependency (thin-provisioning-tools) on the system, which prevented the thin pool device from being activated on boot. After installing the missing package I was able to activate the device, and it is still there after a reboot.
root@org1-1:/vagrant/linstor# lvm version
LVM version: 2.02.176(2) (2017-11-03)
Library version: 1.02.145 (2017-11-03)
Driver version: 4.37.0
Configuration: ./configure --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-silent-rules --libdir=${prefix}/lib/x86_64-linux-gnu --libexecdir=${prefix}/lib/x86_64-linux-gnu --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking --exec-prefix= --bindir=/bin --libdir=/lib/x86_64-linux-gnu --sbindir=/sbin --with-usrlibdir=/usr/lib/x86_64-linux-gnu --with-optimisation=-O2 --with-cache=internal --with-clvmd=corosync --with-cluster=internal --with-device-uid=0 --with-device-gid=6 --with-device-mode=0660 --with-default-pid-dir=/run --with-default-run-dir=/run/lvm --with-default-locking-dir=/run/lock/lvm --with-thin=internal --with-thin-check=/usr/sbin/thin_check --with-thin-dump=/usr/sbin/thin_dump --with-thin-repair=/usr/sbin/thin_repair --enable-applib --enable-blkid_wiping --enable-cmdlib --enable-cmirrord --enable-dmeventd --enable-dbus-service --enable-lvmetad --enable-lvmlockd-dlm --enable-lvmlockd-sanlock --enable-lvmpolld --enable-notify-dbus --enable-pkgconfig --enable-readline --enable-udev_rules --enable-udev_sync
root@org1-1:/vagrant/linstor# lsb_release -d
Description: Ubuntu 18.04.3 LTS
root@org1-1:/vagrant/linstor# lvdisplay
--- Logical volume ---
LV Name lvmthinpool
VG Name vg_storage
LV UUID 6JpP6A-fluG-F6pz-UgKL-X3uQ-vywY-HLrMeX
LV Write Access read/write
LV Creation host, time org1-1, 2019-12-04 20:27:14 +0000
LV Pool metadata lvmthinpool_tmeta
LV Pool data lvmthinpool_tdata
LV Status NOT available
LV Size 9.97 GiB
Current LE 2553
Segments 1
Allocation inherit
Read ahead sectors auto
root@org1-1:/vagrant/linstor# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
lvmthinpool vg_storage twi---tz-- 9.97g
root@org1-1:/vagrant/linstor# vgchange -a y
/usr/sbin/thin_check: execvp failed: No such file or directory
Check of pool vg_storage/lvmthinpool failed (status:2). Manual repair required!
0 logical volume(s) in volume group "vg_storage" now active
root@org1-1:/vagrant/linstor# apt-get -y install thin-provisioning-tools
root@org1-1:/vagrant/linstor# vgchange -a y
1 logical volume(s) in volume group "vg_storage" now active
root@org1-1:/vagrant/linstor# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
lvmthinpool vg_storage twi-a-tz-- 9.97g 0.00 10.61
root@org1-1:/vagrant/linstor# lvdisplay
--- Logical volume ---
LV Name lvmthinpool
VG Name vg_storage
LV UUID 6JpP6A-fluG-F6pz-UgKL-X3uQ-vywY-HLrMeX
LV Write Access read/write
LV Creation host, time org1-1, 2019-12-04 20:27:14 +0000
LV Pool metadata lvmthinpool_tmeta
LV Pool data lvmthinpool_tdata
LV Status available
# open 0
LV Size 9.97 GiB
Allocated pool data 0.00%
Allocated metadata 10.61%
Current LE 2553
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:4
Thanks for the info!
As this sounds reasonable enough to me, I will close this issue for now. Feel free to reopen if there are still open questions.
Hi,
I got the same error today:
ERROR REPORT 5F4D45C7-B01E7-000005
============================================================
Application: LINBIT® LINSTOR
Module: Satellite
Version: 1.8.0
Build ID: e56b6c2a80b6d000921a998e3ba4cd1102fbdd39
Build time: 2020-08-17T13:02:52+00:00
Error time: 2020-09-03 10:19:28
Node: xcp-ng-lab-1
============================================================
Reported error:
===============
Description:
Expected 3 columns, but got 2
Cause:
Failed to parse line: thin_device;246960619520;
Additional information:
External command: vgs --config devices { filter=['a|/dev/nvme0n1|','r|.*|'] } -o lv_name,lv_size,data_percent --units b --separator ; --noheadings --nosuffix linstor_group/thin_device
Category: LinStorException
Class name: StorageException
Class canonical name: com.linbit.linstor.storage.StorageException
Generated at: Method 'getThinFreeSize', Source file 'LvmUtils.java', Line #368
Error message: Unable to parse free thin sizes
Call backtrace:
Method Native Class:Line number
getThinFreeSize N com.linbit.linstor.layer.storage.lvm.utils.LvmUtils:368
getSpaceInfo N com.linbit.linstor.layer.storage.lvm.LvmThinProvider:315
getStoragePoolSpaceInfo N com.linbit.linstor.layer.storage.StorageLayer:386
getSpaceInfo N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:888
getSpaceInfo N com.linbit.linstor.core.devmgr.DeviceManagerImpl:1196
getStoragePoolSpaceInfo N com.linbit.linstor.core.apicallhandler.StltApiCallHandlerUtils:279
applyChanges N com.linbit.linstor.core.apicallhandler.StltStorPoolApiCallHandler:235
applyChange N com.linbit.linstor.core.apicallhandler.StltApiCallHandler$ApplyStorPool:1338
applyChangedData N com.linbit.linstor.core.apicallhandler.StltApiCallHandler:756
applyStorPoolChanges N com.linbit.linstor.core.apicallhandler.StltApiCallHandler:690
execute N com.linbit.linstor.api.protobuf.ApplyStorPool:54
executeNonReactive N com.linbit.linstor.proto.CommonMessageProcessor:525
lambda$execute$13 N com.linbit.linstor.proto.CommonMessageProcessor:500
doInScope N com.linbit.linstor.core.apicallhandler.ScopeRunner:147
lambda$null$0 N com.linbit.linstor.core.apicallhandler.ScopeRunner:75
call N reactor.core.publisher.MonoCallable:91
trySubscribeScalarMap N reactor.core.publisher.FluxFlatMap:126
subscribeOrReturn N reactor.core.publisher.MonoFlatMapMany:49
subscribe N reactor.core.publisher.Flux:8311
onNext N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:188
request N reactor.core.publisher.Operators$ScalarSubscription:2317
onSubscribe N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:134
subscribe N reactor.core.publisher.MonoCurrentContext:35
subscribe N reactor.core.publisher.InternalFluxOperator:62
subscribe N reactor.core.publisher.FluxDefer:54
subscribe N reactor.core.publisher.Flux:8325
onNext N reactor.core.publisher.FluxFlatMap$FlatMapMain:418
drainAsync N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:414
drain N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:679
onNext N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:243
drainFused N reactor.core.publisher.UnicastProcessor:286
drain N reactor.core.publisher.UnicastProcessor:322
onNext N reactor.core.publisher.UnicastProcessor:401
next N reactor.core.publisher.FluxCreate$IgnoreSink:618
next N reactor.core.publisher.FluxCreate$SerializedSink:153
processInOrder N com.linbit.linstor.netcom.TcpConnectorPeer:373
doProcessMessage N com.linbit.linstor.proto.CommonMessageProcessor:218
lambda$processMessage$2 N com.linbit.linstor.proto.CommonMessageProcessor:164
onNext N reactor.core.publisher.FluxPeek$PeekSubscriber:177
runAsync N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:439
run N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:526
call N reactor.core.scheduler.WorkerTask:84
call N reactor.core.scheduler.WorkerTask:37
run N java.util.concurrent.FutureTask:266
access$201 N java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:180
run N java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:293
runWorker N java.util.concurrent.ThreadPoolExecutor:1149
run N java.util.concurrent.ThreadPoolExecutor$Worker:624
run N java.lang.Thread:748
END OF ERROR REPORT.
I use XCP-ng (CentOS) here; the equivalent of the thin-provisioning-tools package, the device-mapper-persistent-data RPM, is already installed.
After deleting the devices, the problem disappeared, which is quite strange. It's the first time I have seen this problem. :thinking:
All I can see from the ErrorReport is that Linstor got this line:
thin_device;246960619520;
However, Linstor actually expects a third column indicating what percentage of the device is already in use. This information is required by Linstor in order to properly report those numbers back to the controller, and thus to the user and plugins.
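For a sense of the arithmetic: once the missing column is read as 0.00, the pool size and the percentage are all that is needed to derive a free-space figure. A back-of-the-envelope sketch in Java (assumed formula; the actual LINSTOR calculation may differ):

public class ThinFreeSpace
{
    public static void main(String[] args)
    {
        long poolSizeBytes = 246_960_619_520L; // lv_size from "thin_device;246960619520;"
        double dataPercent = 0.00;             // the missing third column, assumed to mean 0.00

        long usedBytes = Math.round(poolSizeBytes * dataPercent / 100.0);
        long freeBytes = poolSizeBytes - usedBytes;

        // prints: used=0 B, free=246960619520 B (230.0 GiB)
        System.out.printf("used=%d B, free=%d B (%.1f GiB)%n",
            usedBytes, freeBytes, freeBytes / (1024.0 * 1024.0 * 1024.0));
    }
}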
Hello,
I'm encountering the same issue on a fresh Debian 12 installation. The "percent" column is missing for all newly created thin pools that don't yet have any logical volumes in them. Rebooting doesn't resolve the problem. In fact, whenever I create a new thin pool without any logical volumes, the lvm2 utilities don't display the "percent" column at all.
This issue can be addressed using the --reportformat json flag. In this case, the column is populated with zeros.
Installing the thin-provisioning-tools package doesn't seem to help in this specific scenario.
Hi guys! I am using Linstor with Piraeus. I am getting the error Unable to parse free thin sizes at /v1/nodes/xxx-eu1/storage-pools. Here is the log from the node:
I am using the image quay.io/piraeusdatastore/piraeus-server:v1.1.2. Some info from the node startup:
So I found this error mentioned in satellite/src/main/java/com/linbit/linstor/storage/utils/LvmUtils.java:234, and I found the command here: satellite/src/main/java/com/linbit/linstor/storage/utils/LvmCommands.java:324.
On the compute node I executed:
Looks like data_percent is empty for some reason. What can I do about it? How can I fix it?
The compute node is a bare-metal Ubuntu 18.04.3 LTS server with physical NVMe and HDD disks.