Closed msheldyakov closed 2 years ago
From the stacktrace, the error is about a missing entry in the database. Linstor tries to load the resources with their volumes but does not find the expected DB entries in the corresponding DRBD CRD.
If you send me a database dump (either manually export all Linstor CRDs or use linstor sos-report download and send me the file), I might be able to tell more.
Otherwise it would certainly help to know what happened before the reboot. Any issues, any strange behavior during resource creation / deletion?
Here is a backup of the CRD resources https://drive.google.com/file/d/1x-vQjtgbJfvdCVrxm7zs_L_-zUfTl8zE/view?usp=sharing
> Any issues, any strange behavior during resource creation / deletion?
I can't single out one problem; this is a test bench where we were deliberately testing failures. Next time I will write down a detailed log of actions.
My bad, shared a file. https://drive.google.com/file/d/1x-vQjtgbJfvdCVrxm7zs_L_-zUfTl8zE/view?usp=sharing
I had no time to experiment, but is it possible that you once forgot to create volume-definitions for a resource (namely PVC-3DCE9E04-5B7F-4CC1-AFCB-B466D03524DC)?
If so, then I might understand the issue without having dug deeper into it. If not, all I can say right now is this: the database contains entries showing that the resource exists and that it has DrbdRscData (which also exists), but that DrbdRscData should have matching DrbdVlmData entries, and those do not exist. That is what the error message complains about.
Usually we do not recommend modifying the database manually, but I have to understand the actual problem better before I can come up with a proper solution (without having to ask you to tamper with the database manually).
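To make the inconsistency described above concrete, here is a minimal sketch of the kind of check one could run over an exported dump. It assumes the records have already been flattened into plain dicts; the keys `node`, `resource` and `vlm_nr` and the function name are hypothetical and do not match the real Linstor CRD field names.

```python
def find_missing_drbd_volume_data(volumes, drbd_rsc_data, drbd_vlm_data):
    """Return sorted (node, resource, vlm_nr) tuples that lack DRBD volume data.

    All three arguments are lists of plain dicts with hypothetical keys
    ("node", "resource", "vlm_nr"); the real Linstor CRD fields differ.
    """
    # Resources (per node) that have a DRBD resource-level entry.
    has_rsc_data = {(r["node"], r["resource"]) for r in drbd_rsc_data}
    # Volumes (per node and resource) that have a DRBD volume-level entry.
    has_vlm_data = {(v["node"], v["resource"], v["vlm_nr"]) for v in drbd_vlm_data}

    missing = []
    for vol in volumes:
        key = (vol["node"], vol["resource"], vol["vlm_nr"])
        # A volume of a DRBD-backed resource is expected to have DRBD volume data.
        if key[:2] in has_rsc_data and key not in has_vlm_data:
            missing.append(key)
    return sorted(missing)


# Made-up sample data: pvc-2 has resource-level data but no volume-level data.
sample_volumes = [
    {"node": "node-a", "resource": "pvc-1", "vlm_nr": 0},
    {"node": "node-a", "resource": "pvc-2", "vlm_nr": 0},
]
sample_rsc_data = [
    {"node": "node-a", "resource": "pvc-1"},
    {"node": "node-a", "resource": "pvc-2"},
]
sample_vlm_data = [
    {"node": "node-a", "resource": "pvc-1", "vlm_nr": 0},
]

missing_entries = find_missing_drbd_volume_data(
    sample_volumes, sample_rsc_data, sample_vlm_data
)
```

With this sample data, `missing_entries` contains only the `pvc-2` volume, which mirrors the situation reported by the error: the resource-level entry is present while the volume-level entry is absent.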
> but is it possible that you once forgot to create volume-definitions for a resource?
I did not create any volume definitions manually. Everything was created automatically with the default piraeus-operator setup, via linstor-csi.
> Usually we do not recommend modifying the database manually
This is a test cluster specifically for the purpose of testing k8s as a linstor store. Data loss is not a problem, and cluster recovery is not required. My only intention is to leave bug reports to help improve Linstor as a product.
If the right place for feedback on this is the Piraeus operator repository - please let me know.
Thank you for the information. However, right now I'd need more information to continue investigating. Ideally that would be some kind of reproducer, but I do not assume you have one, or the time to find one. I will try to keep this on my radar, but I cannot promise anything at the moment.
After several days of test use, a controller reboot left the controller unable to load resources. Installed via piraeus v1.7.0-rc.2.
Controller log:
linstor err show 619BC20D-00000-000000