Closed: jonahbohlmann closed this issue 3 years ago.
I was able to solve the issue.
I moved a file from /etc/target/backup/* to /etc/target/saveconfig.json.
The current "saveconfig.json" had no "luns" entry. I was able to recover a backup config file that still contained the luns entry:
"luns": [
{
"alias": "288488321c",
"alua_tg_pt_gp_name": "default_tg_pt_gp",
"index": 0,
"storage_object": "/backstores/user/block-volume"
}
],
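To spot this quickly, I now run a small check against the file. This assumes the standard LIO saveconfig.json layout (targets[].tpgs[].luns) and that jq is installed:
$ jq '.targets[].tpgs[].luns | length' /etc/target/saveconfig.json   # healthy config: one non-zero count per TPG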
The main question is still the same: why are the LUNs removed from the config? Can this happen again? In the logs, I saw that saveconfig is somehow generated. Where does the service get the information about the LUNs from?
I am not a C developer, so it is hard for me to understand the code. But I think this is the place where the config is generated: https://github.com/gluster/gluster-block/blob/4f994e3cfa440d9dae1317a1d2b03ed994a03ef4/rpc/block_genconfig.c#L65
But I don't understand where the data comes from and why it is missing in this environment.
Also, I saw that the config is sometimes regenerated. So restoring a working version is only a stopgap and may not be a general solution to my issue.
Any ideas? I hit this issue quite often now: the "luns" part is removed from saveconfig.json.
I now have to restore saveconfig.json every time I restart the "gluster-blockd" service. Why? What happened there?
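For reference, my restore each time looks roughly like this; the backup file name is illustrative, and targetctl ships with targetcli/rtslib:
$ cp /etc/target/backup/saveconfig-20210101-00:00:00.json /etc/target/saveconfig.json
$ targetctl restore   # load the restored config back into the kernel target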
Please support!
Thanks.
@jonahbohlmann the project is considered to be in maintenance-only status. I will consider adding this to the README doc soon. Please also expect slow replies to issues.
> [STAGING] [17:52:24 root@fra1-glusterfs-m01]{~}> gluster volume info rdxarchive_2020

We recommend a replica 3 volume with the group profile applied on it. Helpful command:
# gluster vol set <volname> group gluster-block

> Main question is the same: why are the LUNs removed from the config? Can this happen again? In the logs, I saw that saveconfig is somehow generated. Where does the service get the information about the LUNs from?
You can refer to this as missing storage objects; these are the backstores under user:glfs in the targetcli ls output. You will notice missing storage objects when you reboot the nodes or restart the gluster-blockd service. If there are any issues with the backend block-hosting glusterfs volumes, tcmu-runner fails to load them.
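A quick way to check whether the storage objects survived a restart (service and targetcli paths as shipped on most distros):
$ systemctl status tcmu-runner   # the daemon must be running to load user:glfs backstores
$ targetcli ls /backstores       # each block volume should appear under user:glfs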
Please check your tcmu-runner.log, gluster-blockd.log, and the other logs in the gluster-block log directory.
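Something like this narrows it down; the exact log paths can differ per distro/build, these are the common defaults:
$ grep -iE 'error|fail' /var/log/gluster-block/gluster-blockd.log
$ grep -iE 'error|fail' /var/log/gluster-block/tcmu-runner.log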
The missing configurations in /etc/target/saveconfig.json are because of a previous bug: if new create/delete requests arrived while there were a few unloaded block volumes, you might have hit it. We highly recommend upgrading to 0.5 or 0.5.1, which fix this issue.
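To confirm what is running before and after the upgrade (assuming your build provides the version subcommand):
$ gluster-block version
$ rpm -q gluster-block tcmu-runner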
> I am not a C developer, so it is hard for me to understand the code. But I think this is the place where the config is generated:
Yes, there is a way to generate the missing config per node:
$ systemctl stop tcmu-runner gluster-blockd gluster-block-target
$ mv /etc/target/saveconfig.json /home/<backup>
$ gluster-block genconfig <BHV-comma-separated-list> enable-tpg <local-ip> | tee /etc/target/saveconfig.json
$ systemctl start gluster-blockd
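For example, with the single block-hosting volume from this thread (the IP is illustrative; use the node's own address):
$ gluster-block genconfig rdxarchive_2020 enable-tpg 192.0.2.11 | tee /etc/target/saveconfig.json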
> But I don't understand where the data comes from and why it is missing in this environment.
The data comes from the BHVs (block-hosting volumes): there is a /block-meta directory in every BHV where the per-block-volume metadata is journaled.
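If you want to look at it directly, you can mount the BHV with the glusterfs FUSE client and list that directory (server name and mount point are illustrative):
$ mount -t glusterfs fra1-glusterfs-m01:/rdxarchive_2020 /mnt/bhv
$ ls /mnt/bhv/block-meta/   # one metadata file per block volume, e.g. the block-volume entry from the fragment above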
> Also, I saw that the config is sometimes regenerated. So restoring a working version is only a stopgap and may not be a general solution to my issue.
Can you paste the logs?
> Any ideas? I hit this issue quite often now: the "luns" part is removed from saveconfig.json.
> I now have to restore saveconfig.json every time I restart the "gluster-blockd" service. Why? What happened there?
I suggest checking the logs under /var/log/.../gluster-block/.
Before anything else:
On all the server nodes:
$ systemctl stop tcmu-runner gluster-blockd gluster-block-target
On one server node:
$ gluster vol stop rdxarchive_2020
$ gluster vol set rdxarchive_2020 group gluster-block
$ gluster vol start rdxarchive_2020
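Afterwards you can verify that the profile took effect; the options it applies show up under "Options Reconfigured":
$ gluster volume info rdxarchive_2020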
Good Luck!
Hello @pkalever,
thank you so much for your support and the detailed explanation. I now have a much better overview of how everything works.
I was able to build the RPM package for version 0.5.1 and installed it successfully on our test environment and in production. At this moment, I see no more issues regarding the missing LUNs. So yes, the version upgrade and/or applying the profile fixed the issue.
Maybe you can attach the RPM files to the latest release; I think that would be very helpful. I was not able to find any repository with the latest version for CentOS 7.
Again, thank you.
Best Regards!
Assuming the issues are fixed, closing this now. Please feel free to open a new issue as needed.
Thanks!
Hello,
I am new to gluster and gluster-block, and I have no idea whether this is a bug or just an issue on my side. I will try to explain everything with the needed details, and maybe someone can tell me whether this is the right place or whether I need to move it somewhere else.
Introduction
I have two gluster clusters: one for production and one as a test environment. The production instance has been working fine for months; the test environment developed an issue after 3 months. Note that the production environment does not currently have the same problem. Below I am only talking about the test environment, which is set up identically to production.
I use iSCSI to mount the block volume into my software (which needs iSCSI as its storage backend).
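For completeness, this is roughly how the software side attaches the volume; the portal IP and the IQN are placeholders here:
$ iscsiadm -m discovery -t sendtargets -p 192.0.2.10
$ iscsiadm -m node -T iqn.2016-12.org.gluster-block:<block-id> -p 192.0.2.10 --login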
Information
The gluster cluster has three nodes.
Versions:
Gluster Peer Status (from first server):
Gluster volume info:
Up to this point, I think everything is fine.
The issue
After some time of usage, I got an alert that the storage was no longer available. The gluster cluster itself is healthy, and my monitoring tracked no network issues or anything similar. I rebooted the nodes, restarted the services, and checked the logs, but found no information pointing to a real problem.
The creator of the software where I use the iSCSI volume told me that no LUN is visible from the interface. So I checked with
targetcli ls
On production, I have this view:
So from my point of view, the "lun0" section is missing in the test environment.
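To compare the environments, I check the target tree directly on each node; on a healthy node, lun0 shows up under the TPG:
$ targetcli ls /iscsi   # production shows .../tpg1/luns/lun0, the test environment does not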
Any idea what the problem could be and how I can solve it? What else can I check? Is any information missing from my post?
I hope someone has an idea.
Thank you!
Edit: It is also possible to apply to my Upwork job; I can pay you for support: https://www.upwork.com/jobs/~01867a80c9701e4070