Closed pkalever closed 4 years ago
@lxbsz do you foresee any consequences with this fix? Thanks!
@lxbsz do you foresee any consequences with this fix? Thanks!
To fix the issue you mentioned IMO the change In gluster-block should be okay, but in tcmu-runner's tcmu_glfs_open(), it seems there still will be a problem:
556 dev_size = tcmu_dev_get_num_lbas(dev) * block_size;
557 if (st.st_size != dev_size) {
558 /*
559 * The glfs allows the backend file size not to align
560 * to the block_size. But the dev_size here in tcmu-runner
561 * will round down and align it to the block_size.
562 */
563 if (round_down(st.st_size, block_size) == dev_size)
564 goto out;
565
566 if (!reopen) {
567 ret = -EINVAL;
568 goto close;
569 }
570
571 /*
572 * If we are here this should be in reopen path,
573 * then we should also update the device size in
574 * kernel.
575 */
576 tcmu_dev_info(dev,
577 "device size and backing size disagree:device %lld backing %lld\n",
578 dev_size, (long long) st.st_size);
579
580 ret = tcmur_dev_update_size(dev, st.st_size);
581 if (ret)
582 goto close;
583 }
If a remote node resize failed, the remote node's saveconfig.json should be still holding the old size. If the node is rebooted, the 'reopen = false' it should fail here.
We could just remove the reopen check in tcmu-runner IMO.
@lxbsz I think yes, you discovered a new issue that needs to be fixed in tcmu-runner, but xiubo I do not see any correlation between this fix and the issue you mentioned that exists in tcmu-runner, agree? I mean, even without this change, since we resize the backend first before everything and only then go for modifying config on remote nodes, this is an issue with tcmu-runner which already exist even before this patch.
But the good thing with this gluster-block fix is that even if the remote node failed, if someone do an upgrade, then genconfig will fix the saveconfigs on all the nodes, because SIZE is updated in metafile, else before an upgrade say on one node resize failed, then after performing an upgrading all the nodes saveconfig.json will have oldSize and tcmu-runner will fail to load the block on all the nodes because the size mismatch between saveconfig and backend file size.
Thanks!
@lxbsz I think yes, you discovered a new issue that needs to be fixed in tcmu-runner, but xiubo I do not see any correlation between this fix and the issue you mentioned that exists in tcmu-runner, agree? I mean, even without this change, since we resize the backend first before everything and only then go for modifying config on remote nodes, this is an issue with tcmu-runner which already exist even before this patch.
Yeah, that's right.
But the good thing with this gluster-block fix is that even if the remote node failed, if someone do an upgrade, then genconfig will fix the saveconfigs on all the nodes, because SIZE is updated, else, even if on one node resize fails after upgrade all the nodes saveconfig.json will have oldSize and tcmu-runner will fail to load the block on all the nodes.
Agree.
So in tcmu-runner we can just remove the 'reopen' check, or just give some warning logs.
Thanks!
@lxbsz thanks Xiubo. Updated the tags, will merge soon after CI returns success.
What does this PR achieve? Why do we need it?
On a resize operation, say the backend file expand succeed but the config change on a remote node failed. In this case, the block hosting volume size is already occupied. Hence the block volume info should return the currently consumed size.
Signed-off-by: Prasanna Kumar Kalever prasanna.kalever@redhat.com
Notes for the reviewer
Needed-by: https://github.com/heketi/heketi/pull/1702