dCache / dcache

dCache - a system for storing and retrieving huge amounts of data, distributed among a large number of heterogenous server nodes, under a single virtual filesystem tree with a variety of standard access methods
https://dcache.org
277 stars 132 forks source link

Broken revert patch has introduced new bugs #7517

Closed paulmillar closed 4 months ago

paulmillar commented 4 months ago

Commit ce3927d71a4 (RB 14224) claims to revert commit c0c11f4aa2.

However, commit ce3927d71a4 actual does more than simply reverting c0c11f4aa2. It introduces other changes.

The ce3927d71a4 commit message does not make it clear that additional changes are included. Instead, the commit message suggests (to me) that change should only to revert commit c0c11f4aa2. Therefore, I believe these additional changes are unintentional.

One of the additional changes introduced by commit ce3927d71a4 is a modification to the method listPoolGroupXml

                           + groupName);
                 }

-                Object[] result = new Object[5];
+                Object[] result = new Object[4];
                 result[0] = groupName;
                 result[1] = group._poolList.keySet().toArray();
                 result[2] = group._linkList.keySet().toArray();
                 result[3] = group.isPrimary();
-                result[4] = group._pgroupList.stream().sorted(comparing(PGroup::getName))
-                      .map(PGroup::getName).toArray();
                 xlsResult = result;
             }
         } finally {

Note that commit c0c11f4aa2 makes no corresponding change to the listPoolGroupXml method.

The result of the above change to the listPoolGroupXml method is that the psux ls pgroup admin command (send to PoolManager) now returns an array of four objects, instead of an array of five objects. This, in turn triggers the info service to log repeated messages like:

23 Feb 2024 23:07:51 (info) [ListBasedMessageDga[PoolManager, poolgroups, psux ls pgroup]] Pool group info, unexpected array size: 4

This discrepancy may have additional consequences (in httpd, for example).

Please note that I have not make an exhaustive examination of the problems introduced by commit ce3927d71a4. The problem identified above (the array size changing) triggered an obvious problem, and the reason I discovered the problematic commit. There may be other unintentional changes introduced by commit ce3927d71a4.

Therefore, I suggest a very careful examination is made to understand the scope of the problem.

lemora commented 4 months ago

Thank you for catching this, Paul. I went through the commit on master and the backported one in 9.2, and the latter seems to be fine; that is, merely revert the changes in commit c0c11f4a.

I suspect that the additional changes were introduced in the master commit due to an unfortunate 9.2-based cherry-picking sequence and faulty merge conflict resolution. I believe that the problem you caught is the only discrepancy now that needs fixing, and the issue only exists in master.

There is a fix on Review board (14227) to address the problem and restore master to purely contain changes due to the intended revert of c0c11f4a. It would be great if you could review the patch.

paulmillar commented 4 months ago

Hi Lea,

I double-checked the change back-ported to 9.2. You're right: that does look OK. :+1:

It looks like you've already committed another patch (commit 4044301494) that partially addresses the problem (but seemingly not 100% correctly).

So the patch on RB is both fixing aspects of 4044301494 and also the unintentional changes in ce3927d7.

To be honest, the situation is now sufficiently confusing that I cannot be sure that the patch on RB is complete and correct.

Instead, I suggest you ask someone (my suggestion would be Tigran) to review what you are doing ("review the procedure") rather than reviewing the actual code changes, trusting git knows how to revert patches correctly. (It doesn't hurt to double-check the patches, though.)

With this in mind, I would suggest the following steps:

  1. Use git revert to create a new patch that backs out commit 4044301494. Commit this to your local master branch.
  2. Use git revert to create a new patch that backs out commit ce3927d7. Commit this to your local master branch.
  3. Use git revert to create a new patch that backs out commit c0c11f4aa2. Commit this to your local master branch.
  4. Push your local master branch to the github repo.

I believe this should yield the desired result.

My suggestion would be to do this while Tigran is "shoulder surfing", double-checking what you're doing and acting as a live-mode code-reviewer.

Have a chat with Tigran. If he's happy with the patch on RB, the please go ahead. The above strategy is just a suggestion.

All the best, Paul.

lemora commented 4 months ago

That is what I did in the end in order to make sure the patch on RB is correct, actually:

  1. Revert the partial fix
  2. Revert the broken revert itself (-> reapply)
  3. Properly revert "poolmanager: delete property for switchingon/off caching for psu"

Then an interactive rebase of these three commits resulted in a single commit. To make sure, I repeated the procedure just now: the > git diff HEAD~3 HEAD changes are on RB.

I'll definitely chat with Tigran, though.