chjj / compton

A compositor for X11.
Other
2.24k stars 501 forks source link

Shadow bug #31

Closed corenominal closed 12 years ago

corenominal commented 12 years ago

Apologies for the poor bug title, but I am not sure how best to describe this issue. Anyhow, running Compton under Debian Wheezy, I am experiencing a strange shadow bug, whereby a window shadow stays on top of all other windows. When this happens, the shadow always appears at the same coordinates and appears to always have the same dimensions. Please see the following screenshot:

Shadow bug

At first, the bug appeared to be happening at random, when opening an application, but I can now replicate the bug quite consistently by following these steps:

  1. Launch VLC
  2. Close VLC
  3. Open File Roller (I use File Roller as an example, but most GTK+ 3 applications seem to trigger bug)

Given some instruction, I would be more than happy to help debug this.

Apart from this issue, Compton has been working perfectly, thank you :)

ichi-no-eda commented 12 years ago

I experience exactly the same phenomenon on Arch Linux with Openbox. Though I don't think I could describe a reproducible pattern.

But where the 'shadow' is now I had a blank space (meaning a square where open windows where invisible) exactly the same size when I used xcompmgr.

Unfortunately I'm no coder so can't help with that but would be extremely happy to see this be gone as compton works very well for me too except for this issue (and I am quite dependent on compositing:-).

Thanks to the developer!

chjj commented 12 years ago

Sorry, I've been busy lately. I'll look into this. It is strange though. I've used openbox for ages and never seen this.

ichi-no-eda commented 12 years ago

That's nice! Thanks. And really - no need to apologize. I'm just glad someone looks into this and keeps a decent compositing-manager for Openbox alive.

corenominal commented 12 years ago

@chjj thank you for looking into it. As mentioned above, if there is anything I can do to help, please let me know.

ichi-no-eda commented 12 years ago

Confirmed: -launch VLC -close VLC -open GTK-application causes the bug.

ichi-no-eda commented 12 years ago

It only happens when VLC uses native style (GTK+). Using a custom skin with VLC avoids the bug.

chjj commented 12 years ago

I'm able to reproduce it, which is good. Now I just need to figure out whats wrong. It looks like a destroyed window might not be completely removed from the linked list. It only ends up painting a shadow for a window that doesn't exist.

corenominal commented 12 years ago

@chjj howdy :) Sorry for the bump, but I am interested to know if you have had any movement or breakthroughs on this?

chjj commented 12 years ago

@corenominal, I've pinpointed the exact line causing the problem, but now the question is why. It seems to be trying to paint a window that doesn't exist anymore, so instead it only manages to paint a shadow for the non-existent window. I've long suspected that there might be problems with the way the global linked list is managed. I'll have to look into it some more.

Can anyone find anymore programs that cause this, aside from VLC?

corenominal commented 12 years ago

@chjj, brilliant, this sounds very positive. As I mentioned in my original post, apart from this issue, it has been working perfectly and I can foresee Compton becoming the de facto compositor for Openbox users like myself. Keep up the great work and again, if you need any assistance, please let me know.

P.S. I shipped Compton with the first Debian Wheezy CrunchBang builds and it has been well received. I have tried to keep an eye on any further reports of strange behaviour/bugs, but so far this one seems to be the only real issue.

ichi-no-eda commented 12 years ago

I've been looking for a pattern during the last few weeks but couldn't find one. The only thing I noticed was that Geany relatively often causes the bug (one time out of 50 times I use Geany, I'd say) - again I couldn't pin it down to a reproducible pattern. One time obmenu triggered the bug - but again: not reproducible -sorry.

akmur commented 12 years ago

I get it too, what detail do you need if any? I replicated the issue by doing what ichi-no-eda says

richardgv commented 12 years ago

If you plan to turn your head away because of the length of this reply -- I provide possible fixes for the issue. I'm not here just to describe that I encountered the problem, too, with so many letters.

If you don't me mind repeating again that my knowledge about Xlib is close to none. This problem is so obscure and complicated that chances are my guesses below are totally incorrect. And it's different from what the author (chjj) has discovered -- not a single bit similar, should I say. Essentially I'm not confident about my conclusions, and I'm not entirely sure if my patch actually resolves the issue -- this bug is probably closely tied to timing, so it's hard to determine if it actually goes away.

I'm quite uncertain about the actual cause of this issue. I found two solutions from different aspects and they both seem to work. But neither explanation completely explains everything. I guess the second one should be more technically correct. Either patch should be sufficient to fix the bug, I just don't know which one is more correct.

As I'm very likely wrong in the steps I've taken, I would prefer to write a list of the things I did and discovered, to make it easier to spot where I am wrong.

  1. The first step, well, I reproduced the bug successfully on my fvwm, by launching qvlc, closing it, then launching file-roller.
  2. I turned DEBUG_REPAINT on so I could see what windows compton paints, and I discovered that although I have only file-roller and two other windows open, compton attempted to paint 4 windows. xwininfo tells me the wrong window is the actual window of file-roller. (fvwm wraps an actual application window in a frame window, from my understanding, and compton should only paint shadow for the frame window instead of the actual application window.)
  3. I compared the paint_all() process of compton and xcompmgr (which seemingly does not have this issue -- it actually does, mentioned later) using gdb. xcompmgr did not paint the inner actual window of file-roller because the window's w->damaged is set to 0, while in compton it's 1.
  4. There are two places in compton that sets w->damaged to 1, one in set_fade(), another in repair_win(). Commenting out the former one did not help, so the problem is in repair_win(). repair_win() is only called by damage_win(), so I presume the problem here is one extra DamageNotify event is received by compton.
  5. I tried turning DEBUG_EVENTS on in compton, and the problem vanishes under my nose. As DEBUG_EVENTS does not change the logic of the program, I made a guess that the bug is somehow tied to timing.
  6. By adding some logging code near the calls to XDamageCreate() and XDamageDestroy() in compton and inside damage_win(), I determined that the incorrect DamageNotify event was emitted by a Damage that has just been XDamageDestroy()-ed. That damage was added on the CreateNotify of the actual file-roller window, and was destroyed on ReparentNotify, when fvwm created a frame window and wrapped the actual file-roller inside the frame window.
  7. I added the same debug code to xcompmgr, and found it didn't receive the same problematic DamageNotify event. At this point, my guess is the DamageNotify event is added to some sort of X event queue at a moment before XDamageDestroy() is called on the Damage, so it's left in the queue even after the Damage was destroyed. That fits to my guess that the problem is related to timing.
  8. In order to confirm my guess I added sleep(1) after the call of XDamageDestroy() in xcompmgr, and the very issue appeared on xcompmgr, too! So I confirmed it's related to timing, and it's bug inherited from xcompmgr.
  9. So, the solution that I came up at this point was to block damage_win() when the Damage reported by the DamageNotify event is not longer present. Oh, and looks like it's working.

    diff --git a/src/compton.c b/src/compton.c
    index 60948ca..13ab207 100644
    --- a/src/compton.c
    +++ b/src/compton.c
    @@ -1848,12 +1848,25 @@ destroy_win(Display *dpy, Window id, Bool fade) {
      }
    }
    
    +static win *find_damage(Damage dmg) {
    +  win *w;
    +
    +  for (w = list; w; w = w->next)
    +    if (dmg == w->damage)
    +      return w;
    +
    +  return NULL;
    +}
    +
    static void
    damage_win(Display *dpy, XDamageNotifyEvent *de) {
      win *w = find_win(dpy, de->drawable);
    
      if (!w) return;
    
    +  if (!find_damage(de->damage))
    +    return;
    +
    #if CAN_DO_USABLE
      if (!w->usable) {
        if (w->damage_bounds.width == 0 || w->damage_bounds.height == 0) {
  10. I thought I could simplify it. As the win / struct _win retains a reference to the Damage of the window as w->damage, and the value is set to None (0) after its Damage is XDamageDestroy()-ed, I could reduce the whole find_damage() call to:

    if (!(w->damage)) return;

    And I'm so surprised that it didn't work.

  11. Looking into the window linked list with gdb, I discovered that w->damage actually has a value when its Damage should have already been destroyed. Further investigate revealed, the underlying reason is in the window linked list list, there is more than one window with identical ID! But they have different damage values, one has None, the correct value after a Damage is XDamageDestroy()-ed, another has a weird value.
  12. Thus I added extra debugging code to add_win(), and it revealed, VLC firstly created a window, sending a CreateNotify, then it reparented the window twice, with the root window as the parent window, sent two ReparentNotify. As compton/xcompmgr internally calls add_win() when it receives a ReparentNotify with the root window as the parent, compton/xcompmgr would add the same window to the window linked list for 3 times, each time it calls an XDamageCreate, creating 3 Damage-s for the same window. VLC later destroyed the window, and one entry in the window list was removed, its Damage destroyed, but two entries for the window ID in the window list and two Damage-s for the nonexistent window remained. When I launched file-roller, its actual window uses the same ID as the window VLC just destroyed, so compton/xcompmgr added yet another Damage and another entry in the window list, then freed that Damage and the entry when fvwm reparents the window inside a frame window. Then, two entries are left in the window list, with different Damage IDs. This possibly puzzled X DAMAGE extension. So the key to the problem might be preventing the same window from being added twice, thus my second patch:

    diff --git a/src/compton.c b/src/compton.c
    index b81117d..0211748 100644
    --- a/src/compton.c
    +++ b/src/compton.c
    @@ -1578,6 +1578,11 @@ add_win(Display *dpy, Window id, Window prev, Bool override_redirect) {
    
       if (!new) return;
    
    +  if (find_win(dpy, id)) {
    +    free(new);
    +    return;
    +  }
    +
       if (prev) {
         for (p = &list; *p; p = &(*p)->next) {
           if ((*p)->id == prev && !(*p)->destroyed)
  13. The thing it does not explain is why the DamageNotify that eventually causes the problem reports the Damage ID of the last Damage requested for the window ID (which has just been destroyed) instead of the Damage IDs of either the two old Damage incorrectly created and never XDamageDestroy()-ed. Is it a characteristic of X DAMAGE extension or the library when handling multiple Damage for the same window, or is my guess entirely wrong?
  14. Another thing that does not explain is why the problem is associated with timing. I checked the debugging output of xcompmgr when the problem does not appear. Although it created 4 Damage for the same window ID (2 ghost ones remaining), it did not receive the DamageNotify that causes the issue directly. Only by adding a sleep(1); can the problem be reproduced.
  15. As reference, here's the debugging output of a normal xcompmgr, a buggy xcompmgr, a buggy compton, and a compton with patch #2 applied: https://gist.github.com/3672309

@chjj: By the way, I wonder what is remaining in the 3 major bugs that needed fixing that you mentioned?

chjj commented 12 years ago

Well, I'll start with the last first. The last major bug is a bug, present in the original xcompmgr, which manifests as a black rectangle when Chromium is opened and maximized. I've seen it in openbox at least. It's not a terrible bug, but it is very annoying and confusing the first time you see it. I believe it has something to do with the way Chromium avoids having the window manager draw window decorations. The bug is not present in Chromium when you turn on regular window decorations.

Thanks for all your great work, richardgv. I'll be messing around with this. Unfortunately, I can't reproduce this bug in awesome, so I'll have to login to openbox to test. I'll see what I come up with. I'm anxious to get this fix implemented finally.

chjj commented 12 years ago

Both fixes definitely work. This has to be one of the most baffling bugs I've seen. For now, I'm going to add the find_win check, because it couldn't hurt to have a safety net there in the first place. That will fix things for now, and I'll leave this issue open in case we find out more in the future.

ichi-no-eda commented 11 years ago

Thanks for all involved for fixing this bug. Very much appreciated!