Closed mriswyth closed 8 years ago
I think this is indeed a valid issue. Thanks Bryan.
proposed fix pushed to branch githubissue#62. Will be testing branch. @davidker
FYI: This was also pointed out by Thomas Gleixner tglx@linutronix.de during his comments from 6/1.
Because tglx pointed this issue out when we attempted to push my fix for github #58 (in particular, commit 726b41175e4908d057963da211da7dab49ae588b), I developed a version of this patch that fits on out-of-staging-visorbus-v2, and pushed it as commit a7642b676b0393ab6a05eefe9e99fa9a12ae672d. This compiles clean, but I haven't had a chance to regression test it yet. I will do that tonight.
Of course there is a new version of checkpatch now, reporting new errors in visorinput, but they appear to be all false-positives, and are unrelated to my patch.
Oh no. Commit a7642b676b0393ab6a05eefe9e99fa9a12ae672d does NOT work. System won't boot with it. Reason is, changing to a mutex apparently reveals a latent defect (that I have fixed in for-later with commit dfa68cfa40a9af4783cad2306bcf5f45e1efe89c - "staging: unisys: visorinput: ensure proper locking wrt creation & ints" for github #58 in for-later), involving the lock being used before it is initialized. I.e., apparently we were "forgiven" for this mistake when it was a semaphore, but are NOT so lucky when converted to a mutex.
I'm getting dizzy from all these up-in-the-air patches.
I'll figure out how to fit this into our current patchdeck... it will probably mean we have to include both commit dfa68cfa40a9af4783cad2306bcf5f45e1efe89c followed by a modified version of commit a7642b676b0393ab6a05eefe9e99fa9a12ae672d.
OK... my 2 patches on out-of-staging-visorbus-v2-selltc-20160602 (based on out-of-staging-visorbus-v2 as of a few minutes ago) work ok:
These patches regression-tested clean, and are checkpatch clean.
So if you could add the above 2 patches for your v3 submission, that would address tglx's 6/1 review comments for visorinput.
We decided NOT to include the above 2 fixes in the v3 patchset, so I sent the following explanation to tglx on 6/3:
+ down_write(&devdata->lock_visor_dev);
Why is this a rw_semaphore? It's only ever taken with down_write() and it's always the same context. Should be a mutex, right?
Correct. We have a local patch that addresses this, but would like to submit this via a follow-on patchset if possible. I'll explain.
Rationale: our intent for this patchset was to focus on the visorbus driver ONLY. The only reason visorinput got involved in the first place was due to the visorbus change that necessitated that we remove the locking from visorinput_channel_interrupt(), due to that now being called from atomic context.
If the semaphore --> mutex change would have been as simple as it sounds, we would have had NO problem including it with the next version (v3) of this patchset. But unfortunately, this change uncovered a latent defect, which necessitated yet another patch. (I know... hard to believe that something this simple would do that, but it did.) Rather than further complicating this patchset, we thought it would be better to address the visorinput issues via a separate follow-on patchset.
Is that acceptable for you?
While at it, please convert the notifier_lock to a mutex as well.
Thanks. Since this is visorbus-specific, we DO plan to address this in v3 of this patchset, which will most-likely just be REMOVING notifier_lock altogether. (Note, this is actually part of the github #69 fix).
@selltc, Tim do you have a cover-letter you want sent?
I'll have a cover letter in a few minutes. But it also just occurred to me that I have not actually run these patches independent of the patches we already have in-flight for visorbus, so I will need to do that if we plan to submit before the in-flight stream is accepted.
They require the patches in flight, they do no apply to greg's tree. So I will either need to make a v5 of the series and add them or get tglx approval to do them later.
It is my impression that tglx would be ok with us doing them later, as his most-recent suggestion was "please fix before visorinput is moved out of staging". That's my vote.
Proposed cover letter:
fix race condition in visorinput dev init, convert semaphore to mutex
This patchset fixes a race condition in visorinput device creation that
would allow the associated input device to be opened and interrupts to
be enabled before we finished initializing the device structure. We
discovered this race condition when we attempted to convert
visorinput's semaphore to a mutex, which revealed (via a crash) that we
were attempting to access the lock in the visorinput device structure
before it was actually initialized.
The 2nd patch in the series converts visorinput's semaphore to the
more-appropriate mutex.
8/15 - Greg committed these patches into staging-next with:
This may require some more thinking, but I noticed it while working on issue #55.
Community feedback resulted in issues #55 and #56 that on the surface seem like they would apply to lock_visor_dev as well since the write portion of the rw_semaphore is a binary semaphore/mutex and the semaphore is only the write semaphore calls are made.