Closed selltc closed 7 years ago
staging-next
-based fix for this is commit 6ee6e503653a603880a4c1bec306fabb90aa5229.
This problem also exists in the for-later branch, but I'm not going to worry about that right now. It is the netif_napi_add() that precedes the starting of the polling loop or enabling of interrupts that we need to keep; the other one is extraneous and must be deleted.
Okay, I think this is serious enough that it should probably be in the list that I'm sending up. Though I'm not going to hold the current dry-run for it since I'm about done with that. When we find changes I will post it with the rest tonight/tomorrow.
Thanks David. The main seriousness of this in my mind is really the possibility that there are hidden side-effects during normal operation caused by the double-netif_napi_add()
.
I have this fix integrated into my for-later-proposed-selltc-20160526 branch (refer to github #58 for deatils about the contents of that branch) within commit e25ea716184868626c604128ad403e22264cb749 (staging: unisys: Convert visornic to use visorbus channel interrupt code) (technically my tweak to that commit just prevents the problem from being re-introduced), but unfortunately, for-later has another unrelated problem that will prevent modprobe -r visornic
from working -- see github #70.
Tim, I haven't really looked at github yet, but is the patch tweaked in the staging patch or is it a separate patch before the convert visornic to use visorbus channel interrupts? I thought we wanted to push the fix to visornic before we made changes to for-later.
commit 6ee6e503653a603880a4c1bec306fabb90aa5229 (referenced in my May 24 comment above) is still a valid patch to fix staging-next.
When I built my local version of for-later (for-later-proposed-selltc-20160526), I found that I needed to tweak commit e25ea716184868626c604128ad403e22264cb749 (a patch that is specific to for-later) in order to prevent this bug from getting RE-introduced into the for-later patch deck.
This bug is actually fixed now within:
which is in the standard for-later patch deck.
I verified this patch is indeed intact in the latest version of the for-later branch.
I noticed this patch still isn't in Greg's staging-next tree, as you can see that visornic_probe() still has redundant calls to netif_napi_add()
there:
/* TODO: Setup Interrupt information */
/* Let's start our threads to get responses */
netif_napi_add(netdev, &devdata->napi, visornic_poll, 64);
setup_timer(&devdata->irq_poll_timer, poll_for_irq,
(unsigned long)devdata);
/* Note: This time has to start running before the while
* loop below because the napi routine is responsible for
* setting enab_dis_acked
*/
mod_timer(&devdata->irq_poll_timer, msecs_to_jiffies(2));
channel_offset = offsetof(struct spar_io_channel_protocol,
channel_header.features);
err = visorbus_read_channel(dev, channel_offset, &features, 8);
if (err) {
dev_err(&dev->device,
"%s failed to get features from chan (%d)\n",
__func__, err);
goto cleanup_napi_add;
}
features |= ULTRA_IO_CHANNEL_IS_POLLING;
features |= ULTRA_IO_DRIVER_SUPPORTS_ENHANCED_RCVBUF_CHECKING;
err = visorbus_write_channel(dev, channel_offset, &features, 8);
if (err) {
dev_err(&dev->device,
"%s failed to set features in chan (%d)\n",
__func__, err);
goto cleanup_napi_add;
}
/* Let's start our threads to get responses */
netif_napi_add(netdev, &devdata->napi, visornic_poll, NAPI_WEIGHT);
@davidker, I suspect the reason we haven't submitted it is just because we've already fixed this in our for-later branch, and because it's NOT on Greg & company's radar?
From looking at it, I think the fix was for-later and conflicted when I tried to pull it into upstream-next (submit to staging-next). I'm wondering if we should just remove the extra napi_add or do something else in staging-next. If I run the rmmod test should that cause this to fail? I thought I was running that on a weekly basis for upstream-next. I'll double check.
Yeah, I noticed your comment in KanBoard-1299 that said you did NOT see this problem with staging-next. But I suspect that a redundant call to napi_add()
like that is capable of causing randoms symptoms, akin to a double-free or something.
Okay, I will try to move the patch over. Thanks for catching this.
Comited at 9c70ee32ffedc48705b08d97212683a5482628cb
See KanBoard-1299.
Since this is staging-next, where the most-recent patch of ours was applied by Greg 5/9 with commit 03759f8cd6e88517980b11c96f377e51d765d403 (staging: unisys: rename misleading var ii with frag), it has been broken for some time.