davidker / unisys

Master repository for new changes to drivers/staging/unisys and drivers/visorbus
Other
2 stars 1 forks source link

`modprobe -r visornic` doesn't work in staging-next; process just hangs in a hard processor-loop #64

Closed selltc closed 7 years ago

selltc commented 8 years ago

See KanBoard-1299.

Since this is staging-next, where the most-recent patch of ours was applied by Greg 5/9 with commit 03759f8cd6e88517980b11c96f377e51d765d403 (staging: unisys: rename misleading var ii with frag), it has been broken for some time.

selltc commented 8 years ago

staging-next-based fix for this is commit 6ee6e503653a603880a4c1bec306fabb90aa5229.

This problem also exists in the for-later branch, but I'm not going to worry about that right now. It is the netif_napi_add() that precedes the starting of the polling loop or enabling of interrupts that we need to keep; the other one is extraneous and must be deleted.

davidker commented 8 years ago

Okay, I think this is serious enough that it should probably be in the list that I'm sending up. Though I'm not going to hold the current dry-run for it since I'm about done with that. When we find changes I will post it with the rest tonight/tomorrow.

selltc commented 8 years ago

Thanks David. The main seriousness of this in my mind is really the possibility that there are hidden side-effects during normal operation caused by the double-netif_napi_add().

selltc commented 8 years ago

I have this fix integrated into my for-later-proposed-selltc-20160526 branch (refer to github #58 for deatils about the contents of that branch) within commit e25ea716184868626c604128ad403e22264cb749 (staging: unisys: Convert visornic to use visorbus channel interrupt code) (technically my tweak to that commit just prevents the problem from being re-introduced), but unfortunately, for-later has another unrelated problem that will prevent modprobe -r visornic from working -- see github #70.

davidker commented 8 years ago

Tim, I haven't really looked at github yet, but is the patch tweaked in the staging patch or is it a separate patch before the convert visornic to use visorbus channel interrupts? I thought we wanted to push the fix to visornic before we made changes to for-later.

selltc commented 8 years ago

commit 6ee6e503653a603880a4c1bec306fabb90aa5229 (referenced in my May 24 comment above) is still a valid patch to fix staging-next.

When I built my local version of for-later (for-later-proposed-selltc-20160526), I found that I needed to tweak commit e25ea716184868626c604128ad403e22264cb749 (a patch that is specific to for-later) in order to prevent this bug from getting RE-introduced into the for-later patch deck.

selltc commented 8 years ago

This bug is actually fixed now within:

which is in the standard for-later patch deck.

selltc commented 8 years ago

I verified this patch is indeed intact in the latest version of the for-later branch.

selltc commented 7 years ago

I noticed this patch still isn't in Greg's staging-next tree, as you can see that visornic_probe() still has redundant calls to netif_napi_add() there:

    /* TODO: Setup Interrupt information */
    /* Let's start our threads to get responses */
    netif_napi_add(netdev, &devdata->napi, visornic_poll, 64);

    setup_timer(&devdata->irq_poll_timer, poll_for_irq,
            (unsigned long)devdata);
    /* Note: This time has to start running before the while
     * loop below because the napi routine is responsible for
     * setting enab_dis_acked
     */
    mod_timer(&devdata->irq_poll_timer, msecs_to_jiffies(2));

    channel_offset = offsetof(struct spar_io_channel_protocol,
                  channel_header.features);
    err = visorbus_read_channel(dev, channel_offset, &features, 8);
    if (err) {
        dev_err(&dev->device,
            "%s failed to get features from chan (%d)\n",
            __func__, err);
        goto cleanup_napi_add;
    }

    features |= ULTRA_IO_CHANNEL_IS_POLLING;
    features |= ULTRA_IO_DRIVER_SUPPORTS_ENHANCED_RCVBUF_CHECKING;
    err = visorbus_write_channel(dev, channel_offset, &features, 8);
    if (err) {
        dev_err(&dev->device,
            "%s failed to set features in chan (%d)\n",
            __func__, err);
        goto cleanup_napi_add;
    }

    /* Let's start our threads to get responses */
    netif_napi_add(netdev, &devdata->napi, visornic_poll, NAPI_WEIGHT);

@davidker, I suspect the reason we haven't submitted it is just because we've already fixed this in our for-later branch, and because it's NOT on Greg & company's radar?

davidker commented 7 years ago

From looking at it, I think the fix was for-later and conflicted when I tried to pull it into upstream-next (submit to staging-next). I'm wondering if we should just remove the extra napi_add or do something else in staging-next. If I run the rmmod test should that cause this to fail? I thought I was running that on a weekly basis for upstream-next. I'll double check.

selltc commented 7 years ago

Yeah, I noticed your comment in KanBoard-1299 that said you did NOT see this problem with staging-next. But I suspect that a redundant call to napi_add() like that is capable of causing randoms symptoms, akin to a double-free or something.

davidker commented 7 years ago

Okay, I will try to move the patch over. Thanks for catching this.

davidker commented 7 years ago

Comited at 9c70ee32ffedc48705b08d97212683a5482628cb