visorhba: forward_taskmgmt_command(): ptr --> u64 cast is bad[upstream]

selltc commented 8 years ago

301          /* specify the event that has to be triggered when this */
302          /* cmd is complete */
303          cmdrsp->scsitaskmgmt.notify_handle = (u64)&notifyevent;
304          cmdrsp->scsitaskmgmt.notifyresult_handle = (u64)&notifyresult;

Source: Dan Carpenter dan.carpenter@oracle.com Mon 5/2/2016 5:53 AM

This casting an int pointer to u64 pointer seems bad.

See KanBoard-994

selltc commented 8 years ago

Picking at this nit has led me to believe that there is a bug here that prevents task mgmt commands in visorhba from EVER working. I am working on code to use idr_alloc() / idr_remove() and friends (so we can use simple ints instead of pointers to make the round-trip between guest and iovm), but would like to discuss the bug here. The bug is irrelevant to the pointer / u64 casting issue.

When issuing a task mgmt command (CMD_SCSITASKMGMT_TYPE) to the iovm, forward_taskmgmt_command() includes pointers within the command area that will be used to wake up the issuing process and provide the result when the command completes:

cmdrsp->scsitaskmgmt.notify_handle = (u64)&notifyevent;
cmdrsp->scsitaskmgmt.notifyresult_handle = (u64)&notifyresult;

notify_handle is a pointer to a wait_queue_head_t variable, and notifyresult is a pointer to an int. Both of these are just local stack variables in the issuing process.

The way it's supposed to happen is that when the iovm completes the command, in our completion handling we get copies of those pointers back from the iovm, where we stash the result of the command at *notifyresult (which should not be 0xffff, because that is the initial value that the caller is looking to see a change in), and wake up the wait queue at *notify_handle. There are several places we do that dance, but we always do it WRONG, like:

cmdrsp->scsitaskmgmt.notifyresult_handle = TASK_MGMT_FAILED;
wake_up_all((wait_queue_head_t *)cmdrsp->scsitaskmgmt.notify_handle);

The wake_up_all() part is correct (albeit with the help of the sloppy pointer casting, but that's irrelevant to the bug), but the assignment of notifyresult_handle is WRONG, and SHOULD read:

*(int *)(cmdrsp->scsitaskmgmt.notifyresult_handle) = TASK_MGMT_FAILED;

Without this change, the caller is NEVER going to notice a change in his local value of notifyresult when he does the:

if (!wait_event_timeout(notifyevent, notifyresult != 0xffff, msecs_to_jiffies(45000)))

and hence will be timing out EVERY taskmgmt command.

This bug also exists in the CMD_VDISKMGMT_TYPE handling, although it looks to me like we never use this code, so I think we can just delete it all.

@davidker, I'd like your feedback on this. Thanks!

selltc commented 8 years ago

I have successfully tested my patches (commit fa816ac0835361f5590e22d7069a818270f40807 and commit 3f166659c75c1271f9cce3acdc1b704ffdae204c) with the help of commit fca8f084c1b29f8f143e279488de0dc754488ee1. Now I just have to prepare decent log comments, and figure how I want to organize the 3 patches. The 1st pach is lexically required for the 2nd patch to apply properly, and the 3rd patch provides the debug messages necessary to unit-test.

selltc commented 8 years ago

I have squashed the patches down to the 2 I want in commit 3fd28bbf4d9dd50b4211ea03c428da1cebd7026e and commit ffcfb2ac746ffb240e660140f3c03457c1109157, in addition to cleaning up the comments.

selltc commented 8 years ago

@davidker

Proposed cover letter for submitting commit 3fd28bbf4d9dd50b4211ea03c428da1cebd7026e and commit ffcfb2ac746ffb240e660140f3c03457c1109157

[PATCH 0/2] staging: unisys: visorhba: fix scsi task mgmt completion handling

On 5/2 Dan Carpenter pointed out some sloppy u64 <--> pointer casting we were
doing in visorhba, but this turned out to be the tip of an iceberg.  Our
handling of scsi task mgmt completions turned out to be buggy.  The first of
the patches removes some buggy code that is fortunately no longer used, and
the second patch corrects the remaining code, and cleans it up to use
idr_alloc() / idr_find() so that we can cleanly pass handles back-and-forth
from/to the IO partition instead of raw pointers.

These patches are checkpatch clean and tested successfully, using the test procedure described in commit ffcfb2ac746ffb240e660140f3c03457c1109157.

selltc commented 8 years ago

Oops... my 2nd patch above didn't have the correct signed-off-by. So please use patches:

commit a0f90fce3cea5fd616a059060da87aea9bfb1d88 and,
commit c204149c7bc2212c51b4fed09dff1d6fa6f8401a

instead. (Sorry about that.)

selltc commented 8 years ago

Your upstream-next commits:

commit 28335f134f332267f287397141201dcc0cd2df53
commit cd593a42b32ff823a855c45de18b6cc71a60aec3

appear to be the freshest versions of these patches, which can go as soon as Greg does his next update to staging-next.

The cover letter in issuecomment-217237338 above still works.

selltc commented 8 years ago

Greg added these patches to staging-testing, then to staging-next a few hours later:

commit 9c4dfdaa256715b08ef907c3533288f606fe78a0 - staging: unisys: visorhba: delete processing of vdiskmgmt commands
commit a7d656063e55f8227a54fad99270b26472719ff1 - staging: unisys: visorhba: correct scsi task mgmt completion handling

davidker / unisys

visorhba: forward_taskmgmt_command(): ptr --> u64 cast is bad[upstream] #36