PiSCSI / piscsi

PiSCSI allows a Raspberry Pi to function as emulated SCSI devices (hard disk, CD-ROM, and others) for vintage SCSI-based computers and devices. This is a fork of the RaSCSI project by GIMONS.
https://piscsi.org
BSD 3-Clause "New" or "Revised" License
545 stars 82 forks source link

MODE SELECT ignores mode pages in develop branch, v23.11.01 works #1427

Closed uweseimet closed 7 months ago

uweseimet commented 10 months ago

This ticket refers to commit a23d642 (current develop branch).

When you send a sequence of mode pages with MODE SELECT(6), some pages are ignored. If one of the pages is a format device page for a sector size change (configured in the page data, not in the descriptor), and when this page is located somewhere at the end of the list, the change is ignored. The log entry about the sector size change does not appear. When you move this page up in the page list the log entry appears. Whether a page is evaluated or not seems to depend on its position in the list of mode pages.

With release v23.11.01 this is correct, i.e. for the same command parameters the sector size change request is logged (for both MODE SELECT(6) and MODE SELECT(10)), regardless of the size of the page list.

rdmark commented 9 months ago

Thanks for reporting, Uwe.

@kkaempf At a glance, this seems to be a specific scenario that we didn't consider when reviewing your ModeSelect PR series.

https://github.com/PiSCSI/piscsi/pull/1405 https://github.com/PiSCSI/piscsi/pull/1406

rdmark commented 8 months ago

@uweseimet Does the unit test added here adequately represent the test case that you describe in this issue? https://github.com/PiSCSI/piscsi/pull/1443

uweseimet commented 8 months ago

@rdmark No, it doesn't.. The test needs to send multiple pages (a list of pages) with a single MODE SELECT. Then, depending on where in the list the sector size is changed, the change does not have any effect. You can see this in the logfile: piscsi in the develop branch does not log that the sector size has been changed, v23.11.01 does. This shows that the code parsing the page list passed with MODE SELECT is broken in the develop branch.

Please note that I do not intend to spend more time on this and the other MODE SELECT bug. I thought I'd report what I found, so that it does not end up in a release, but I am not interested in defending why I think that a program that crashes has a bug.

rdmark commented 8 months ago

@uweseimet Can you please share a MODE SELECT(6) payload that you used to demonstrate the bug? This would help avoid making incorrect assumptions or mistakes based of a misunderstanding of SCSI behavior.

uweseimet commented 8 months ago

The payload (list of pages) was similar to the one below. It may even have been the same, but I don't know anymore.

02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
03:16:00:08:00:00:00:00:00:00:00:19:08:00:00:01:00:0b:00:14:00:00:00:00
uweseimet commented 8 months ago

@rdmark Out of curiosity I diffed scsi_command_util.cpp against the last release and found this:

-               length -= size;
+               length -= size + 1;
                offset += size;

This looks quite wrong to me. Why is the length now decremented faster than the offset is incremented? This means offset and length are out of sync, aren't they? IMHO this results in the loop being terminated too early, before all data have been parsed.

rdmark commented 8 months ago

@uweseimet if you look at the git blame for that change you find this commit: https://github.com/PiSCSI/piscsi/commit/ad5eae93e7f721817c95a01285babdb97ed940b8

The commit message lays out the justification for the logic change in detail. This is the relevant passage. Do you have a different interpretation of this data?

* Fix page length computation in ModeSelect

tl;dr

The 'skip to next ModeSelect page' computation was off-by-one, either not
taking the page code itself into account or missing the fact that the
page length is given as `n - 1`.

Fix:

Add 1 to the computed length.

Details:

OpenVMS Alpha sends a ModeSelect6 as follows
~~~
command:

ModeSelect6, CDB $151000001900

payload:

 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
00 00 00 08 00 00 00 00 00 00 02 00 01 0a 24 00 00 00 00 00 00 00 00 00 00
~~~

This translates to (accoring to [1], Section 8.3.3)

~~~
Mode Data Length 0
Medium Type      0
Device-specific  0
Block desc len   8
~~~

with the following offset / length computation _before_ the `while` loop
~~~
offset = 12
length = 13
~~~

The first payload section is
~~~
 4  5  6  7  8  9 10 11
00 00 00 00 00 00 02 00
~~~

translating to

~~~
Density Code     0
Number of blks   0
Reserved         0
Block length   0x200 512
~~~

Then follows a pagecode 1 as
~~~
12 13 14 15 16 17 18 19 20 21 22 23 24
01 0a 24 00 00 00 00 00 00 00 00 00 00
~~~

translating to
~~~~
Page code        1
Page length -1  10
Mode parameters 24 00 00 00 00 00 00 00 00 00 00
~~~

computing (inside the `while` loop, as `// Advance to the next page`)

~~~
size =  10 + 2 = 12
~~~

followed by new `offset` and `length` values

~~~
offset = 25
length = 1
~~~

So it stays in the `while` loop (and has a larger-than-buffer `offset`
value)
uweseimet commented 8 months ago

@rdmark IMO the current code is wrong, independent of the data. If you iterate over an array and remember the array index and the number of remaining bytes, is it correct to add/subtract different values? Wouldn't the sum of the index and the remaining byte count always have to be the same? If the number of remaining bytes is decremented faster than the index is incremented you are out of sync and terminate the loop too early. This would perfectly explain the bug I have observed. Please correct me if I am missing something.

https://github.com/uweseimet/scsi2pi/blob/main/cpp/test/scsi_hd_test.cpp has some unit tests with mode page lists, e.g. ModeSelect6_Multiple. (The page list size is not sufficient to completely cover this ticket, though.) In https://github.com/uweseimet/scsi2pi/blob/main/cpp/devices/disk.cpp in the ModeSelect() method both offset and remaining byte count are adjusted by the same value. I recommend to add similar unit tests to PiSCSI. Payload examples for such tests I have already provided, and https://github.com/uweseimet/scsi2pi/blob/main/cpp/test/scsi_hd_test.cpp provides more examples. Of course, if it turns out that something is wrong with my unit tests I would appreciate feedback.

Maybe I should give an abstract example on what I think is wrong. Assume a page list of 10 pages with a length of 2 bytes each. At the beginning offset is 0, length is 20. After parsing a page the offset is incremented by 2 (size), the length decremented by 3 (size + 1). You get this sequence:

offset = 0, length = 20 offset = 2, length = 17 offset = 4 length = 14 offset = 6, length = 11 offset = 8, length = 8 offset = 10, length = 5 offset = 12, length = 2 offset = 14, length = -1

Parsing stops here because length is < 0. The pages at offsets 16 and 18 have not yet been parsed, i.e. they are ignored.

kkaempf commented 8 months ago

The payload (list of pages) was similar to the one below. It may even have been the same, but I don't know anymore.

02:01:00
...
02:01:00
03:16:00:08:00:00:00:00:00:00:00:19:08:00:00:01:00:0b:00:14:00:00:00:00

Where is this data coming from ?

I'd assume this would be "mode pages". Looking at the definition (Table 94 in Small Computer System Interface - 2 rev 10L.pdf - page 148 in document, 184 in pdf) it has

So the above should rather be

02:00:00
...
02:00:00
03:15:00:08:00:00:00:00:00:00:00:19:08:00:00:01:00:0b:00:14:00:00:00:00

Commit https://github.com/PiSCSI/piscsi/commit/ad5eae93e7f721817c95a01285babdb97ed940b8 was based on actual captured SCSI data with 2 pages.

I'm going to extend the testcase now to cover > 2 pages. It might well be that the offset / length computation still is wrong for > 2 pages.

uweseimet commented 8 months ago

The specification says: "The page length field specifies the length in bytes of the mode parameters that follow. If the initiator does not set this value to the value that is returned for the page by the MODE SENSE command, the target shall terminate the command with CHECK CONDITION status."

The length field for MODE SELECT is the same as the length field reported by MODE SENSE for the same page.

When the page data are 02:01:00 there is 1 byte following the page length field, i.e. the page length field must be 1. Please look at the mode page examples in the specification, e.g. table 96. In table 96 there are 6 bytes following the length field and the length field is 6. This is consistent with the mode page data I used in my tests. Correct me if I am wrong, but if you were right the page length in table 96 would have to be 5, which is not the case.

But this is not related to the potential offset/length issue I mentioned, because the affected length is not an individual page length but the total length of the payload data.

kkaempf commented 8 months ago

Please look at the mode page examples in the specification, e.g. table 96. In table 96 there are 6 bytes following the length field and the length field is 6. This is consistent with the mode page data I used in my tests. Correct me if I am wrong, but if you were right the page length in table 96 would have to be 5, which is not the case.

Looks like we have a conflict between an example in the spec (length byte == number of byte following) and the actual data captured from the DEC BIOS (length byte == number of bytes following minus 1) 😉

Do we need a DEC BIOS flag ? 🤔

Anyone with more captured data to look at ?

uweseimet commented 8 months ago

@kkaempf Before you try to add any flag I recommend to verify that the DEC BIOS really violates the specification. I doubt it, because in this case it would not be compatible with any SCSI drive that correctly implements the specification. I bet the vast majority of drives do it right, because otherwise they would not work with most of the available drivers/tools. It is more likely that you have misinterpreted something.

Do you have any real SCSI drive that is compatible with your DEC BIOS? In this case I recommend that you verify the behavior of this drive regarding MODE SELECT. If you are right, in order to be compatible with the DEC BIOS the drive would have to violate the specification. You can use your PISCSI board and the s2pexec tool of https://www.scsi2pi.net/, for instance, to run any SCSI command against this drive and to evaluate the results.

If you are convinced that your BIOS is a special case I would be interested if https://www.scsi2pi.net/, which has a more elaborate MODE page handling than PiSCSI, works with your board or not. Testing this does not take much time, you can simply install an SCSI2Pi binary package.

In any case, what the specification says should be the default implementation.

@rdmark All in all, it's up to you, of course, but as you can see from my comments I doubt once again that any kind of "quirks" mode is needed. Real SCSI drives also do not have this kind of mode and work with all kinds of platforms/drivers. Usually in such a case there is just a misunderstanding of the specification or what the drivers do.

uweseimet commented 8 months ago

@kkaempf I thought I'd run a test with your payload and the SCSI2Pi in_process_test (see https://github.com/uweseimet/scsi2pi/wiki/Advanced-Testing) myself, as a proof of concept. This is the result:

>./bin/in_process_test -s "-i 0 zip.hds"
s2pexec>-i 0 -c 151000001900 -d 000000080000000000000200010a2400000000000000000000
[17:52:57.144] [warning] Device reported CHECK CONDITION (status code $02)
Error: ILLEGAL REQUEST (Sense Key $05), INVALID FIELD IN PARAMETER LIST (ASC $26), ASCQ $00
s2pexec>

The error is expected because your payload appears to contain a vendor-specific mode page 0, which is not supported by the default configuration. As soon as I add a user-defined mode page 0 (see https://www.scsi2pi.net/en/properties.html) your payload is accepted and the error is gone:

>./bin/in_process_test -s "-i 0 zip.hds"
s2pexec>-i 0 -c 151000001900 -d 000000080000000000000200010a2400000000000000000000
s2pexec>

This confirms that nothing is wrong with how PiSCSI v23.11.01 parses the mode pages (offset/length). But PiSCSI does not support mode page 0, and this appears to cause the problem with your payload.

Note that for a valid solution PiSCSI has to support mode page 0 for both MODE SENSE and MODE SELECT. User-configurable (on device level) mode page data are IMO the only valid solution, because each device that supports mode page 0 can have different mode page data. If you just implement support for mode page 0 as it is needed by your setup, as soon as the there is another user with a setup that needs a different mode page 0, you have a conflict: Without a user-definable mode page, why should PiSCSI support mode page 0 just like you need it, and not just like the other user(s) need it? There is also a conflict if two drivers for two different devices each expect vendor-specific data. If there is just a single (default) implementation for mode page 0, one of the drivers will fail. This can even happen with the other mode pages, because they use hard-coded data.

kkaempf commented 8 months ago

@uweseimet, I stand corrected on the offset/length calculation in scsi_command_util.cpp. See https://github.com/PiSCSI/piscsi/pull/1447 for a fix.

However, I don't get your "page 0" arguments about the Alpha VMS payload. 000000080000000000000200010a2400000000000000000000 splits up into 00000008 0000000000000200 010a2400000000000000000000 as explained in https://github.com/PiSCSI/piscsi/issues/1427#issuecomment-2021621269

Looking at the 010a2400... - it's page 0x01 request with 0x0a bytes following. However, 0x0b bytes actually follow. The "trailing" byte 0 could be interpreted as a "page 0" but it's missing the length byte. 🤷🏻‍♂️

uweseimet commented 8 months ago

@kkaempf Page 0 does not have/require a length byte. Please check the SCSI specification. This page does not have any well-defined format. This is why it must be the last page of a page list, because otherwise you would not be able to determine the start of the next page.

rdmark commented 7 months ago

To recap, the reason we wanted to implement the DEC vendor specific ModeSelect page 0 was because the DEC machine cannot boot from a piscsi volume without it:

https://github.com/PiSCSI/piscsi/issues/1402

The question is if there’s another way to handle the vendor page, I e ignoring or retuning some kind of stub value that makes the DEC boot without introducing SCSI spec breaking behavior.

rdmark commented 7 months ago

Whole changeset reverted with https://github.com/PiSCSI/piscsi/pull/1451