ZuluSCSI / ZuluSCSI-firmware

Firmware for the ZuluSCSI advanced SCSI emulator
https://zuluscsi.com
Other
182 stars 20 forks source link

Random watchdog timeouts #441

Open ltning opened 2 months ago

ltning commented 2 months ago

Hi, I have a zuluscsi 1.1 connected to an Adaptec AIC7770 (AHA-2740 EISA). Using both 50pin and 68pin cables (with adapter in the latter case), I'm seeing sporadic SCSI bus hangs during I/O. OS is OS/2 Warp, with what I can determine as the latest driver for this card. It's known to be a solid and stable workhorse for OS/2 servers.

This is seen with both very old firmware and current firmware.

Snippet from debug log below; full log file attached (it contains data from several power cycles).

I have no idea where to look; tried two controllers (same), both controllers are NOS and shipped with the last firmware Adaptec produced.

Any ideas? Can this be a firmware issue? What should I try?

PS: With the 68pin cable, which is equipped with a terminator at the end of the cable, I've only tested with termination enabled on the ZuluSCSI. It has occurred to me this might not be optimal - then again, having seen this issue also with a regular (and short!) 50-pin cable, I think it might not be relevant?

Log excerpt:

[985799ms] DBG ---- Total IN: 4096 OUT: 0 CHECKSUM: 57984
[985799ms] DBG ---- STATUS: 0 GOOD
[985803ms] DBG ---- MESSAGE_IN
[985803ms] DBG ------ IN: 0x00 
[985804ms] DBG -- BUS_FREE
[985814ms] DBG -- BUS_BUSY
[985814ms] DBG ---- SELECTION: 1
[985815ms] DBG ---- MESSAGE_OUT
[985815ms] DBG ------ OUT: 0xC0 
[985816ms] DBG ---- COMMAND: Read10
[985816ms] DBG ------ OUT: 0x28 0x00 0x00 0x02 0x2B 0x40 
[985817ms] DBG ------ OUT: 0x00 0x00 0x08 0x00 
[985817ms] DBG ------ Read 8x512 starting at 142144
[985818ms] DBG ---- DATA_IN, syncOffset 15 syncPeriod 25
[1000815ms] WATCHDOG TIMEOUT at PC 0x0801AD44 LR 0x08018F5B attempting bus reset
[1000815ms] DBG Bus reset during sync transfer, total 512 bytes, remaining ACK count 1
[1000817ms] DBG Bus reset during sync transfer, total 512 bytes, remaining ACK count 1
[1000822ms] DBG ---- Total IN: 4096 OUT: 0 CHECKSUM: 53686
[1000823ms] DBG -- BUS_FREE
[1000824ms] DBG ---- SELECTION: 1
[1000825ms] DBG ---- MESSAGE_OUT
[1000825ms] DBG ------ OUT: 0xC0 
[1000826ms] DBG ---- COMMAND: Read10
[1000826ms] DBG ------ OUT: 0x28 0x00 0x00 0x07 0xE9 0xF0 
[1000827ms] DBG ------ OUT: 0x00 0x00 0x08 0x00 
[1000827ms] DBG ------ Read 8x512 starting at 518640
[1000828ms] DBG ---- DATA_IN
[1000830ms] DBG ---- Total IN: 4096 OUT: 0 CHECKSUM: 55353
[1000830ms] DBG ---- STATUS: 0 GOOD
[1000834ms] DBG ---- MESSAGE_IN
[1000834ms] DBG ------ IN: 0x00 
[1000835ms] DBG -- BUS_FREE
[1000835ms] DBG ---- SELECTION: 1
[1000835ms] DBG ---- MESSAGE_OUT
[1000836ms] DBG ------ OUT: 0xC0 
[1000836ms] DBG ---- COMMAND: Read10
[1000836ms] DBG ------ OUT: 0x28 0x00 0x00 0x02 0x2B 0x40 
[1000837ms] DBG ------ OUT: 0x00 0x00 0x08 0x00 
[1000837ms] DBG ------ Read 8x512 starting at 142144
[1000838ms] DBG ---- DATA_IN
[1000840ms] DBG ---- Total IN: 4096 OUT: 0 CHECKSUM: 53686
[1000840ms] DBG ---- STATUS: 0 GOOD
[1000844ms] DBG ---- MESSAGE_IN
[1000844ms] DBG ------ IN: 0x00 
[1000845ms] DBG -- BUS_FREE
[1000845ms] DBG ---- SELECTION: 1
[1000845ms] DBG ---- MESSAGE_OUT
[1000846ms] DBG ------ OUT: 0xC0 
[1000847ms] DBG ---- COMMAND: Write10
[1000847ms] DBG ------ OUT: 0x2A 0x00 0x00 0x01 0x07 0x69 
[1000848ms] DBG ------ OUT: 0x00 0x00 0x01 0x00 
[1000848ms] DBG ------ Write 1x512 starting at 67433
[1000849ms] DBG ---- DATA_OUT
[1015846ms] WATCHDOG TIMEOUT at PC 0x0801A68A LR 0x00000008 attempting bus reset
[1015850ms] DBG ---- Total IN: 0 OUT: 512 CHECKSUM: 0

zululog.txt

ltning commented 2 months ago

My zuluscsi.ini:

[SCSI]
Debug = 1
DebugIgnoreBusyFree = 0
#EnableUnitAttention = 1
EnableSCSI2 = 1

[SCSI1]
SectorsPerTrack = 63
HeadsPerCylinder = 16

[SCSI2]
SectorsPerTrack = 63
HeadsPerCylinder = 16

[SCSI3]
SectorsPerTrack = 63
HeadsPerCylinder = 16

[SCSI4]
SectorsPerTrack = 63
HeadsPerCylinder = 16
aperezbios commented 2 months ago

@ltning thanks for the report. The first thing I recommend you try is to limit throughput to 5MB/sec synchronous, to see if this issue only manifests itself in 10MB/sec synchronous SCSI mode. Please try this by setting MaxSyncSpeed=5 in the global section of your zuluscsi.ini file. If this still results in watchdog timeouts, you can also try setting MaxSyncSpeed=0, which will disable synchronous SCSI entirely. If this works, it will help in narrowing down where in the code path things may be going off the rail.

One thing we need to establish is whether or not there are any other SCSI devices on the bus. Are there, or is the ZuluSCSI the only device on the bus? Also, if you haven't tested with a single SCSI device on the bus, please do so.

When referring to firmware, please specify exact version numbers. We have no way of knowing exactly which firmware versions you've tried with, and often, customers tell us they're running "the latest" without specifying what that version is, and it ends up they're not actually running the latest release. By providing a non-truncated log file, you answer that question without us needing to explicitly ask for it.

It's also critical that you're using an SD card that's at least class 10. Can you please provide a full log file?

Thanks!

ltning commented 2 months ago

Hi,

the full log was included. See the end of my initial message. Firmware is 2024.07.16.

The ZuluSCSI is the only device, and it has five disk images plus a CD image.

I've capped the sync speed to 8MB/s for all the SCSI targets in the controller BIOS; I have not yet seen the issue re-surface. I don't have a fully reliable reproduction scenario, but booting up the machine (it runs OS/2 with Novell NetWare) often causes this to happen during NetWare startup since it exercises the SCSI bus somewhat.

I will of course provide another log file if and when it happens again.

Are there any other useful steps I can take?

/Eirik

aperezbios commented 2 months ago

@ltning apologies for missing it. Thanks for the additional information. Can you confirm that you do not have MaxSyncSpeed set in your ZuluSCSI.ini file?

ltning commented 2 months ago

Correct. At the moment, the controller is configured to negotiate sync at max 8MB/s (as opposed to 10MB/s when the timeouts occurred), and it has not yet encountered the problem. Based on my significant but very dusty (25 years since I dealt with it on a daily basis) experience with SCSI, I suspect it does not matter whether it's the controller or the device that enforces this?

ltning commented 2 months ago

Also, the SD card is a SanDisk High Endurance card. Should be okay for the task at hand..? IMG_2722