felis / USB_Host_Shield_2.0

Revision 2.0 of USB Host Library for Arduino.
https://chome.nerpa.tech
1.79k stars 781 forks source link

INVALID_CSW in masstorage.cpp #462

Open greiman opened 5 years ago

greiman commented 5 years ago

BulkOnly::Transaction() fails with INVALID_CSW for Arduino boards that support SPI.transfer(buffer, size).

The problem is that this call in BulkOnly::Transaction() :

while((usberr = pUsb->outTransfer(bAddress, epInfo[epDataOutIndex].epAddr, sizeof (CommandBlockWrapper), (uint8_t*)pcbw)) == hrBUSY) delay(1);

will cause pcbw to be corrupted.

USB::outTransfer() calls USB::OutTransfer() which uses bytesWr() to send the pcbw structure.

bytesWr() calls USB_SPI.transfer(data_p, nbytes); to send the pcbw structure. Unfortunately transfer() writes over the pcbw structure with receive data.

This causes a failure of this test in BulkOnly::Transaction() if(IsValidCSW(&csw, pcbw)) {

I modified this section of bytesWr() and no longer have the problem.

#elif defined(SPI_HAS_TRANSACTION) && !defined(ESP8266) && !defined(ESP32)
        USB_SPI.transfer(reg | 0x02);
//      USB_SPI.transfer(data_p, nbytes);
//      data_p += nbytes;        
        while(nbytes) {
                USB_SPI.transfer(*data_p);
                nbytes--;
                data_p++; // advance data pointer
        } 

This is a poor fix since it slows the SPI transfer for all bytesWr() calls.

This problem may be related to issue 410

gdsports commented 5 years ago

This may be related to #403 and PR #470.

xxxajk commented 5 years ago

This is actually a problem because UHS2 assumes there is not a lot of RAM to play with. The actual correct fix would be to make a copy, and restore the copy if failure, or detect to see if anything was returned, and if so, just return the short packet. That is what the test for invalid CBW is supposed to do.

hrBUSY should mean that the device was busy, and nothing got transferred. The problem could be that the device has a quirk (bug), that will raise this condition. That's my guess. The fix for the quirk should probably be to recheck the CBW, or the transfer length instead (easier). If either has been changed, then bail out. The lower layers should be able to cope with the short read, and error out.

Does this happen on other devices too? Please dump the information from the device using USB_desc example and post it here. That way I might be able to find out more information about it, or get whatever it is you are plugging in to test/diagnose here myself.

gdsports commented 5 years ago

Yes, this problem occurs on my USB printer driver because the printer is slow so it NAKs a lot. The data buffer is corrupted because SPI.transfer uses it for output and input. The fix on #470 works by saving the first byte of the output buffer in case of a NAK. The max3421e NAK recovery code requires sending the first byte after a NAK so only the first byte needs to be saved. The #403 OP experienced the problem on Android.

xxxajk commented 5 years ago

NAK just means "not ready yet". Don't use a while(), instead try again later, returning the data. That data is probably status from the printer, although, I'm not sure, since I don't have your driver to even try on any printers that I have.

gdsports commented 5 years ago

You are welcome to give it a try. My friend is supposed to thrash on it but I have not gotten a response yet.

https://github.com/gdsports/USBPrinter_uhs2

xxxajk commented 5 years ago

I don't understand how the two are related... BulkOnly::Transaction isn't (and shouldn't) be called by the printer driver. If BulkOnly::Transaction is exploding, that means that the thumb drive has a quirk.

If the printer isn't emptying the allocated buffer yet, then you have to advance the pointer until your buffer is actually empty.

Here's what I need. 1: Get me the data dump from that thumb drive. 2: Try another thumb drive. 3: Please provide a sketch showing the problem.

Since he has this driver written, I can look deeper into it, and perhaps add it to UHS3, in which case your printing from file will become easier. It also has a much more robust core overall, but it does require at least a MEGA2560 or MEGA1280 to be able to operate, unless you are using ARM or MIPS based MCU, because thouse generally should have enough RAM anyway.

gdsports commented 5 years ago

Both drivers eventually call OutTransfer where bytesWr calls SPI.transfer which overwrites the buffer contents. Data corruption occurs on all NAKed out transfers so it not specific to mass storage or printers. See the PR I am currently using. It saves the first byte of the buffer before calling bytesWr and resends the saved byte if the transfer is NAKed.

https://github.com/felis/USB_Host_Shield_2.0/pull/470/commits/45c60d86affcced719879d015baa0bd4277526c0

gdsports commented 5 years ago

UHS3 has nearly identical code. When a NAK occurs, OutTransfer resends the first byte from the output buffer which has been overwritten by SPI.transfer. If data corruption on NAK occurs, the same fix as in #470 should work.

https://github.com/felis/UHS30/blob/6f78c4e486be9033b79e8b3eaf1972e8cf5a95cd/libraries/UHS_host/USB_HOST_SHIELD/USB_HOST_SHIELD_INLINE.h#L598

xxxajk commented 5 years ago

That means the device isn't following the rules, and is horribly broken. How these devices pass QA, we will never know.

Anyway, if it is busy, and can't do anything, it should be reporting busy using a NAK when the request is made, and nothing should be returned at all. Since this is not happening, this means that whatever you are talking to is replying ACK when the request is made, but when you go to read, goes "Oppsie! I meant NAK!", causing it to fall apart.

This is indeed a very strange quirk. Which device is actually doing this? printer? thumb drive? Please, post the USB_desc example output from the offending device. This will be very helpful to develop an actual fix, which can bypass any checking for devices that do not have this problem.

greiman commented 5 years ago

The problem occurs with your testusbhostFAT example. You can run testusbhostFAT to see the failure.

The problem has been present since Apr 5, 2015. I just included an older version of your library with my UsbFat library.

I am updating the UsbFat library to support exFAT and wanted to use the current version of your library so I posted this issue.

In usbhost.h This version of byteWr works:

@Lauszus
Lauszus Update master makefile
Latest commit 54aef65 on Apr 4, 2015

template< typename SPI_SS, typename INTR >
uint8_t* MAX3421e< SPI_SS, INTR >::bytesWr(uint8_t reg, uint8_t nbytes, uint8_t* data_p) {
        XMEM_ACQUIRE_SPI();
        SPI_SS::Clear();
#if USING_SPI4TEENSY3
        spi4teensy3::send(reg | 0x02);
        spi4teensy3::send(data_p, nbytes);
        data_p += nbytes;
#elif !defined(SPDR)
        SPI.transfer(reg | 0x02);
        while(nbytes) {
                SPI.transfer(*data_p);
                nbytes--;
                data_p++; // advance data pointer
        }
#else
        SPDR = (reg | 0x02); //set WR bit and send register number
        while(nbytes) {
                while(!(SPSR & (1 << SPIF))); //check if previous byte was sent
                SPDR = (*data_p); // send next data byte
                nbytes--;
                data_p++; // advance data pointer
        }
        while(!(SPSR & (1 << SPIF)));
#endif
        SPI_SS::Set();
        XMEM_RELEASE_SPI();
        return ( data_p);
}

This version of byteWr fails:

 Lauszus Only use pin 2 and 3 as INT and SS when using Intel Galileo 1
Latest commit b411506 on Apr 5, 2015 

/* returns a pointer to memory position after last written */
template< typename SPI_SS, typename INTR >
uint8_t* MAX3421e< SPI_SS, INTR >::bytesWr(uint8_t reg, uint8_t nbytes, uint8_t* data_p) {
        XMEM_ACQUIRE_SPI();
#if SPI_HAS_TRANSACTION
        SPI.beginTransaction(SPISettings(26000000, MSBFIRST, SPI_MODE0)); // The MAX3421E can handle up to 26MHz, use MSB First and SPI mode 0
#endif
        SPI_SS::Clear();

#if SPI_HAS_TRANSACTION
        SPI.transfer(reg | 0x02);
        SPI.transfer(data_p, nbytes);
        data_p += nbytes;
#elif USING_SPI4TEENSY3
        spi4teensy3::send(reg | 0x02);
        spi4teensy3::send(data_p, nbytes);
        data_p += nbytes;
#elif defined(__ARDUINO_X86__)
        SPI.transfer(reg | 0x02);
        SPI.transferBuffer(data_p, NULL, nbytes);
        data_p += nbytes;
#elif !defined(SPDR)
        SPI.transfer(reg | 0x02);
        while(nbytes) {
                SPI.transfer(*data_p);
                nbytes--;
                data_p++; // advance data pointer
        }
#else
        SPDR = (reg | 0x02); //set WR bit and send register number
        while(nbytes) {
                while(!(SPSR & (1 << SPIF))); //check if previous byte was sent
                SPDR = (*data_p); // send next data byte
                nbytes--;
                data_p++; // advance data pointer
        }
        while(!(SPSR & (1 << SPIF)));
#endif

        SPI_SS::Set();
#if SPI_HAS_TRANSACTION
        SPI.endTransaction();
#endif
        XMEM_RELEASE_SPI();
        return ( data_p);
}

Any call to bytesWr() will modify data_p modified with what's on the SPI MISO line.

Too bad Arduino implemented SPI::transfer(buf, count) instead of SPI:: transfer(txBuf, rxBuf, count) with nullptr allowed for buffer pointers. Often you don't want trash to be sent on MOSI for read or write to destroy the sent data buffer with whats on MISO.

gdsports commented 5 years ago

@greiman Please try PR #470 and see if that helps. OutTransfer saves the first byte of the output buffer before calling bytesWr so it can re-send the first byte in case of a NAK.

This re-send-first-byte-after-NAK requirement is documented although it sounds like a bug in the max3421e.

https://www.maximintegrated.com/en/app-notes/index.mvp/id/4000

I do not think this problem is device specific. Data corruption can occur when any device sends NAK to an OUT transfer.

The problem was introduced when support for SPI transactions was added.

xxxajk commented 5 years ago

It is device specific, because it doesn't happen here. Again, if I can buy the same device, and get it to fail, I can make a proper fix for it (in UHS3, I can specify quirks). Yes, it is a device specific bug. Many companies will buy into the same broken IP sometimes.

gdsports commented 5 years ago

Ok, do not bother with PR #470. Saving the first byte works for the NAK case but is not sufficient in general. Other drivers call outTransfer then use the contents of the output buffer so it must be preserved. The OP's fix preserves the entire buffer so it works for the NAK case as well as other cases.

UHS30 does not have this problem because it calls SPI.transfer 1 byte at a time. This is the same as the OP's proposed fix for UHS2.

uint8_t UHS_NI MAX3421E_HOST::bytesWr(uint8_t reg, uint8_t nbytes, uint8_t data_p) { ... while(nbytes) { SPI.transfer(data_p); //printf("%2.2x ", data_p); nbytes--; data_p++; // advance data pointer }

There is no performance reduction for the USB printer since it's performance is limited by the printer mech speed, not transfer speed.

greiman commented 5 years ago

Almost any USB key will fail with your testusbhostFAT example. Most USB keys are slow in initialization so there are retries.

Here is one of the many I use for debug.

xxxajk commented 5 years ago

Best fix is to not use the transfer method, ever. Who merged this?

Lauszus commented 5 years ago

@xxxajk it's not merged: https://github.com/felis/USB_Host_Shield_2.0/pull/470. Feel free to send a better PR then.

xxxajk commented 5 years ago

The right solution is one-at-a-time send only. The issue is that the Arduino folks just did transfer as a write and read and never had a plain write. I'll push a better fix tonight, unless there is a proper pull request that does this instead.

Lauszus commented 5 years ago

Yeah I know, it sucks :/ Okay, please send a PR.

xxxajk commented 5 years ago

Yeah, deal is that I have to be careful here with the code, since it is used in production. Not a big deal, though. Right at the moment I'm going in 2 directions, between production and helping out lulzbot people :-)

Lauszus commented 5 years ago

Btw any change you have a newer Bluetooth dongle laying around? Could you test this PR: https://github.com/felis/USB_Host_Shield_2.0/pull/471? Please run the BTHID example.

xxxajk commented 5 years ago

Someplace I should have a USB Bluetooth, I'll take a peek with that tonight too. Have to do it though after I get work done for my client.

Lauszus commented 5 years ago

@xxxajk thanks!

xxxajk commented 5 years ago

@Lauszus please see pm on hangouts...

gdsports commented 5 years ago

Is circuitsathome.com down for everyone or just me? I will have to fix some links if it is down for good. The website has lots of useful USB host info.

xxxajk commented 5 years ago

Honestly, it's dead for the most part.