Open holstvoogd opened 2 years ago
Interesting, I think we had a similar report once shared on discord. They had a specific computer which acted like you describe. It might be the M1 USB stack is more spec compliant, and our current USB implementation is taking some liberties, which work on most hosts.
At the time this issue was thought to be a culprit: https://github.com/greatscottgadgets/luna/pull/139. I updated the bootloader source, and pulled in those changes: https://github.com/butterstick-fpga/butterstick-bootloader/commit/d4edf55020940f7669bbae403247294fb5ea8392
However that didn't seem to fix the underlying issue in that case. Not sure if this is another case of that bug, or something new.
One workaround is if you have any FT232 or equiv JTAG adapters, you can use ecpprog or openFPGALoader to program the ECP5 directly.
My rpi had the same issue i think. Although it wasn't libsub that errored, just a internal timeout in dfu-util. I'll see if I can try the upload from an older machine too.
I suppose the version of the bootloader installed wouldn't include those changes yet? I can try updating it, just to check if it helps in my case. I guess I could built the bootloader and put it in alt 0 to test it?
Otherwise, I guess I'll go find a jtag adapter that is M1 compatible ;-)
Yeah, the bootloader that's included with the current batch of buttersticks is v1.1 and doesn't include that change. I've updated the bootloader code, and packaged up a new release: https://github.com/butterstick-fpga/butterstick-bootloader/releases/tag/v1.2
Yes, you can install the bootloader into alt0 to check if it work better. It's a bit tricky because the bootloader expects you to be holding down the BTN0, otherwise it resets (in order to run the user application). Basically if you manage to load the new bootloader, the board will be in a soft loop until you press BTN0 again.
If it's working reliably using the bootloader stored in alt0, if you hold down BTN0 for 5s, the LEDs go red and it unlocks the bootloader partition as alt3
.
FTDI based adapters should work correctly on M1.
Did I understanding it correctly that, if I managed to get the new bootloader in alt0, I can activate it by pressing btn0 after it powered on, wereas I activate the 'old' bootloader by holding btn0 while powering on?
If so, the new version has not made a difference for me. If not, will I brick the board (at least the DFU mode), if I attempt to update alt3 and it fails to complete?
@holstvoogd Did you manage to get this working? Or find another fix?
@gregdavill I bought a jtag cable, that works reliably for programming. And after fixing the litex-boards config and adding the serial usb support you use in the litex example, I was able to get a SoC running. USB serial seems to be working fine at that point.
FYI: v1.3 has the same issue.
Oh, and JTAG is rather unreliable too, I blamed the dupont connectors for being crappy, but could it be related to the usb issues too?
It seems to get random idcodes, the last one changes all the time.
bitstream has 0x81113043 hardware requires 0x00201803
dupont cables can be a bit variable in quality. Lowering TCK speed might help a bit. I'm typically around 2-6MHz with the FTDI based probes.
There was an issue brought up in LUNA about the control endpoint no responding to PING messages in High-speed mode correctly. Which resulted in some hosts dropping the connection. I'll look into implementing the fix with the SoC glue layer we use here.
I spent this evening rebuilding the bootloader with the PING patch from LUNA (https://github.com/greatscottgadgets/luna/pull/172). Unfortunately, the DFU transfer still timeout like before. I also tried to increase the packet timeout from 1ms to 100ms (to give flash write more headroom) but had no luck.
I'm not familiar with the USB topology of DFU bootloaders. So, not sure if I'm missing some firmware logic to make this work.
@luigifcruz Do you have any tools to record the USB protocol? Even a sw based capture with usbmon would be useful as I've not been able to replicate it here. Knowing the packet and timing around the fault would help with knowing where the bug is.
@gregdavill Unfortunately I don't, but I will try something software based. Since I'm on macOS, I'll try USB debugging via Wireshark. I'm also considering booting Asahi Linux on M1 and checking usbmon
. I'll report back ASAP.
Over on the 1BitSquared ButterStick discord channel (https://discord.gg/c4Dnr3b) @zyp spent some time debugging this with their TotalPhase Beagle USB analyser. Firstly we both managed to replicate the issue by setting up a Raspberry Pi systems.
This is what a good transaction looks like, note of importance is the NAK
on the first OUT
packet in the dfu's download transaction. The time between SETUP
->OUT
is 4us.
When the download failed and stalled, the resulting transaction appears like so. Note this time the bootloader has ACK
'd the first OUT
packet that happened 42us after the SETUP
stage. It then NAK
's everything.
After examining the logic in luna.eptri's OUT handler I noticed that it didn't have any protection against the endpoint being enabled mid-way through a packet, our stack would still ACK
the packet as if it had received the entire thing.
I've now put in a mechanism to register the enable/prime signals for the endpoint at the start of a new packet. Which appears to fix the timeout issue on both our systems.
Can you try loading the image from here? https://github.com/butterstick-fpga/butterstick-bootloader/actions/runs/3033842078
You can load this into DFU alt-0 -a 0
, then when the user application is running on the butterstick press BTN0 to enter the bootloader.
You can verify it's the updated version by the product ID reported: Here "(dfu 7058dc2)"
[86910.739618] usb 1-1: new high-speed USB device number 11 using xhci_hcd
[86910.890370] usb 1-1: New USB device found, idVendor=1209, idProduct=5af1, bcdDevice= 1.00
[86910.890387] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[86910.890394] usb 1-1: Product: butterstick (dfu 7058dc2)
[86910.890399] usb 1-1: Manufacturer: Good Stuff Department
[86910.890403] usb 1-1: SerialNumber: e460-5107-a33a-5521
This is the bootloader running in the "Application slot" on the Butterstick, it's not replaced the bootloader at this stage. You can test that it's working correctly, by repetitively loading an image into alt 1 or 2.
You can then try to update the bootloader by holding down BTN0 for 5s, the LEDs will turn red, and you should now be able to upload to alt 3. This unlocks the FLASH and lets you overwrite the bootloader. If this fails, you'll need a JTAG adapter and connect to the JTAG pins to reload a working bootloader onto the board.
Awesome, @gregdavill! Thanks for the update. I'll try to flash the new bootloader to the board tonight. I have a Tigard so no problem if it gets bricked.
Sure. I'll wait until you've tested it. Then I'll merge it into a v1.4 release.
The changes are quite simple, it was just knowing where to look https://github.com/butterstick-fpga/butterstick-bootloader/compare/fix/out-enable-bug
I just tested the experimental bootloader binary, @gregdavill. I flashed it using Tigard and verified the version with lsusb
.
Bus 002 Device 008: ID 1209:5af1 1209 butterstick (dfu 7058dc2) Serial: e460-5107-a338-4d21
Then, I tested flashing a sample project in DFU mode using the dfu-util
command below.
dfu-util -v --alt 0 --download build/gsd_butterstick/gateware/gsd_butterstick.bit --reset
Perhaps a little bit slower than the previous version, but the flashing itself works. But when dfu-utils
tries to reset the board, it throws a timeout for some reason. The command without the --reset
works just fine but the board isn't reset. I copied and pasted the output below.
def-util 0.11
libusb version 1.0.26 (11724)
dfu-util: Warning: Invalid DFU suffix signature
dfu-util: A valid DFU suffix will be required in a future dfu-util release
Opening DFU capable USB device...
Device ID 1209:5af1
Device DFU version 0101
DFU attributes: (0x0d) bitCanDnload bitManifestationTolerant bitWillDetach
Detach timeout 50 ms
Claiming USB DFU Interface...
Setting Alternate Interface #0 ...
Determining device status...
DFU state(2) = dfuIDLE, status(0) = No error condition is present
DFU mode device DFU version 0101
Device returned transfer size 512
Copying data from PC to DFU device
Download [=========================] 100% 2029720 bytes
Download done.
Sent a total of 2029720 bytes
DFU state(7) = dfuMANIFEST, status(0) = No error condition is present
DFU state(2) = dfuIDLE, status(0) = No error condition is present
Done!
Resetting USB to switch back to Run-Time mode
dfu-util: error resetting after download: LIBUSB_ERROR_TIMEOUT
Sidenote: I tried to debug the reset error on a VM. It worked the first time, but the second time it got stuck just like macOS. The funny thing was that instead of throwing a timeout message, the whole VM crashed. So this makes things difficult to debug. 😅
Hmm, that's a bit strange. But it might be a combination of 2 settings, clashing.
First, we have bitWillDetach
set. Per the DFU SPec:
Device will perform a bus detach-attach sequence when it receives a DFU_DETACH request.
The host must not issue a USB Reset. (bitWillDetach)
0 = no
1 = yes
And wDetachTimeOut
being set to 50ms.
Time, in milliseconds, that the device will wait after receipt of the DFU_DETACH request. If this time
elapses without a USB reset, then the device will terminate the Reconfiguration phase and revert
back to normal operation. This represents the maximum time that the device can wait (depending on its
timers, etc.). The host may specify a shorter timeout in the DFU_DETACH request.
On a Detach request we actually wait 100ms, then cut the USB link before booting into the user app.
So a well behaved host, likely sends a DFU_DETACH
command. Waits 50ms. We've not re-attached like we've indicated with bitWillDetach
so dfu-util tried to perform a bus reset, and that times out because we're basically in the process of disconnecting from the host.
I'll adjust these settings, we should be able to fix that so that dfu-util exits cleanly. :)
Actually looking a bit further into it, using the -R
option in dfu-util always results in it attempting to reset the USB device, and ends up with a return code of -4. Because the device is not present.
If you do a --download
then in a second command run --detach
you don't get the same behaviour. :doh:
That's a different issue though.
Updated bootloader published as v1.4 should fix the timeout issues, and also fix the errors when using, it should now return with a clean return code.
dfu-util -a 0 -D <file> -R
https://github.com/butterstick-fpga/butterstick-bootloader/releases/tag/v1.4
Awesome @gregdavill! I'm a bit busy this week but I will try to test it in the next week. Thanks!
Longest week ever! Just tested the new bootloader, and it works very well! The timeout error disappeared.
Thanks for testing @luigifcruz. @holstvoogd Do you have some spare cycles to try and see if this fixes the issue?
@gregdavill I'm afraid I've lost track of my butter stick at the moment 😅 When I find it, I'll give it a try!
Finally picked up my butterstick again & managed to get this working! Works great with v1.4 👍
Hi,
I just received my Butterstick 1.0a yesterday & have been having trouble programming it. Basically the DFU download fails at a random percentage. So far I've only been succesful once or twice in putting a new bitstream on the device.
I've tried on different machines (albeit all ARM machines, m1 mac & a RPI) & with multiple different usb cables.
I've build a basic SoC with Litex, based on the litex-example and the litex-boards project, and use dfu-util to download the bitstream to the board. The debug output for a random build:
Truncated.. Just repeats the next bit until it fails
Based on that kernel status -536854447, it seems the USB transaction is not acked or something.
Any ideas on how to prevent/fix/debug/work around this?