Not enough bytes received during Logic Analyzer capture

KennethWilke commented 8 months ago

Hello! Thank you for working on and releasing Manta, very excited to use it more and see where it goes!

This evening I was trying to use the logic analyzer core for the first time, but I'm running into errors when I try to grab a capture from it.

>>> manta.logic_analyzer.capture()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/kwilke/.local/lib/python3.10/site-packages/manta/logic_analyzer/__init__.py", line 249, in capture
    raw_capture = self._sample_mem.read(addrs)
  File "/home/kwilke/.local/lib/python3.10/site-packages/manta/memory_core.py", line 264, in read
    datas = self._interface.read(bus_addrs)
  File "/home/kwilke/.local/lib/python3.10/site-packages/manta/uart/__init__.py", line 174, in read
    raise ValueError(
ValueError: Only got 133 out of 1792 bytes.

I'm not sure yet what's the issue is, my test device is a pico-ice board and I'm putting my work into this repo: https://github.com/KennethWilke/ti-linkeroo/tree/6b7c76d6c55076a3bd039c31b04d333e93ee1e76

My manta.yaml contains:

---
cores:
  io_pins:
    type: io

    inputs:
      sw2: 1

    outputs:
      r: 1
      g: 1
      b: 1

  logic_analyzer:
    type: logic_analyzer
    sample_depth: 4096
    trigger_location: 1

    probes:
      tip: 1
      ring: 1

    triggers:
      - tip FALLING
      - ring FALLING

uart:
  port: "/dev/ttyACM1"
  baudrate: 115_200
  clock_freq: 12_000_000

I took a little break then set everything back up again, and now the behavior is a little better but still running into similar read errors, I also attached my flipper zero to the tx pin of my pico so I could watch the data streaming back. What I see now is: I trigger the capture via python, then the pico begins to stream back packets like D0002, then when I cause one of my probe pins to fall the trigger hits and I get this error back on the Python side:

>>> manta.logic_analyzer.capture()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/kwilke/.local/lib/python3.10/site-packages/manta/logic_analyzer/__init__.py", line 249, in capture
    raw_capture = self._sample_mem.read(addrs)
  File "/home/kwilke/.local/lib/python3.10/site-packages/manta/memory_core.py", line 264, in read
    datas = self._interface.read(bus_addrs)
  File "/home/kwilke/.local/lib/python3.10/site-packages/manta/uart/__init__.py", line 174, in read
    raise ValueError(
ValueError: Only got 126 out of 1792 bytes.

If I try 2 more times to perform the capture, I'll get errors like: ValueError: Logic analyzer did not reset to IDLE state. then on the 3rd re-attempt it looks like the analyzer gets back into a good waiting condition until I cause the trigger to fire.

The IO core that's running in the same design seems to be working okay, if I perform a manta.io_pins.set_probe("b", 1) my blue LED lights up. Though, immediately after triggering the above ValueError the first io command I send is ignored, but after that it works. Hopefully that's some kind of clue too :sweat_smile:

I've had different quantity byte lengths returned for this error too, it's usually 126 but sometimes its 119 or 133 and on at least 1 occasion it was 98. Sometimes after I hit the capture error, when I would try a writing io command I would get what looks like 5 more D0002 packets back that I suspect was a buffered response from the attempted capture. I see these flow through my flipper zero to after I send the io command, so I think it's buffered on the device somewhere

I'll continue to tinker to see what I find, I keep wondering if my slow UART is going to be a problem here.

fischermoseley commented 7 months ago

Thanks for opening the issue! This is weird - normally when there's IO errors it tends to affect every core, but clearly that's not happening here.

A baudrate of 115200 shouldn't be an issue - but it's interesting that you're getting ~128 bytes back. What OS are you on? I've noticed that pySerial can sometimes have different sizes of input buffer and I think that's related to the OS's page size.

I've got an idea - try modifying your manta.yaml to:


---
cores:
  io_pins:
    type: io

    inputs:
      sw2: 1

    outputs:
      r: 1
      g: 1
      b: 1

  logic_analyzer:
    type: logic_analyzer
    sample_depth: 4096
    trigger_location: 1

    probes:
      tip: 1
      ring: 1

    triggers:
      - tip FALLING
      - ring FALLING

uart:
  port: "/dev/ttyACM1"
  baudrate: 115_200
  clock_freq: 12_000_000
  chunk_size: 8

This will cause the host to send fewer bytes out to the FPGA at a time, meaning that responses from the FPGA shouldn't overflow your OS's input buffer (if that's what's happening).

If this doesn't work I'll go investigating on hardware.

KennethWilke commented 7 months ago

Sure thing! Will give that a try after work today :+1:

For OS I'm using Linux 6.5.0 via a fairly recent Ubuntu 22.04.4 LTS kick.

KennethWilke commented 7 months ago

This is what I'm seeing with the modified manta.yaml :thinking:

>>> manta.logic_analyzer.capture()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/kwilke/.local/lib/python3.10/site-packages/manta/logic_analyzer/__init__.py", line 249, in capture
    raw_capture = self._sample_mem.read(addrs)
  File "/home/kwilke/.local/lib/python3.10/site-packages/manta/memory_core.py", line 264, in read
    datas = self._interface.read(bus_addrs)
  File "/home/kwilke/.local/lib/python3.10/site-packages/manta/uart/__init__.py", line 174, in read
    raise ValueError(
ValueError: Only got 35 out of 56 bytes.

fischermoseley commented 7 months ago

Interesting - this almost sounds like a bug in the way Manta assigns its internal address space when multiple cores are included in a single instance. Lemme do some investigation on hardware this weekend!

KennethWilke commented 7 months ago

I was tinkering with this a little bit more today, I started off by simplifying my manta.yaml and test setup. My new manta file is:

---
cores:
  logic_analyzer:
    type: logic_analyzer
    sample_depth: 4096
    trigger_location: 2048

    probes:
      sw2: 1

    triggers:
      - sw2 FALLING

uart:
  port: "/dev/ttyACM1"
  baudrate: 115_200
  clock_freq: 12_000_000
  chunk_size: 32

My trigger now is just setup for a button on the board so it's a little easier for me to test.

I also tweaked my test.py to test the behavior with different chunk sizes, in this case just looping and printing exception:

from manta import Manta
manta = Manta("manta.yaml")

while True:
    try:
        x = manta.logic_analyzer.capture()
        print("captured!");
    except Exception as e:
        print(e.args[0])

When I run this with a chunk_size of 8, 16 or 32 I get pretty stable behavior, for each trigger fired I'd get a consistent exception:

At 8: Only got 35 out of 56 bytes. At 16: Only got 35 out of 112 bytes. At 32: Only got 35 out of 224 bytes.

When I test it at 64, the behavior starts to change. It starts off waiting as before but when I trigger the condition the Python program is now looping through capture attempts. The first iteration seems to consistently be 0, with other iterations usually being 35 bytes but sometimes 42.

$ python test.py 
Only got 0 out of 448 bytes.
Only got 35 out of 448 bytes.
Only got 35 out of 448 bytes.
Only got 35 out of 448 bytes.
Only got 35 out of 448 bytes.
Only got 42 out of 448 bytes.
Only got 35 out of 448 bytes.
Only got 35 out of 448 bytes.
Only got 35 out of 448 bytes.
Only got 35 out of 448 bytes.
Only got 35 out of 448 bytes.
Only got 35 out of 448 bytes.
Only got 35 out of 448 bytes.
Only got 35 out of 448 bytes.
Only got 35 out of 448 bytes.
Only got 35 out of 448 bytes.

When I see the "Only got" value shifting like this, it very often is by 7 byte difference in size :thinking:

With a chunk_size of 128, it continues to loop, the first response back is 35 bytes and then usually 63 but sometimes 70.

$ python test.py 
Only got 35 out of 896 bytes.
Only got 63 out of 896 bytes.
Only got 70 out of 896 bytes.
Only got 63 out of 896 bytes.
Only got 63 out of 896 bytes.
Only got 70 out of 896 bytes.
<reset device>
$ python test.py 
Only got 35 out of 896 bytes.
Only got 63 out of 896 bytes.
Only got 63 out of 896 bytes.
Only got 63 out of 896 bytes.
Only got 70 out of 896 bytes.
Only got 63 out of 896 bytes.

At 256 (default) I get the originally described behavior, it looped on its own a little bit but then recovered on its own. After the 3rd trigger it started to loop

$ python test.py 
Only got 91 out of 1792 bytes.
Logic analyzer did not reset to IDLE state.
Logic analyzer did not reset to IDLE state.
Only got 133 out of 1792 bytes.
Logic analyzer did not reset to IDLE state.
Logic analyzer did not reset to IDLE state.
<stable>
Only got 119 out of 1792 bytes.
Logic analyzer did not reset to IDLE state.
Logic analyzer did not reset to IDLE state.
<stable>
Only got 119 out of 1792 bytes.
Only got 133 out of 1792 bytes.
Only got 119 out of 1792 bytes.
Only got 119 out of 1792 bytes.
Only got 126 out of 1792 bytes.
Only got 126 out of 1792 bytes.

At 512 the design stabilizes again, and I start to get just 1 capture per button press again:

$ python test.py 
Only got 210 out of 3584 bytes.
Only got 210 out of 3584 bytes.
Only got 210 out of 3584 bytes.
Only got 203 out of 3584 bytes.
Only got 217 out of 3584 bytes.
Only got 210 out of 3584 bytes.
Only got 210 out of 3584 bytes.
Only got 210 out of 3584 bytes.
Only got 217 out of 3584 bytes.
Only got 217 out of 3584 bytes.

Pardon all the data, but I figured I'd pass along my notes in case they yield some more clues. I have an Arty A7-35T available to test with too, I'll probably try to port this experiment to that board the next time I'm tinkering.

fischermoseley commented 6 months ago

Apologies for the super delayed response here! I did some quick checking of the internal memory map that's generated in response to your config file, and everything looks okay there. Assuming that your host machine is properly transmitting UART (which I think is fair since you're getting data packets back from the board with the D0002-type messages) I'm thinking that the issue is somewhere between the host side of the USB cable and the output of Manta's internal UART receiver.

I did try and replicate your setup on my Icestick and Nexys4DDR, and unfortunately everything seemed to work okay. Here's the top level Verilog file I'm using:

`include "manta.v"

module top_level (
    input wire clk,

    input wire rs232_rx_ttl,
    output logic rs232_tx_ttl
    );

    logic [3:0] value;
    logic sw2;

    always @(posedge clk) begin
        value <= value + 1;
        sw2 <= (value == 2);
    end

    manta manta_inst (
        .clk(clk),
        .rst(0),

        .rx(rs232_rx_ttl),
        .tx(rs232_tx_ttl),

        .sw2(sw2));
endmodule

I can think of a few things to try though:

What happens if you set chunk_size to 1? This will make the transfer super slow, but if we're hitting a metastability issue then it might the probability of a misaligned UART packet a bit lower. And maybe that's low enough to be useful?
What happens if you toss another register between the UART RX/TX pins and the RX/TX inputs to the Manta module? Just to again try and tune the probabilities in our favor if it's a metastability issue.
It's not the USB cable, is it?
Anything interesting in your build logs? If Yosys/nextpnr are tossing any warnings/errors those might be useful.

Slight meta comment: I'm beginning to realize that me being able to replicate an issue is a lousy prerequisite for being able to solve it, so I think Manta might need a logging/reporting system built out in the near future. Something for me to think about :)

ejhollas commented 3 months ago

I am seeing similar problems with my pico-ice and my ULX3S boards. What I notice is the UART from the FPGA has a gap every time there's a ValueError. I tried the chunk_size:32. My ULX3S is running at 3M Baud.

My DSView analyzer shows this pause in the RX line:

Here's a close up of the data which should be a Manta capture:

I can run the capture multiple times, and it will succeed sometimes. I don't think this is a Host issue, since I am looking at the wire before it gets into the SerialToUSB adapter. Something may be pausing the TransmitBridge.

ejhollas commented 3 months ago

It took a while, but I finally reproduced the bug on my Zybo from Digilent. This board build uses the Vivado toolchain rather than Yosys. Also, this board is running with a 125 MHz clock and a 2048 Manta sample_depth. In this case, the capture ValueError reported 49 out of 1792 bytes. Here's the waveform:

With the Manta ILA running on a Xylinx part, I will see what I can discover with the native ILA. Thanks again for such a well-designed tool.

fischermoseley commented 3 months ago

Thank you for the scope shots! And for reproducing on the Zybo - that's very very useful. I'd be curious to know what happens at slower baudrates - do you see the same behavior at 115200 baud?

And if you're going hunting with the Xilinx ILA, I'd be curious to know what the inputs to the uart_tx module look like whenever this hiccup happens. In the exported Verilog there will be a portion containing something like:

\top.manta.interface.uart_tx  uart_tx (
    .clk(clk),
    .data_i(\data_i$15 ),
    .done_o(done_i),
    .rst(rst),
    .start_i(start_i),
    .tx(tx)
  );

If you can probe what the inputs/outputs of this module are doing, that would tell us if the TransmitBridge (\manta.interface.bridge_tx in the generated Verilog) or UARTTransmitter (the module above) aren't transmitting. It's also possible that the bug isn't here, so if you don't see anything strange then it might be worth checking the \manta.interface.uart_rx module's IO too.

ejhollas commented 3 months ago

I do see the same behavior at 115200 baud. Looking closer at the UART signals, I think the problem due to the ReceiverBridge missing a Read request. If you look at the start of these responses to the Host, you can see the response packet starts later and later. Eventually, the delay is too much.

Here's an example of a missed Read response:

At the start, the spacing looks as follows:

The easiest solution is to reduce the UART chunk size to 32 or 16 in the manta config with a 115200 baud UART. This corresponds with the behavior that @KennethWilke had reported earlier.

If I can't get a good trigger to get the ILA to capture the internal behavior, I'll port the code into Verilator and use a fake COM port to simulate the system or see if I can extend the Amaranth simulator with this full capture example.

ejhollas commented 3 months ago

I have a good Amaranth simulation, and I see the bug is in the TransmitBridge not the ReceiveBridge. The TransmitBridge is still busy when a new Read response comes to its internal interface. In this trace you can see the gap at the 7ms mark. Looking at the start_o signal, you can see it is on for a shorter duration each time.

This simulation is running the UART 3M baud. The test first writes 0x37 and 0x42 to two ioports. Then the test reads the first port, then alternates between reading each ioport. So we should see the _buffer[15:0] from the TranmitBridge alternate between these two values.

ejhollas commented 3 months ago

The TransmitBridge has two implicit states: _busy and ~_busy. Only during the ~_busy state can it accept new data:

    def elaborate(self, platform):
        m = Module()

        m.d.comb += self.start_o.eq(self._busy)

        with m.If(~self._busy):
            with m.If((self.valid_i) & (~self.rw_i)):
                m.d.sync += self._busy.eq(1)
                m.d.sync += self._buffer.eq(self.data_i)

        with m.Else():
            # uart_tx is transmitting a byte:

There are a few ways to fix this, but I want to keep in mind the project's desire to keep the RTL as simple as possible. We can avoid the bug by pausing the Host's read requests periodically. My first idea is to add a second buffer which can be filled while the TransmitBridge module is busy. The next idea is to implement a stall in the upstream module(s) to not lose a response. Perhaps a skid buffer in one or two modules would help.

fischermoseley commented 3 months ago

Apologies for the late reply here! And thank you for the very in-depth debugging. I think my interpretation of this issue is something like the following:

As a protocol, UART needs to be able to handle slight baudrate mismatches between devices, since there's no shared clock. This normally works fine, as most devices that communicate over UART have some transmit/receive buffering, such that a device transmitting bytes slightly slower than it receives them wouldn't drop packets (if it wasn't fully loaded all the time). But here, there's no buffers on the FPGA, so bytes get dropped if they can't be transmitted. Which causes the original Not enough bytes received error.

I think it's possible to solve this on the host side. The ReceiveBridge on the FPGA will ignore any characters between one packet's end-of-message and the next packet's start-of-message. It would be possible to inject an ignorable character after some N packets, which gives the TransmitBridge time to catch up. This shouldn't appreciably affect the effective bandwidth back to the host machine (so long as N is large), and doesn't need any new RTL.

I'll put together a branch that implements this sometime soon. Since I can't replicate this with my hardware, would you mind giving it a go?

fischermoseley commented 3 months ago

(as an aside, I've been playing with the idea of rearchitecting the UART/Ethernet packet formats at some point in the future, and that would include a dropped packet recovery scheme that likely supercedes this. But I think this is definitely worth fixing before then)

fischermoseley commented 3 months ago

Just put a patch on the uart_speed_fix branch. @KennethWilke feel free to give this a try as well.

ejhollas commented 3 months ago

Moving the change to the host will probably solve the problem without increasing the cost of gates or complexity in the FPGA. I'll give this a try.

fischermoseley commented 1 month ago

Closing as the fix has been merged to main via #25. I've added a stall_interval parameter to the UARTInterface (configurable via manta.yaml or the new Amaranth API) which sets how many packets are sent before an extra stall byte (a newline) is sent. I've also added test/test_uart_baud_mismatch.py which performs a hardware-in-the-loop test to verify that this is the root cause, and test that this mitigation works.

fischermoseley / manta

Not enough bytes received during Logic Analyzer capture #18