Severson-Group / AMDC-Firmware

Embedded system code (C and Verilog) which runs the AMDC Hardware
http://docs.amdc.dev/firmware
BSD 3-Clause "New" or "Revised" License
30 stars 5 forks source link

Random invalid data in new AMDS driver? #389

Closed codecubepi closed 1 month ago

codecubepi commented 1 month ago

Testing of the new AMDS driver has been going well, save for one problem(?).

When running the timing manager in automatic triggering mode (the default), the AMDS FPGA counters report invalid data (meaning that the parity check failed). The problem seems to get worse with lower and lower user ratios. It does not seems to occur with sufficiently high user ratio or in MANUAL triggering mode.

I can't seem to actually "capture" any invalid data, so I'm not entirely convinced this is a big deal, but it might require further testing.

image

codecubepi commented 1 month ago

Update:

After logging some signals using the new AMDS driver firmware, I get the following graphs:

Image

This is logging one second of data for two channels. Channel 1 is a 10 Hz sine wave and Channel 2 is a 5 Hz skewed triangle wave. Both channels appear to have flawless data. However, I am still seeing that the "data is invalid" and the corrupt counter go up.

I am running the timing manager and AMDS driver with the following parameters: trigger on both PWM high and low, with a user ratio of 4. This is effectively toggling SYNC_ADC at 50 kHz, or every 20us, or as fast as the sensor can currently be triggered,

I am noticing a few strange things when I check the channel valid register. It's value is always either 0, 15, 240, or 255. These values mean the following:

Value (Decimal) Value (Binary) Meaning
0 0000_0000 All channels are invalid
15 0000_1111 Only channels 1-4 are valid
240 1111_0000 Only channels 5-8 are valid
255 1111_1111 All channels are valid

These valid bits are all cleared when a new set of packets comes in, and should be being set one by one, as each of four packets looks valid. But this is pretty suspicious, to only see these statuses for valid. There are a few possible explanations:

Regardless of the reason for this, we still have the issue with the corrupt counter. The corrupt counter is incrementing because the uart_rx module is telling its parent adc_uart module that either the header, data MSB, or data LSB is "corrupt", meaning it failed the parity check. But again, I don't actually see any evidence of truly corrupt data.

codecubepi commented 1 month ago

With help from @npetersen2 the issue has been solved! See commit 8cad1d6.

The issue was in the FPGA cycle delay caused by the hierarchical state machines in the driver. The amdc_amds_yada_AXI.v file double-flops the incoming data lines from the AMDS for metastability, and then detects a falling edge on the flopped data line to start the SM in adc_uart_rx.v, which handles the receving of the four packets (HEADER, MSB, and LSB).

That state machine starts the other state machine in uart_rx.v, which receives the individual UART bytes. After this state machine is started by the parent, it looks for the falling edge of the start bit.

The problem was that the sequential delay of the start signal being flopped through the state machine hierarchy meant that the start bit had sometimes already passed, so the uart_rx state machine was detecting a falling edge of a data bit instead, which caused a parity check fail and so on.

The solution was to flop the data line a few more times in uart_rx.v, to guarantee that the WAIT_FOR_START_BIT state was always entered before receiving the start bit:

https://github.com/Severson-Group/AMDC-Firmware/blob/8cad1d6708cc6569a32f72006c2b0e16b37996fb/ip_repo/amdc_amds_1.0/src/uart_rx.v#L34-L59

Results below:

Image

Image

codecubepi commented 1 month ago

Closing with merge of #394 into the staging branch.