OVGN / OpenHBMC

Open-source high performance AXI4-based HyperRAM memory controller
Apache License 2.0

Problem with VDMAs and AXI-Interconnect #4

Closed: mohammadhgh closed this issue 2 years ago

mohammadhgh commented 2 years ago

@OVGN Hi, first I want to thank you for your great work and for making it open source, which is very valuable.

I ran into a problem in my system using OpenHBMC and found a workaround for it, so I decided to report it here for others facing the same issue. The problem is still not entirely clear to me and I don't know its exact cause, so I will just report my observations without drawing any conclusion.

I have three VDMAs and a MicroBlaze with I-Cache and D-Cache enabled, connected to OpenHBMC through the AXI Interconnect IP (Vivado's default suggestion is AXI SmartConnect, but it uses a lot of resources!). This is part of my system in Vivado: Screenshot from 2022-01-15 12-45-24

The MicroBlaze is configured with an 8KB D-Cache and an 8KB I-Cache, and each cache is configured with a Line Length of 16 for better performance. When the VDMAs' Memory Map Data Width parameter is configured automatically, which results in 64 bits, the memory test (the template from Vitis with no changes) fails. The failure is not always the same: sometimes it fails only for the 32-bit test, sometimes for all tests, and sometimes the MicroBlaze stalls. However, when I change the VDMAs' Memory Map Data Width parameter manually and set it to 32 bits, everything is OK and the memory test passes.

My system spec is this:

OVGN commented 2 years ago

Hello!

Thanks a lot for the detailed report. I will try to investigate this issue soon. Sorry, I haven't tested the 16-bit and 64-bit modes enough. I hope the 32-bit AXI bus width mode works stably for you. Rest assured that AXI 32-bit mode will not limit memory throughput, given your system specs:

(AXI width x AXI clock) >= (HBMC width x HBMC clock x DDR)
32 bit x 100 MHz >= 8 bit x 100 MHz x 2, i.e. 3.2 Gbit/s >= 1.6 Gbit/s

mohammadhgh commented 2 years ago

> Rest assured that AXI 32-bit mode will not limit memory throughput, given your system specs:
>
> (AXI width x AXI clock) >= (HBMC width x HBMC clock x DDR)
> 32 bit x 100 MHz >= 8 bit x 100 MHz x 2

Thank you very much for your reply. You are right, I was missing a very important point. I changed the clock of the AXI port of OpenHBMC to 50 MHz and everything seems to be OK. I will change the HyperRAM part to get higher frequencies in future versions of my board.

mohammadhgh commented 2 years ago

@OVGN Hello,

Last time I changed the MicroBlaze and OpenHBMC clock to 50 MHz and the memory test program was OK. However, my own application still has a problem. The MicroBlaze stalls on a specific instruction. I checked the OpenHBMC AXI port and saw that there is a write transaction that never completes. Here is the ILA data for the OpenHBMC AXI port:

Screenshot from 2022-01-19 13-07-45

As you can see, AWVALID is asserted but AWREADY stays zero forever. This causes an overflow on the data cache port.

I was using OpenHBMC v1.1 with my application successfully for several months (I am not sure whether I had a successful run with v2.0 or not), so I will try to find out what has changed and caused this problem.

OVGN commented 2 years ago

Hello, @mohammadhgh

This is quite strange. Such stalls should never happen. I need more details to figure out what happened. Your screenshot is good, but not enough. Could you please catch this issue on the ILA again and export a .vcd file? Then I will be able to see and analyze all signals on your waveform.

vivado_vcd
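
If it is more convenient, the export can probably also be scripted from the Hardware Manager Tcl console, something along these lines (hw_ila_1 is a placeholder instance name, and the -vcd_file option is from memory and may differ between Vivado versions):

```tcl
# Rough sketch: upload the captured ILA data and write it out in VCD format
write_hw_ila_data -vcd_file ./iladata.vcd [upload_hw_ila_data [get_hw_ilas hw_ila_1]]
```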

OVGN commented 2 years ago

At least I see one strange thing on your waveforms:

axi_issue_awlen_question

Please upload the VCD file.

mohammadhgh commented 2 years ago

@OVGN

Hi,

These are two ILA files for the picture I uploaded previously. One is for the AXI Interconnect slave side, which is connected to the MicroBlaze I-Cache, D-Cache and also the MicroBlaze trace port, and the other ILA file is for the AXI Interconnect master side that is connected to OpenHBMC.

iladata.zip

OVGN commented 2 years ago

Hi,

Thanks a lot for the ILA files. I have found the issue. This is a very stupid bug. I will try to fix it today.

OVGN commented 2 years ago

Hello,

I have fixed the AXI stall issue. Please update your IP and let me know if it works for you.

As for the 16-bit and 64-bit modes, they are still not tested yet. I'm going to do this in a few days, so please continue using the AXI 32-bit data width mode.

mohammadhgh commented 2 years ago

@OVGN

Hello,

Thank you very much for the changes. I tested the new OpenHBMC in new test Vivado and Vitis projects that contain only a MicroBlaze subsystem and OpenHBMC. In this project the Vitis memory test program passes and there is no problem.

However, I still have memory issues in my main application project. For some specific cache sizes or cache line lengths, the memory test program fails. I checked the AXI port of OpenHBMC to see what the problem is, and this is the result. This is the first word write in the Vitis memory test program, which writes 0x00000001 to address 0x000000 and completes successfully.

Screenshot from 2022-01-24 14-12-26

But the read-back of the first word fails, as the data returned is 0x00008001:

Screenshot from 2022-01-24 14-11-34

I increased the MicroBlaze cache size and cache line length, and this time the memory test program passes. This is the ILA result:

Screenshot from 2022-01-24 14-25-06

The only difference I see between these two situations is that the AXI ID width changes from 3 to 1. Do you have any idea about this?

Also, because I am using Vivado 2021.2, I upgraded the FIFO IP cores inside OpenHBMC, but it didn't change the result. iladata.zip is also attached.
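
For reference, upgrading the IP cores can be done with the standard Vivado flow, roughly like this (generic Tcl console commands, nothing specific to the OpenHBMC FIFOs):

```tcl
# Generic sketch: list IP core status and upgrade any out-of-date IP cores in the project
report_ip_status
upgrade_ip [get_ips]
```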

OVGN commented 2 years ago

Hi, @mohammadhgh

Many thanks again for detailed reports.

I have analyzed the diagrams and ran different tests with various cache line lengths and sizes. Finally, I was able to catch the incorrect read failure. In fact, there is no relation between cache line length, cache size and incorrect read data. It is some specific design placement that causes the error. This is quite unexpected... Nevertheless, I have fixed the IP core primitive locations so that the bug reproduces stably, with some internal signals connected to an ILA. Investigation is in progress. I hope to find the root of the problem tomorrow.

OVGN commented 2 years ago

Hi, @mohammadhgh

I have some results. It looks like this is not a design logic issue. In fact, it is a timing issue.

I was running tests on the mb_dual_ram design project. I noticed that I had forgotten to declare a system clock frequency constraint. There were a lot of timing warnings like no_clock, unconstrained_internal_endpoints, no_input_delay, no_output_delay. Most of these violations can be ignored, as they relate to CDC synchronizers or input/output delays that are already resolved by design. But I decided to add a constraints file for the IP to resolve all warnings.
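
A minimal sketch of such a clock constraint (the port name and the 100 MHz period are placeholders for illustration, not taken from the actual OpenHBMC or example project sources):

```tcl
# Example only: declare a 100 MHz system clock on a hypothetical top-level port named "clk_in"
create_clock -period 10.000 -name sys_clk [get_ports clk_in]
```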

Any other timing violations are critical and must be fixed by the IP user. As you are using an ILA, I strongly recommend setting the Input Pipe Stages = 1 option for all your ILA cores. It will not affect the captured ILA data, but it will help a lot to relax timing.
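
If I remember correctly, the same option can also be set from the Tcl console before generating the ILA (a sketch; ila_0 is a placeholder IP instance name and the exact property name may differ between ILA versions):

```tcl
# Sketch: set one input pipe stage on an ILA IP instance named "ila_0"
set_property CONFIG.C_INPUT_PIPE_STAGES 1 [get_ips ila_0]
```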

In general, I think that your design fails due to timing violations in the IP core. You probably haven't noticed them, as there are quite a lot of warnings because I hadn't added the constraints file. I'm going to fix this very soon, probably tomorrow.

mohammadhgh commented 2 years ago

Hi @OVGN

Thank you very much for your work and fast replies.

I now have a build in which there is no problem. I fixed the placement of the OpenHBMC module and everything is good for now, so I think you are right that it is a timing issue.

I checked the OpenHBMC synthesis reports and found these critical warnings about the FIFO IP cores inside OpenHBMC. I then upgraded the FIFO IP cores, but the warnings were still there, so I just ignored them!

[Designutils 20-1280] Could not find module 'fifo_18b_18b_512w'. The XDC file /..../fifo_18b_18b_512w.xdc will not be read for any cell of this module.

Also, I remember that there was an XDC file in OpenHBMC v1.1 which I noticed is omitted in the current version. With v1.1, I always had a negative slack of around 0.7 ns with a 100 MHz HyperBus clock. The problem is that on our board, RWDS is not connected to a clock-capable pin, and the speed grade of the FPGA is -1. We are planning to change the board, and on the new board we will connect RWDS to an MRCC or SRCC pin.

OVGN commented 2 years ago

Hello, @mohammadhgh

> I checked the OpenHBMC synthesis reports and found these critical warnings about the FIFO IP cores inside OpenHBMC. I then upgraded the FIFO IP cores, but the warnings were still there, so I just ignored them!

I don't know why, but Vivado does not understand that fifo_18b_18b_512w is unused due to the selected parameters and that there is no need to apply the .xdc file of an unused module. I'm going to replace the Xilinx FIFO IPs with custom ones to make the design more flexible and remove these annoying warnings. So yes, please ignore these warnings for a while.

> Also, I remember that there was an XDC file in OpenHBMC v1.1 which I noticed is omitted in the current version. With v1.1, I always had a negative slack of around 0.7 ns with a 100 MHz HyperBus clock.

Right, I removed that XDC file, as it was incorrect. Now I'm working on a new one, adding all the needed constraints for OpenHBMC, and I will bring a correct XDC file back. I just need a bit more time to finish this...

> The problem is that on our board, RWDS is not connected to a clock-capable pin, and the speed grade of the FPGA is -1. We are planning to change the board, and on the new board we will connect RWDS to an MRCC or SRCC pin.

In a common HyperBus IP design, RWDS is used to sample the data bus. In that case you are right, RWDS should be connected to a clock-capable pin. As RWDS is guaranteed to be edge-aligned with the data, it has to be delayed to shift it to the center of the data bit. The flaw of this scheme is that a calibration procedure is needed to set the RWDS delay. Also, at low frequencies, even 100 MHz, it is quite hard to delay RWDS by 5 ns: a single IDELAY primitive can delay a signal by 2.5 ns at most, so IDELAY cascading could probably help here, but never mind. Also, theoretically, the HyperBus tCKDS and tCKD timing values can vary with the temperature of the memory part, so periodic RWDS recalibration would probably be needed for reliable operation.

The OpenHBMC data reception logic is designed in a completely different way. RWDS is not used to sample the data bus. Instead, RWDS is oversampled by an x6 clock (x3 in DDR) along with the data bus. There are no special IO placement requirements for RWDS; it can be connected to a clock-capable or a common FPGA pin. The oversampled data and RWDS then go to the DRU (data recovery unit), which detects RWDS rising and falling edges and selects the right data samples to recover the data. There is no need for any kind of calibration with this scheme. The DRU FSM covers all possible conditions and should always be able to recover the data.

In general, if you can connect RWDS to an MRCC/SRCC pin, do it. This will probably be mandatory for some HyperRAM memory controller IPs, but OpenHBMC doesn't need it at all:

Single_OpenHBMC_Floorplan_200MHz

Concerning performance: I'm using the commercial lowest speed grade XC7S50-1CSGA324C with a W956D8MBYA5I. I have a stable working project, mb_single_ram, configured to run the W956D8MBYA5I part at its maximum possible frequency of 200 MHz.

OVGN commented 2 years ago

Hi, @mohammadhgh

I have released rev.83. Among other improvements, I have finally added the constraints file, i.e. no more timing critical warnings.

mohammadhgh commented 2 years ago

Hi @OVGN

Sorry, I didn't have access to my system for a few weeks to test your new revision. Now I have tested it and everything is OK. Thank you very much for your support.