Nuand / bladeRF-wiphy

bladeRF-wiphy is an open-source IEEE 802.11 compatible software defined radio VHDL modem
GNU General Public License v2.0

Stability issues with latest Quartus version #5

Closed MyTechCatalog closed 3 years ago

MyTechCatalog commented 3 years ago

I am using Quartus Prime 20.1.1. What version of Quartus was used to build the FPGA design? I am experiencing stability issues whenever I add a .qip file for a new Platform Designer system intended to sit at the top level alongside the NIOS system that comes with the design. I don't even have to instantiate anything in the top-level VHDL file: just adding the .qip file is enough to make the resulting build stop detecting packets after a short, random interval.

If I build the unmodified ~/wiphy-build/bladeRF/hdl/fpga/platforms/bladerf-micro/bladerf-wlan.qip I can see a continuous printout of packets scrolling in the Linux terminal window where I am running bladeRF-linux-mac80211 in verbose mode.

I would say that works just fine. However, if I so much as add a .qip file for a new subsystem to the main bladerf-wlan.qip file and rebuild, the resulting modem stops after detecting a few packets. I have been able to correlate this with the version number of the components in the Platform Designer system's .qip file, which is 20.1.

rghilduta commented 3 years ago

I'm currently building with Quartus Prime Lite Edition 19.1. Just to double-check: is "Total registers" the same between good and bad builds? Also, in the compilation report under "Timing Analyzer", does either "Setup Summary" or "Hold Summary" show up in red? Is the Fmax at least 80 MHz for U_wlan_top:U_wlan_rx in the "Slow 1100mV * Model"?

MyTechCatalog commented 3 years ago

I downloaded Quartus Prime Lite Edition 19.1.0 Build 670 and got the same behavior. Below are some numbers relevant to answering your questions. For Setup and Hold summaries, I have listed the top line (worst) for each.

Good Build 1: Quartus Prime Version 20.1.1 Build 720 11/11/2020 SJ Lite Edition
Total registers: 55763
Slow 1100mV 85C Model Fmax Summary ; 86.38 MHz ; 86.38 MHz ; U_wlan_top|U_wlan_rx|U_80mhz_clock|altera_pll_i|general[0].gpll~PLL_OUTPUT_COUNTER|divclk
Slow 1100mV 85C Model Setup Summary ; U_nios_system|axi_ad9361_0|i_dev_if|i_rx|i_altlvds_rx|auto_generated|pll_sclk~PLL_OUTPUT_COUNTER|divclk ; -0.688 ; -4.643
Slow 1100mV 85C Model Hold Summary ; altera_reserved_tck ; 0.193 ; 0.000

Bad Build 1: Quartus Prime Version 19.1.0 Build 670 09/22/2019 SJ Lite Edition
Total registers: 55582
Slow 1100mV 85C Model Fmax Summary ; 87.31 MHz ; 87.31 MHz ; U_wlan_top|U_wlan_rx|U_80mhz_clock|altera_pll_i|general[0].gpll~PLL_OUTPUT_COUNTER|divclk
Slow 1100mV 85C Model Setup Summary ; U_nios_system|axi_ad9361_0|i_dev_if|i_rx|i_altlvds_rx|auto_generated|pll_sclk~PLL_OUTPUT_COUNTER|divclk ; 0.216 ; 0.000
Slow 1100mV 85C Model Hold Summary ; U_wlan_top|U_wlan_rx|U_80mhz_clock|altera_pll_i|general[0].gpll~PLL_OUTPUT_COUNTER|divclk ; 0.279 ; 0.000

Bad Build 2: Quartus Prime Version 19.1.0 Build 670 09/22/2019 SJ Lite Edition
Total registers: 55593
Slow 1100mV 85C Model Fmax Summary ; 87.56 MHz ; 87.56 MHz ; U_wlan_top|U_wlan_rx|U_80mhz_clock|altera_pll_i|general[0].gpll~PLL_OUTPUT_COUNTER|divclk
Slow 1100mV 85C Model Setup Summary ; U_nios_system|axi_ad9361_0|i_dev_if|i_rx|i_altlvds_rx|auto_generated|pll_sclk~PLL_OUTPUT_COUNTER|divclk ; -0.312 ; -0.631
Slow 1100mV 85C Model Hold Summary ; altera_reserved_tck ; 0.199 ; 0.000

Bad Build 3: Quartus Prime Version 19.1.0 Build 670 09/22/2019 SJ Lite Edition
Total registers: 55593
Slow 1100mV 85C Model Fmax Summary ; 87.56 MHz ; 87.56 MHz ; U_wlan_top|U_wlan_rx|U_80mhz_clock|altera_pll_i|general[0].gpll~PLL_OUTPUT_COUNTER|divclk
Slow 1100mV 85C Model Setup Summary ; U_nios_system|axi_ad9361_0|i_dev_if|i_rx|i_altlvds_rx|auto_generated|pll_sclk~PLL_OUTPUT_COUNTER|divclk ; -0.312 ; -0.631
Slow 1100mV 85C Model Hold Summary ; altera_reserved_tck ; 0.199 ; 0.000
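The checks suggested earlier in the thread (negative setup or hold slack, which the report shows in red, and an Fmax of at least 80 MHz for the wlan_rx clock) can be applied mechanically to rows copied from the compilation report. The Python below is a minimal sketch, not a Nuand or Intel tool; the function names are hypothetical, and it assumes the usual column order of the Timing Analyzer summaries (Fmax rows: Fmax ; Restricted Fmax ; Clock Name; Setup/Hold rows: Clock ; Slack ; End Point TNS), using the restricted Fmax for the 80 MHz check.

```python
# Hypothetical helper (not part of bladeRF or Quartus): applies the checks
# suggested in this thread to rows copied from a Quartus compilation report.
LVDS_CLK = ("U_nios_system|axi_ad9361_0|i_dev_if|i_rx|i_altlvds_rx|"
            "auto_generated|pll_sclk~PLL_OUTPUT_COUNTER|divclk")
WLAN_CLK = ("U_wlan_top|U_wlan_rx|U_80mhz_clock|altera_pll_i|"
            "general[0].gpll~PLL_OUTPUT_COUNTER|divclk")


def fields(row: str) -> list[str]:
    """Split a semicolon-delimited report row into trimmed fields."""
    return [f.strip() for f in row.split(";")]


def restricted_fmax_mhz(fmax_row: str) -> float:
    """Fmax Summary row: <panel> ; <Fmax> ; <Restricted Fmax> ; <clock>."""
    return float(fields(fmax_row)[2].split()[0])


def slack_ns(summary_row: str) -> float:
    """Setup/Hold Summary row: <panel> ; <clock> ; <slack> ; <end point TNS>."""
    return float(fields(summary_row)[2])


def check_build(fmax_row: str, setup_row: str, hold_row: str,
                min_fmax_mhz: float = 80.0) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    fmax = restricted_fmax_mhz(fmax_row)
    if fmax < min_fmax_mhz:
        problems.append(f"Fmax {fmax} MHz is below {min_fmax_mhz} MHz")
    for name, row in (("setup", setup_row), ("hold", hold_row)):
        slack = slack_ns(row)
        if slack < 0:  # negative slack means the timing requirement is not met
            problems.append(f"negative {name} slack {slack} ns")
    return problems


# Worst-case rows from Good Build 1 (Quartus Prime 20.1.1) as posted above.
good_build_1 = check_build(
    f"Slow 1100mV 85C Model Fmax Summary ; 86.38 MHz ; 86.38 MHz ; {WLAN_CLK}",
    f"Slow 1100mV 85C Model Setup Summary ; {LVDS_CLK} ; -0.688 ; -4.643",
    "Slow 1100mV 85C Model Hold Summary ; altera_reserved_tck ; 0.193 ; 0.000",
)
# Worst-case rows from Bad Build 1 (Quartus Prime 19.1.0) as posted above.
bad_build_1 = check_build(
    f"Slow 1100mV 85C Model Fmax Summary ; 87.31 MHz ; 87.31 MHz ; {WLAN_CLK}",
    f"Slow 1100mV 85C Model Setup Summary ; {LVDS_CLK} ; 0.216 ; 0.000",
    f"Slow 1100mV 85C Model Hold Summary ; {WLAN_CLK} ; 0.279 ; 0.000",
)
print("Good Build 1:", good_build_1)
print("Bad Build 1:", bad_build_1)
```

With the numbers posted above, the sketch flags the negative setup slack on the axi_ad9361 LVDS clock for Good Build 1 and reports nothing for Bad Build 1, so these summary lines on their own would not have separated the good build from the bad ones.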

rghilduta commented 3 years ago

Any luck if the bladeRF and bladeRF-wiphy repositories are cleaned with git clean between builds? Also, what's in the .qip file that's being added?

MyTechCatalog commented 3 years ago

The first bad build above was actually from a clean checkout of the project folders ~/wiphy-build/bladeRF and ~/wiphy-build/bladeRF-wiphy. I no longer think the .qip file is the cause. I was trying to find a pattern, but I did not have enough samples.

The reason I thought the .qip file had something to do with it is that, as I backed out the changes I had made and rebuilt, I kept getting unstable builds until I finally commented out the .qip file. I have since gotten unstable builds with that .qip line commented out, including from a clean checkout.

The .qip file lists the components of a system with a NIOS II/e CPU, JTAG UART, JTAG Avalon Master, on-chip RAM (64K), PIO (8-bit output), two MSGDMAs (ST-to-Memory and Memory-to-ST), an 80 MHz clock source, a 1K dual-port on-chip RAM (second port exported), and an Avalon SYSID component. But as I mentioned earlier, the system doesn't have to be instantiated for the unstable build to happen.

My next step is to try some virtual machines: clean OS install, clean build, and see what happens. Maybe there is a subtle error or something happening in all the scrolling build output (due to my environment). I am just guessing, but it's about all I can do at this point.

rghilduta commented 3 years ago

I'm wondering if the .QIP was somehow overriding some constraints or maybe left behind some kind of .SDC file. Trying a VM should hopefully provide a clean environment and allow good builds to happen again. Also, this may be unlikely, but if there is any faulty RAM in a system, synthesis, being very RAM intensive (in terms of both the amount of memory used and the number of accesses), may be affected. So trying another system altogether might be an interesting data point.

MyTechCatalog commented 3 years ago

Thanks for the pointers. I hadn't even considered trying another system. I do have a lot of computers lying around, so I might as well put them to work. I had similar thoughts regarding the QIP.

MyTechCatalog commented 3 years ago

Some interesting developments: my first build in a virtual machine, starting from a new checkout, also behaved like one of the so-called "unstable builds" listed above. So on a hunch, I decided to try the official release build from https://www.nuand.com/fpga_images/, and it behaved the same way as the "bad builds". That did not make sense. Well, it so happens that I spent the better part of two days after I opened this issue setting up an ad-hoc network between two Linux virtual machines, each connected to a bladeRF 2.0 micro xA9 SDR running bladeRF-wiphy.

I successfully pinged each machine from the other, and can confirm that RX and TX packets were being printed in the terminal window running bladeRF-linux-mac80211. I was using the release build from https://www.nuand.com/fpga_images/, specifically https://www.nuand.com/fpga/wlanxA9-latest.rbf.

So I decided to set up the ad-hoc WiFi connection using each of the "bad builds" I listed earlier, and they all worked. I was able to ping each machine from the other 100 times. It turns out there is something about the setup I was using when I first ran into stability issues: I had RX1 on the bladeRF 2.0 micro xA9 connected directly to an SMA connector on a WiFi router via a 30 dB attenuator capable of dissipating 2 W. The router was also set to 25% power. The rest of the bladeRF SMA connectors were terminated with dummy loads capable of dissipating 2 W. There is something I don't understand about that particular setup that causes the board to exhibit unstable behavior. In any event, I think I'll close this issue.
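For a rough sense of the signal level RX1 saw in that original cabled setup, the path can be worked out in dB. The thread does not give the router's full transmit power, so the 100 mW (20 dBm) below is a hypothetical value; the sketch also assumes the 25% setting scales output power linearly and ignores cable and connector losses.

```python
import math


def mw_to_dbm(p_mw: float) -> float:
    """Convert a power in milliwatts to dBm."""
    return 10 * math.log10(p_mw)


def dbm_to_mw(p_dbm: float) -> float:
    """Convert a power in dBm to milliwatts."""
    return 10 ** (p_dbm / 10)


# Hypothetical: the router's full output power is not stated in the thread.
ROUTER_FULL_POWER_MW = 100.0
POWER_SETTING = 0.25    # router set to 25% power (from the thread), assumed linear
ATTENUATION_DB = 30.0   # 30 dB attenuator between router and RX1 (from the thread)

tx_dbm = mw_to_dbm(ROUTER_FULL_POWER_MW * POWER_SETTING)
rx1_dbm = tx_dbm - ATTENUATION_DB  # cable and connector losses ignored
print(f"router output {tx_dbm:.1f} dBm, at RX1 {rx1_dbm:.1f} dBm "
      f"({dbm_to_mw(rx1_dbm) * 1000:.0f} uW)")
```

Under those assumptions RX1 sees roughly -16 dBm (about 25 µW), likely much stronger than typical over-the-air levels; the thread does not establish whether that signal level is what caused the unstable behavior.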