Xilinx / embeddedsw

Xilinx Embedded Software (embeddedsw) Development
Other
963 stars 1.07k forks source link

RFDC driver is incompatible with device-tree TCL #252

Open gsmecher opened 1 year ago

gsmecher commented 1 year ago

Problem 1: device tree mismatch in embeddedsw

The rfdc driver expects a device tree with most of the configuration tucked into a "param-list" entry, as can be seen here:

    Status = metal_linux_get_device_property(DevicePtr, XRFDC_CONFIG_DATA_PROPERTY,
                         &Data, XRFDC_DEVICE_ID_SIZE);

However, the rfdc.tcl script that lives alongside this driver code generates much richer device-tree entries, and no param-list entry. For example, here's the kind of entries I get when running this version of rfdc.tcl instead:

[...]
            xlnx,adc-bypass-bg-cal00 = "false";
            xlnx,adc-bypass-bg-cal01 = "false";
            xlnx,adc-bypass-bg-cal02 = "false";
            xlnx,adc-bypass-bg-cal03 = "false";
            xlnx,adc-bypass-bg-cal10 = "false";
            xlnx,adc-bypass-bg-cal11 = "false";
            xlnx,adc-bypass-bg-cal12 = "false";
            xlnx,adc-bypass-bg-cal13 = "false";
            xlnx,adc-bypass-bg-cal20 = "false";
            xlnx,adc-bypass-bg-cal21 = "false";
            xlnx,adc-bypass-bg-cal22 = "false";
            xlnx,adc-bypass-bg-cal23 = "false";
            xlnx,adc-bypass-bg-cal30 = "false";
            xlnx,adc-bypass-bg-cal31 = "false";
            xlnx,adc-bypass-bg-cal32 = "false";
            xlnx,adc-bypass-bg-cal33 = "false";
            xlnx,adc-calopt-mode00 = <0x1>;
            xlnx,adc-calopt-mode01 = <0x1>;
            xlnx,adc-calopt-mode02 = <0x1>;
            xlnx,adc-calopt-mode03 = <0x1>;
            xlnx,adc-calopt-mode10 = <0x1>;
            xlnx,adc-calopt-mode11 = <0x1>;
[...]

This is much better than a "param-list" blob, but unfortunately the drivers don't understand it. Since both drivers (rfdc_sinit.c) and generator (rfdc.tcl) live in the same repository, I'm confused why they don't work together. That's problem 1.

Problem 2: param-list mismatch

Let's try, instead, generating my param-list using the device-tree generator shipped in device-tree-xlnx. This gives me a param-list blob that looks like the following:

        param-list = [
            00 00 00 00 00 00 00 80 00 00 00 00 01 00 00 00
            00 00 00 00 00 00 00 00 01 00 00 00 01 00 00 00
            02 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00
            00 00 00 00 00 00 14 40 00 00 00 00 00 00 69 40
            00 00 00 00 00 88 73 40 32 00 00 00 02 00 00 00
            01 00 00 00 00 00 00 00 00 00 00 00 00 00 1c 40
[...]

If I'm going to use the drivers from embeddedsw, this param-list must match the in-memory layout of the XRFdc_Config object. However, this structure includes a LinkCoupling field for each DAC tile:

    u32 Enable;
    u32 PLLEnable;
    double SamplingRate;
    double RefClkFreq;
    double FabClkFreq;
    u32 FeedbackDiv;
    u32 OutputDiv;
    u32 RefClkDiv;
    u32 MultibandConfig;
    double MaxSampleRate;
    u32 NumSlices;
    u32 LinkCoupling;                             /* THIS ONE HERE */
    XRFdc_DACBlock_AnalogDataPath_Config DACBlock_Analog_Config[4];
    XRFdc_DACBlock_DigitalDataPath_Config DACBlock_Digital_Config[4];
} XRFdc_DACTile_Config;

This field is not emitted by the rfdc.tcl captured in device-tree-xlnx, so every field after the first missing LinkCoupling entry is misaligned, the param-list array is unexpectedly short, and hilarity ensues.

Summary of my questions

  1. Why doesn't rfdc.tcl produce device-tree content that matches param-list expected in rfdc_sinit.c?
  2. Where am I supposed to find a matching rfdc.tcl and driver code? The combinations I'm working with clearly don't agree.

As always, thanks for your time and attention.

gsmecher commented 1 year ago

Aha - I think the "master" branch of device-tree-xlnx is stale, and I should be working from a version-specific branch. Is it possible to somehow shout this from the hilltops (or delete the "master" branch entirely)?

varalaxmi-bingi commented 1 year ago

hi @gsmecher yes the current master branch of device-tree is not in sync with latest changes. We are planning to sync that in 2023.2 release. For latest changes you can use our release branches.

Regards, Varalaxmi

gsmecher commented 5 months ago

@varalaxmi-bingi: Longer term, do you plan to move away from the param-list blob? This method of shipping configuration between TCL and C code is extraordinarily brittle: any time the struct definitions change, it's a rather painful process to diagnose and fix structure misalignment.

For example, the NCO field added between rfdc_v12_0 and rfdc_v12_1 broke device-tree generation again. The structs in xrfdc.h also aren't __attribute__((packed)), and I imagine there are hidden field alignment issues just waiting to pop up.

Thanks again for all your efforts.

NghiSpectrohm commented 5 months ago

@gsmecher With new Xilinx 2024.1 drop, and xilinx-linux driver of petalinux recommends SDT workflow that using sdt_gen to create project-specific device-tree folder from XSA file, instead of paramter.h.

Would the SDT workflow bypass this parameter.h and xilinx-linux embeddedsw mismatched? I think if any mismatch between embeddedsw drivers and SDT-output from Xilinx, we still have the same issue as TCL -> device-tree?

gsmecher commented 5 months ago

@NghiSpectrohm - thanks for the quick response. As I understand it, the challenge isn't where the device tree is generated - it's the fact that it contains a binary representation of a C struct, instead of (more conventional) named fields and values. Adding new fields in a serialized binary struct is not safe, especially without some kind of version-number or struct-size guard in the downstream code.

A SDT device-tree generator that produces param-list blobs is perhaps a little more convenient than the current xsa export -> xsct script process, but unless param-list goes away, it's not really more robust.

NghiSpectrohm commented 5 months ago

@gsmecher Thank you for the explanation; your argument makes sense. Base on Xilinx developer feedback on AMD forum, they are not changing parameter as input to xilinx embeddedsw libraries. Biggest change from 2023 to 2024 version is moving TCL workflow to SDT workflow for xinlinx-linux drivers.

Good news is that rfdc library worked on 2024.1 baseline code; or at least RFDC code ran past initialization portion. Bad news RFCLK library got borked with new SDT workflow. I think Xilinx 2024 SDT-gen misinterpreted some parameter incorrectly and bricked petalinux with 2024.1 xilinx-linux drivers.

Your feedback is good; I just do not know if Xilinx has a better way dealing with their own parameter config generated by TCL or SDT that have to work with their firmware drivers (embeddedsw libraries). They spent over 2 years moving away from TCL and forcing developers to use the new SDT workflown, which AMD actively warning XSCT and TCL workflown deprecated after 2024.2 version.

For example, current version 2024.1 SDT-gen had this issue mismatched in parameter name and caused issue with XSA parameter generated by SDT:

WARNING: CONFIG.PSUPSS_REF_CLKFREQMHZ not found. Using default value for XPAR_PSU_PSS_REF_CLK_FREQ_HZ

If AMD/Xilinx code developers do not have a way to deal with incremental changes and messed up their parameter name conversion, the hardware (XSA) vs firmware (RFDC) will continue having issue after every major version upgrade.

Last major upgrade 2021 to 2022 was the worst as Yocto (base for petalinux) major changed and that broke quite a lot of Xilinx codes. That was the year RFDC struct calls changed as well, and I had to update my firmware codes to work with new RFDC structs. That was as bad as 2019-2020 major revision.

varalaxmi-bingi commented 5 months ago

Hi @O'Griofa, @.***>, Could you please log your comments on the bugcase for the below customer reported issues as you’re the owner of the rfdc driver.

Regards, Varalaxmi From: NghiSpectrohm @.> Sent: Friday, June 14, 2024 12:43 AM To: Xilinx/embeddedsw @.> Cc: Bingi, Varalaxmi @.>; Mention @.> Subject: Re: [Xilinx/embeddedsw] RFDC driver is incompatible with device-tree TCL (Issue #252)

@gsmecherhttps://github.com/gsmecher Thank you for the explanation; your argument makes sense. Base on Xilinx developer feedback on AMD forum, they are not changing parameter as input to xilinx embeddedsw libraries. Biggest change from 2023 to 2024 version is moving TCL workflow to SDT workflow for xinlinx-linux drivers.

Good news is that rfdc library worked on 2024.1 baseline code; or at least RFDC code ran past initialization portion. Bad news RFCLK library got borked with new SDT workflow. I think Xilinx 2024 SDT-gen misinterpreted some parameter incorrectly and bricked petalinux with 2024.1 xilinx-linux drivers.

Your feedback is good; I just do not know if Xilinx has a better way dealing with their own parameter config generated by TCL or SDT that have to work with their firmware drivers (embeddedsw libraries). They spent over 2 years moving away from TCL and forcing developers to use the new SDT workflown, which AMD actively warning XSCT and TCL workflown deprecated after 2024.2 version.

For example, current version 2024.1 SDT-gen had this issue mismatched in parameter name and caused issue with XSA parameter generated by SDT:

WARNING: CONFIG.PSUPSS_REF_CLKFREQMHZ not found. Using default value for XPAR_PSU_PSS_REF_CLK_FREQ_HZ

If AMD/Xilinx code developers do not have a way to deal with incremental changes and messed up their parameter name conversion, the hardware (XSA) vs firmware (RFDC) will continue having issue after every major version upgrade.

Last major upgrade 2021 to 2022 was the worst as Yocto (base for petalinux) major changed and that broke quite a lot of Xilinx codes. That was the year RFDC struct calls changed as well, and I had to update my firmware codes to work with new RFDC structs. That was as bad as 2019-2020 major revision.

— Reply to this email directly, view it on GitHubhttps://github.com/Xilinx/embeddedsw/issues/252#issuecomment-2166587560, or unsubscribehttps://github.com/notifications/unsubscribe-auth/A3FPBO2WJZEROEPB2ERUX4TZHHVJ5AVCNFSM6AAAAABJI2UE56VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNRWGU4DONJWGA. You are receiving this because you were mentioned.Message ID: @.***>

conallogriofaamd commented 5 months ago

Hi @gsmecher @NghiSpectrohm,

Apologies for the delay, I was on leave until now. Thanks for your comments/questions.

I think for the current RFDC driver we do not plan to move away from the param-list blob (although I agree it is not ideal). We can certainly look at doing this differently for any future generations of RFDC drivers.

@gsmecher I think the config struct in xrfdc.h is packed to match with the param-list blob.

Just a quick bit of info, Although when we use sdtgen to generate the device tree we get all the parameters listed n the hwh plus some generated for the baremetal, these will be stripped out for the Linux device tree by the lopper.

@NghiSpectrohm please let me know what issues you are seeing with rfclk so we can try solve any issues you are having.

Thanks, Conall.

NghiSpectrohm commented 5 months ago

Hi @conallogriofaamd

https://support.xilinx.com/s/question/0D54U00008TMuQ9SAL/20241-petalinux-issue-with-missing-i2c20-bridge-driver-for-clk104-runtime-configuration-over-pl-i2c-gpio-port?language=en_US

Here is the rfclk issue with 2024.1 version. I believed it's xilinx sdt/bsp issue and not xilinx-linux driver, as rfclk is expecting the i2c-20 bridge. For the 2024.1 SDT bsp for zcu-208 and zcu-111 do not build the i2c bridge, only i2c-0 and i2c-1 enabled. The xrfclk.c expected i2c-20 bridge enable and petalinux runs failed at initialization of clk104.

conallogriofaamd commented 5 months ago

Hi,

Thanks for your feedback, I did some debugging. We seem to have a delta in what we are seeing i get the following i2c devices: image I did notice there are some topology deltas, I had to change the GPIO number in the example to 533 and the i2c device to be i2c-0 in the rfclk code which gives me: image Is it possible to share your device tree so I can give it a look?

This is scping the rfclk example onto the prebuilt in the BSP.

Thanks, Conall.

NghiSpectrohm commented 5 months ago

Here is my XSA->sdtgen zip without bitstream. sdt_2024_zcu208.zip

With latest 2024 version, xrfclk firmware code updated for sdt workflow, the xrfclk changed the way how gpio assigned i2c port IDs. Previously in 2023 code, ZCU-208 i2c-1 for clk104 has gpio ID fixed as 486. For 2024, it required manual runtime to detect GPIO assignment for i2c ports.

For my 2024 zcu208 code, I ran petalinux and looked up clk104 i2c gpio base address to find linux gpio assignment; it is now assigned clk104 ic2-1 to ID=512. Since 2024 code failed, I reverted back to 2023.2 petalinux and used old GPIO ID=486 with xsa generated by vivado 2023.1, it works to me now.

Here is the appnote for 2024 version for xrfclk_esamples_app.c:

1.6 dc 01/19/24 Correct linux gpio ID The argument in XRFClk_Init is a gpioID. Note: gpioID must be detected manualy, first find a gpio base address in device tree (currently is 0xa0205000), second check which gpio device has this address alocated, eg.: xilinx-zcu216:~$ cat /sys/class/gpio/gpiochip481/label a0205000.gpio the gpio ID value is 481.

NghiSpectrohm commented 4 months ago

@conallogriofaamd

After reran sdt-gen to include board specific libary (using argument -board_dts zcu208-reva), I was able to get passed XRFClk_InitGPIO() and now the application failed at XRFClk_initSPI(). My code, I changed GPIO ID=512 while keeping i2c-1 as the spi default port.

 $ dtc -I fs /sys/firmware/devicetree/base 2> /dev/null | grep gpio_spi_mux
                        xlnx,name = "axi_gpio_spi_mux";
                axi_gpio_spi_mux = "/amba_pl/axi_gpio@b0060000";
$ grep b0060000 /sys/class/gpio/gpiochip*/label
/sys/class/gpio/gpiochip512/label:b0060000.axi_gpio

image

In your code run, you said you changed GPIO number to 533 and I2C to i2c-0. Are you sure that ZCU208 bsp runs Clk104 spi on i2c-0 instead of i2c-1?

i2c-1 is fixed for xrfclk.c for ZCU208 board.

/****************************************************************************/
/**
*
* This function is used to initialize RFCLK devices on i2c1-bus:
* i2c1 bus switch, i2c2spi bridge and MUX_SELx GPIOs.
*
* @param    none
*
* @return   GpioId gpio ID for Linux build, n/a for baremetal build.
*
* @note     None
*
****************************************************************************/
#if defined XPS_BOARD_ZCU111
u32 XRFClk_Init()
{
#elif defined __BAREMETAL__
u32 XRFClk_Init(u32 GpioMuxBaseAddress)
{
    XRFClk_GpioMuxBaseAddress = GpioMuxBaseAddress;
#else
u32 XRFClk_Init(int GpioId)
{
    sprintf(GPIO_MUX_SEL0, "%d", GpioId);
    sprintf(GPIO_MUX_SEL1, "%d", GpioId + 1);

#endif
    if (XST_FAILURE == XRFClk_InitI2C()) {
        LOG("i2c init");
        return XST_FAILURE;
    }
    if (XST_FAILURE == XRFClk_InitGPIO()) {
        LOG("gpio init");
        return XST_FAILURE;
    }
    if (XST_FAILURE == XRFClk_InitSPI()) {
        LOG("spi init");
        return XST_FAILURE;
    }
    return XST_SUCCESS;
}

How do you force xrfclk code to i2c-0 from user application space? xrfclk.c/h is embedded library from xilinx-linux, and there is no way to force XRFClk_Init(GpioId) to use i2c-0 for clk104 spi port.

conallogriofaamd commented 4 months ago

Hi @NghiSpectrohm,

Apologies for the delay, yes for the BSP I used (zcu208-sdt) I found that it was i2c-0, during my testing was to built the rfclk driver code on my workstation and SCP'd it over - (rather than building the whole kernel), I modified the i2c-1 device to i2c-0 in the xrfclk.c file in order to do this.

"Are you sure that ZCU208 bsp runs Clk104 spi on i2c-0 instead of i2c-1?" Sorry, I think my explanation was not good, The code in the SDT BSP is saying to use i2c-1 when it should be using i2c-0, however the classic flow BSP is still i2c-1. This should not be hard-coded in the xrfclk.c file.

I have discussed with @xlnx-dcvetic (owner of driver) about this issue & we agree that the hardcoding of the device is not ideal & I think he plans to remove this hard coding of the i2c device. @xlnx-dcvetic can you please comment?

Thanks

xlnx-dcvetic commented 4 months ago

Hi @NghiSpectrohm, as Conall said we will make i2c initialisation agnostic to the arbitrary index assigned to i2c bus by the linux boot. Regards

The plan has been changed, see https://github.com/Xilinx/embeddedsw/issues/252#issuecomment-2257996744

NghiSpectrohm commented 4 months ago

@xlnx-dcvetic

Shantmoses on Support Forum noted that SDT 2024.1 generated system-top.dts with. We're not sure which IP changes with SDT workflow that required I2C port swapped when ZCU has fixed I2C pin assigned and routed on the board. In the XCST workflow, the I2C ports aren't swapped. So this is a new bug impacted I2C assignment for XRFCLK module.

   aliases {

       serial0 = &uart0;

       spi0 = &qspi;

       serial1 = &coresight_0;

       i2c0 = &i2c1;

       i2c1 = &i2c0;

       ethernet0 = &gem3;

   };
xlnx-dcvetic commented 4 months ago

@NghiSpectrohm Acording to the group responsible for SDT, they are working on this issue and it will be fixed "the right way" in design. That is why I have edited comment https://github.com/Xilinx/embeddedsw/issues/252#issuecomment-2213613424