Xilinx / embeddedsw

Xilinx Embedded Software (embeddedsw) Development
Other
883 stars 1.05k forks source link

RFDC driver is incompatible with device-tree TCL #252

Open gsmecher opened 1 year ago

gsmecher commented 1 year ago

Problem 1: device tree mismatch in embeddedsw

The rfdc driver expects a device tree with most of the configuration tucked into a "param-list" entry, as can be seen here:

    Status = metal_linux_get_device_property(DevicePtr, XRFDC_CONFIG_DATA_PROPERTY,
                         &Data, XRFDC_DEVICE_ID_SIZE);

However, the rfdc.tcl script that lives alongside this driver code generates much richer device-tree entries, and no param-list entry. For example, here's the kind of entries I get when running this version of rfdc.tcl instead:

[...]
            xlnx,adc-bypass-bg-cal00 = "false";
            xlnx,adc-bypass-bg-cal01 = "false";
            xlnx,adc-bypass-bg-cal02 = "false";
            xlnx,adc-bypass-bg-cal03 = "false";
            xlnx,adc-bypass-bg-cal10 = "false";
            xlnx,adc-bypass-bg-cal11 = "false";
            xlnx,adc-bypass-bg-cal12 = "false";
            xlnx,adc-bypass-bg-cal13 = "false";
            xlnx,adc-bypass-bg-cal20 = "false";
            xlnx,adc-bypass-bg-cal21 = "false";
            xlnx,adc-bypass-bg-cal22 = "false";
            xlnx,adc-bypass-bg-cal23 = "false";
            xlnx,adc-bypass-bg-cal30 = "false";
            xlnx,adc-bypass-bg-cal31 = "false";
            xlnx,adc-bypass-bg-cal32 = "false";
            xlnx,adc-bypass-bg-cal33 = "false";
            xlnx,adc-calopt-mode00 = <0x1>;
            xlnx,adc-calopt-mode01 = <0x1>;
            xlnx,adc-calopt-mode02 = <0x1>;
            xlnx,adc-calopt-mode03 = <0x1>;
            xlnx,adc-calopt-mode10 = <0x1>;
            xlnx,adc-calopt-mode11 = <0x1>;
[...]

This is much better than a "param-list" blob, but unfortunately the drivers don't understand it. Since both drivers (rfdc_sinit.c) and generator (rfdc.tcl) live in the same repository, I'm confused why they don't work together. That's problem 1.

Problem 2: param-list mismatch

Let's try, instead, generating my param-list using the device-tree generator shipped in device-tree-xlnx. This gives me a param-list blob that looks like the following:

        param-list = [
            00 00 00 00 00 00 00 80 00 00 00 00 01 00 00 00
            00 00 00 00 00 00 00 00 01 00 00 00 01 00 00 00
            02 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00
            00 00 00 00 00 00 14 40 00 00 00 00 00 00 69 40
            00 00 00 00 00 88 73 40 32 00 00 00 02 00 00 00
            01 00 00 00 00 00 00 00 00 00 00 00 00 00 1c 40
[...]

If I'm going to use the drivers from embeddedsw, this param-list must match the in-memory layout of the XRFdc_Config object. However, this structure includes a LinkCoupling field for each DAC tile:

    u32 Enable;
    u32 PLLEnable;
    double SamplingRate;
    double RefClkFreq;
    double FabClkFreq;
    u32 FeedbackDiv;
    u32 OutputDiv;
    u32 RefClkDiv;
    u32 MultibandConfig;
    double MaxSampleRate;
    u32 NumSlices;
    u32 LinkCoupling;                             /* THIS ONE HERE */
    XRFdc_DACBlock_AnalogDataPath_Config DACBlock_Analog_Config[4];
    XRFdc_DACBlock_DigitalDataPath_Config DACBlock_Digital_Config[4];
} XRFdc_DACTile_Config;

This field is not emitted by the rfdc.tcl captured in device-tree-xlnx, so every field after the first missing LinkCoupling entry is misaligned, the param-list array is unexpectedly short, and hilarity ensues.

Summary of my questions

  1. Why doesn't rfdc.tcl produce device-tree content that matches param-list expected in rfdc_sinit.c?
  2. Where am I supposed to find a matching rfdc.tcl and driver code? The combinations I'm working with clearly don't agree.

As always, thanks for your time and attention.

gsmecher commented 1 year ago

Aha - I think the "master" branch of device-tree-xlnx is stale, and I should be working from a version-specific branch. Is it possible to somehow shout this from the hilltops (or delete the "master" branch entirely)?

varalaxmi-bingi commented 1 year ago

hi @gsmecher yes the current master branch of device-tree is not in sync with latest changes. We are planning to sync that in 2023.2 release. For latest changes you can use our release branches.

Regards, Varalaxmi

gsmecher commented 2 weeks ago

@varalaxmi-bingi: Longer term, do you plan to move away from the param-list blob? This method of shipping configuration between TCL and C code is extraordinarily brittle: any time the struct definitions change, it's a rather painful process to diagnose and fix structure misalignment.

For example, the NCO field added between rfdc_v12_0 and rfdc_v12_1 broke device-tree generation again. The structs in xrfdc.h also aren't __attribute__((packed)), and I imagine there are hidden field alignment issues just waiting to pop up.

Thanks again for all your efforts.

NghiSpectrohm commented 2 weeks ago

@gsmecher With new Xilinx 2024.1 drop, and xilinx-linux driver of petalinux recommends SDT workflow that using sdt_gen to create project-specific device-tree folder from XSA file, instead of paramter.h.

Would the SDT workflow bypass this parameter.h and xilinx-linux embeddedsw mismatched? I think if any mismatch between embeddedsw drivers and SDT-output from Xilinx, we still have the same issue as TCL -> device-tree?

gsmecher commented 2 weeks ago

@NghiSpectrohm - thanks for the quick response. As I understand it, the challenge isn't where the device tree is generated - it's the fact that it contains a binary representation of a C struct, instead of (more conventional) named fields and values. Adding new fields in a serialized binary struct is not safe, especially without some kind of version-number or struct-size guard in the downstream code.

A SDT device-tree generator that produces param-list blobs is perhaps a little more convenient than the current xsa export -> xsct script process, but unless param-list goes away, it's not really more robust.

NghiSpectrohm commented 2 weeks ago

@gsmecher Thank you for the explanation; your argument makes sense. Base on Xilinx developer feedback on AMD forum, they are not changing parameter as input to xilinx embeddedsw libraries. Biggest change from 2023 to 2024 version is moving TCL workflow to SDT workflow for xinlinx-linux drivers.

Good news is that rfdc library worked on 2024.1 baseline code; or at least RFDC code ran past initialization portion. Bad news RFCLK library got borked with new SDT workflow. I think Xilinx 2024 SDT-gen misinterpreted some parameter incorrectly and bricked petalinux with 2024.1 xilinx-linux drivers.

Your feedback is good; I just do not know if Xilinx has a better way dealing with their own parameter config generated by TCL or SDT that have to work with their firmware drivers (embeddedsw libraries). They spent over 2 years moving away from TCL and forcing developers to use the new SDT workflown, which AMD actively warning XSCT and TCL workflown deprecated after 2024.2 version.

For example, current version 2024.1 SDT-gen had this issue mismatched in parameter name and caused issue with XSA parameter generated by SDT:

WARNING: CONFIG.PSUPSS_REF_CLKFREQMHZ not found. Using default value for XPAR_PSU_PSS_REF_CLK_FREQ_HZ

If AMD/Xilinx code developers do not have a way to deal with incremental changes and messed up their parameter name conversion, the hardware (XSA) vs firmware (RFDC) will continue having issue after every major version upgrade.

Last major upgrade 2021 to 2022 was the worst as Yocto (base for petalinux) major changed and that broke quite a lot of Xilinx codes. That was the year RFDC struct calls changed as well, and I had to update my firmware codes to work with new RFDC structs. That was as bad as 2019-2020 major revision.

varalaxmi-bingi commented 1 week ago

Hi @O'Griofa, @.***>, Could you please log your comments on the bugcase for the below customer reported issues as you’re the owner of the rfdc driver.

Regards, Varalaxmi From: NghiSpectrohm @.> Sent: Friday, June 14, 2024 12:43 AM To: Xilinx/embeddedsw @.> Cc: Bingi, Varalaxmi @.>; Mention @.> Subject: Re: [Xilinx/embeddedsw] RFDC driver is incompatible with device-tree TCL (Issue #252)

@gsmecherhttps://github.com/gsmecher Thank you for the explanation; your argument makes sense. Base on Xilinx developer feedback on AMD forum, they are not changing parameter as input to xilinx embeddedsw libraries. Biggest change from 2023 to 2024 version is moving TCL workflow to SDT workflow for xinlinx-linux drivers.

Good news is that rfdc library worked on 2024.1 baseline code; or at least RFDC code ran past initialization portion. Bad news RFCLK library got borked with new SDT workflow. I think Xilinx 2024 SDT-gen misinterpreted some parameter incorrectly and bricked petalinux with 2024.1 xilinx-linux drivers.

Your feedback is good; I just do not know if Xilinx has a better way dealing with their own parameter config generated by TCL or SDT that have to work with their firmware drivers (embeddedsw libraries). They spent over 2 years moving away from TCL and forcing developers to use the new SDT workflown, which AMD actively warning XSCT and TCL workflown deprecated after 2024.2 version.

For example, current version 2024.1 SDT-gen had this issue mismatched in parameter name and caused issue with XSA parameter generated by SDT:

WARNING: CONFIG.PSUPSS_REF_CLKFREQMHZ not found. Using default value for XPAR_PSU_PSS_REF_CLK_FREQ_HZ

If AMD/Xilinx code developers do not have a way to deal with incremental changes and messed up their parameter name conversion, the hardware (XSA) vs firmware (RFDC) will continue having issue after every major version upgrade.

Last major upgrade 2021 to 2022 was the worst as Yocto (base for petalinux) major changed and that broke quite a lot of Xilinx codes. That was the year RFDC struct calls changed as well, and I had to update my firmware codes to work with new RFDC structs. That was as bad as 2019-2020 major revision.

— Reply to this email directly, view it on GitHubhttps://github.com/Xilinx/embeddedsw/issues/252#issuecomment-2166587560, or unsubscribehttps://github.com/notifications/unsubscribe-auth/A3FPBO2WJZEROEPB2ERUX4TZHHVJ5AVCNFSM6AAAAABJI2UE56VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNRWGU4DONJWGA. You are receiving this because you were mentioned.Message ID: @.***>

conallogriofaamd commented 1 week ago

Hi @gsmecher @NghiSpectrohm,

Apologies for the delay, I was on leave until now. Thanks for your comments/questions.

I think for the current RFDC driver we do not plan to move away from the param-list blob (although I agree it is not ideal). We can certainly look at doing this differently for any future generations of RFDC drivers.

@gsmecher I think the config struct in xrfdc.h is packed to match with the param-list blob.

Just a quick bit of info, Although when we use sdtgen to generate the device tree we get all the parameters listed n the hwh plus some generated for the baremetal, these will be stripped out for the Linux device tree by the lopper.

@NghiSpectrohm please let me know what issues you are seeing with rfclk so we can try solve any issues you are having.

Thanks, Conall.

NghiSpectrohm commented 4 days ago

Hi @conallogriofaamd

https://support.xilinx.com/s/question/0D54U00008TMuQ9SAL/20241-petalinux-issue-with-missing-i2c20-bridge-driver-for-clk104-runtime-configuration-over-pl-i2c-gpio-port?language=en_US

Here is the rfclk issue with 2024.1 version. I believed it's xilinx sdt/bsp issue and not xilinx-linux driver, as rfclk is expecting the i2c-20 bridge. For the 2024.1 SDT bsp for zcu-208 and zcu-111 do not build the i2c bridge, only i2c-0 and i2c-1 enabled. The xrfclk.c expected i2c-20 bridge enable and petalinux runs failed at initialization of clk104.

conallogriofaamd commented 2 days ago

Hi,

Thanks for your feedback, I did some debugging. We seem to have a delta in what we are seeing i get the following i2c devices: image I did notice there are some topology deltas, I had to change the GPIO number in the example to 533 and the i2c device to be i2c-0 in the rfclk code which gives me: image Is it possible to share your device tree so I can give it a look?

This is scping the rfclk example onto the prebuilt in the BSP.

Thanks, Conall.

NghiSpectrohm commented 2 days ago

Here is my XSA->sdtgen zip without bitstream. sdt_2024_zcu208.zip

With latest 2024 version, xrfclk firmware code updated for sdt workflow, the xrfclk changed the way how gpio assigned i2c port IDs. Previously in 2023 code, ZCU-208 i2c-1 for clk104 has gpio ID fixed as 486. For 2024, it required manual runtime to detect GPIO assignment for i2c ports.

For my 2024 zcu208 code, I ran petalinux and looked up clk104 i2c gpio base address to find linux gpio assignment; it is now assigned clk104 ic2-1 to ID=512. Since 2024 code failed, I reverted back to 2023.2 petalinux and used old GPIO ID=486 with xsa generated by vivado 2023.1, it works to me now.

Here is the appnote for 2024 version for xrfclk_esamples_app.c:

1.6 dc 01/19/24 Correct linux gpio ID The argument in XRFClk_Init is a gpioID. Note: gpioID must be detected manualy, first find a gpio base address in device tree (currently is 0xa0205000), second check which gpio device has this address alocated, eg.: xilinx-zcu216:~$ cat /sys/class/gpio/gpiochip481/label a0205000.gpio the gpio ID value is 481.