conda-forge / xilinx-runtime-feedstock

A conda-smithy repository for xilinx-runtime.
BSD 3-Clause "New" or "Revised" License

How does a user know if conda xilinx-runtime is compatible with the system components? #4

Open timsnyder opened 2 years ago

timsnyder commented 2 years ago

We only package the userspace bits in the conda package. How does someone know that they have the correct version of the system stuff installed?

Should they:

  1. look at the output of xbutil examine (using xbutil from the conda package) and freak out if the SHAs don't match
  2. look at xbutil examine and only care that the versions match
  3. run xbutil validate and if that passes using the conda version of xbutil and the system version, then they know everything should be okay?
  4. More extensive testing?

I haven't seen documentation that describes in detail how the userspace portions of XRT expect to be able to talk to the system portions. I'm sure it is documented (either by XRT or just standard Linux kernel module docs).

@bryanloz-xilinx , I'm just opening this ticket so that we have something to point users to when someone eventually ends up with a version of XRT in their conda environment that doesn't play nicely with the system package they have installed.

bryanloz-xilinx commented 2 years ago

@timsnyder this is a very good issue for tracking this. Thanks for raising it.

Historically, older XRT drivers have worked okay with newer XRT user space libraries for a time. We know this because in Vitis-AI we build docker containers with XRT user space libraries, and they seem to work pretty well when running on systems with outdated XRT drivers.

However, when problems arise, it becomes necessary to align the system XRT with the XRT inside the docker container... or in our case here... the XRT inside of our conda virtual environment.

So, it may become necessary for people to be aware that this packaged user space version of XRT is the latest release 2.13.466.

Then they should verify that the XRT installed on the system matches. Today, the system XRT is always installed to /opt/xilinx/xrt cringe...

So someone can do /opt/xilinx/xrt/bin/xbutil examine

In the future, I hope to learn if it is possible to package kernel stuff as well in Anaconda...

timsnyder commented 2 years ago

I think if they do $CONDA_PREFIX/bin/xbutil examine, it will show the version+SHA of the conda stuff as well as the version+SHA for XOCL and XCLMGMT running on the system. We could probably add an activation script to this recipe that warns the user about the need to upgrade system or downgrade conda package if PKG_VER > XOCL.
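As a rough sketch of what such an activation-time check might look like, here is a POSIX shell version comparison. The function names `version_gt` and `warn_if_xrt_newer` and the message text are made up for illustration, and it assumes both versions are plain dotted numbers like the ones xbutil prints.

```shell
# Sketch only: function names and messages are hypothetical, not part of XRT.

# Return 0 if dotted numeric version $1 is strictly greater than $2.
version_gt() {
    _vg_a=$1
    _vg_b=$2
    while [ -n "$_vg_a" ] || [ -n "$_vg_b" ]; do
        # Take the leading field of each version (missing fields count as 0).
        _vg_x=${_vg_a%%.*}
        _vg_y=${_vg_b%%.*}
        # Drop the field just read.
        case $_vg_a in *.*) _vg_a=${_vg_a#*.} ;; *) _vg_a= ;; esac
        case $_vg_b in *.*) _vg_b=${_vg_b#*.} ;; *) _vg_b= ;; esac
        if [ "${_vg_x:-0}" -gt "${_vg_y:-0}" ]; then return 0; fi
        if [ "${_vg_x:-0}" -lt "${_vg_y:-0}" ]; then return 1; fi
    done
    return 1
}

# Warn when the conda package ($1) is newer than the running XOCL driver ($2).
# Always returns 0 so that sourcing it from an activation script cannot
# break activation.
warn_if_xrt_newer() {
    if [ -z "$2" ] || [ "$2" = "unknown" ]; then
        echo "xilinx-runtime: XOCL driver version unknown; skipping check" >&2
    elif version_gt "$1" "$2"; then
        echo "xilinx-runtime: conda XRT $1 is newer than system XOCL $2;" \
             "upgrade the system XRT or downgrade the conda package" >&2
    fi
    return 0
}
```

Something like `warn_if_xrt_newer 2.13.466 2.11.0` would print the warning to stderr. How the package version gets to the script at activation time is left open here; baking it in at build time is one option, but that is an assumption.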

However, I know there are cases where XCLMGMT won't be running, not sure about XOCL. So, maybe it would be better to compare the output of /opt/xilinx/xrt/bin/xbutil examine against $CONDA_PREFIX/bin/xbutil examine?
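To compare the two xbutil outputs, one would need the userspace version from each. A minimal sketch, assuming the layout shown in the sample outputs in this thread (an unindented `XRT` section header followed by an indented `Version` line); the helper name `xrt_version` is hypothetical:

```shell
# Sketch only: `xrt_version` is a hypothetical helper, not an XRT tool.
# Reads `xbutil examine` output on stdin and prints the userspace XRT version.
# Only the "Version" line inside the "XRT" section counts, because the
# "System Configuration" section has its own (kernel) "Version" line.
xrt_version() {
    awk '
        # An unindented, non-empty line starts a new section.
        /^[^ \t]/ { in_xrt = ($1 == "XRT") }
        in_xrt && $1 == "Version" && $2 == ":" { print $3; exit }
    '
}

# Possible usage (paths as discussed in this thread):
#   sys_ver=$(/opt/xilinx/xrt/bin/xbutil examine | xrt_version)
#   env_ver=$("$CONDA_PREFIX/bin/xbutil" examine | xrt_version)
#   [ "$sys_ver" = "$env_ver" ] || echo "XRT mismatch: $sys_ver vs $env_ver" >&2
```

This only works when a system xbutil is actually installed, so the usage lines would need a guard for the case where /opt/xilinx/xrt does not exist.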

bryanloz-xilinx commented 2 years ago

What if the user installs the conda package first, and doesn't have the drivers? Then we can't automatically perform the check. I think it's best to thoroughly document this. Is there a common place where known issues can be tracked in conda feedstocks? I guess here?

P.S. you are right about xbutil examine, here is sample output from my test machine:

$ conda activate xrt
$ xbutil examine
System Configuration
  OS Name              : Linux
  Release              : 3.10.0-1160.el7.x86_64
  Version              : #1 SMP Tue Aug 18 14:50:17 EDT 2020
  Machine              : x86_64
  CPU Cores            : 12
  Memory               : 46556 MB
  Distribution         : Red Hat Enterprise Linux Workstation 7.9 (Maipo)
  GLIBC                : 2.17
  Model                : PowerEdge R740

XRT
  Version              : 2.13.466
  Branch               : HEAD
  Hash                 : 0cbaaa84350936a8e05f8b30f0a14aae2a871181
  Hash Date            : 2022-07-07 17:13:56
  XOCL                 : 2.13.466, f5505e402c2ca1ffe45eb6d3a9399b23a0dc8776
  XCLMGMT              : 2.13.466, f5505e402c2ca1ffe45eb6d3a9399b23a0dc8776

timsnyder commented 2 years ago

If the system drivers aren't running, then the XOCL and/or XCLMGMT lines will say unknown. Here's what it looks like on a host without an Alveo board on it:

[azureuser@tims-manager scripts]$ xbutil examine
System Configuration
  OS Name              : Linux
  Release              : 3.10.0-1160.45.1.el7.x86_64
  Version              : #1 SMP Wed Oct 13 17:20:51 UTC 2021
  Machine              : x86_64
  CPU Cores            : 8
  Memory               : 32011 MB
  Distribution         : CentOS Linux 7 (Core)
  GLIBC                : 2.17
  Model                : Virtual Machine

XRT
  Version              : 2.11.680
  Branch               : 2021.1
  Hash                 : 47ba9cb43c1dd7fcdf742837caaf4a32b0170862
  Hash Date            : 2021-09-27 23:06:26
  XOCL                 : unknown, unknown
  XCLMGMT              : unknown, unknown

Devices present
  0 devices found

And on a cloud VM system that has only XOCL supposed to be running:

-bash-4.2$ xbutil examine
System Configuration
  OS Name              : Linux
  Release              : 3.10.0-1160.45.1.el7.x86_64
  Version              : #1 SMP Wed Oct 13 17:20:51 UTC 2021
  Machine              : x86_64
  CPU Cores            : 10
  Memory               : 169090 MB
  Distribution         : CentOS Linux 7 (Core)
  GLIBC                : 2.17
  Model                : Virtual Machine

XRT
  Version              : 2.11.0
  Branch               : HEAD
  Hash                 : 3a5eccaa8c1b5a372dcb6eed2023eb7e69718a6e
  Hash Date            : 2022-06-08 14:14:24
  XOCL                 : 2.11.0, 3a5eccaa8c1b5a372dcb6eed2023eb7e69718a6e
  XCLMGMT              : unknown, unknown

Devices present
  [0001:00:00.0] : xilinx_u250_gen3x16_xdma_shell_2_1

bryanloz-xilinx commented 2 years ago

I see, okay, then I guess what you suggest is quite feasible, and we can plan for that.

Here is a start (not sure if we can count on such utilities being present all the time though):

$ xbutil examine | grep XOCL | awk '{print $3}'
2.13.466,
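Note the trailing comma in that output. A slightly tighter variant, sketched below, does the matching and the comma stripping in a single awk call and also handles the `unknown` case shown earlier. The helper name `driver_version` is hypothetical, and it still assumes awk is available on the host.

```shell
# Sketch only: `driver_version` is a hypothetical helper, not an XRT tool.
# Reads `xbutil examine` output on stdin and prints the version reported for
# the driver named in $1 (XOCL or XCLMGMT), without the trailing comma.
# Prints "unknown" when the driver is not running, and nothing at all when
# the line is missing.
driver_version() {
    awk -v drv="$1" '
        $1 == drv && $2 == ":" {
            sub(/,$/, "", $3)   # "2.13.466," -> "2.13.466"
            print $3
            exit
        }
    '
}

# Possible usage:
#   xbutil examine | driver_version XOCL
```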