Open timsnyder opened 2 years ago
@timsnyder this is a very good issue for tracking this. Thanks for raising it.
Historically, older XRT drivers have worked okay with newer XRT user space libraries for a time. We know this because in Vitis-AI we build docker containers with XRT user space libraries, and they seem to work pretty well when running on systems with outdated XRT drivers.
However, when problems arise, it becomes necessary to align the system XRT with the XRT inside the docker container... or in our case here... the XRT inside of our conda virtual environment.
So people may need to be aware that the user space XRT packaged here is the latest release, 2.13.466.
Then they should verify that the XRT installed on the system matches.
Today, the system XRT is always installed to /opt/xilinx/xrt
cringe...
So someone can do /opt/xilinx/xrt/bin/xbutil examine
In the future, I hope to learn if it is possible to package kernel stuff as well in Anaconda...
I think if they do $CONDA_PREFIX/bin/xbutil examine, it will show the version+SHA of the conda stuff as well as the version+SHA for XOCL and XCLMGMT running on the system. We could probably add an activation script to this recipe that warns the user about the need to upgrade system or downgrade conda package if PKG_VER > XOCL.
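A minimal sketch of the comparison such an activation script could make, assuming GNU sort (for its -V version sort) is available on the host. The function names are hypothetical, and the two versions are passed in as plain arguments; nothing in the recipe defines PKG_VER or an XOCL version variable today.

```shell
# Sketch only: warn when the conda XRT is newer than the loaded XOCL driver.

# Succeeds (exit 0) when version $1 is strictly newer than version $2.
# Relies on GNU sort's -V (version sort) to order dotted version strings.
version_gt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# $1: version of the conda package, $2: version reported for XOCL.
# Prints a warning on stderr only when the conda package is newer.
warn_if_conda_newer() {
    if version_gt "$1" "$2"; then
        echo "WARNING: conda XRT $1 is newer than system XOCL $2;" \
             "upgrade the system XRT or downgrade the conda package." >&2
    fi
}

# Example call (versions taken from the outputs in this thread):
#   warn_if_conda_newer 2.13.466 2.11.0
```

This only compares version strings. Whether a newer user space against an older driver is actually a problem is the open question in this issue, so a warning seems safer than a hard failure.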
However, I know there are cases where XCLMGMT won't be running, not sure about XOCL. So, maybe it would be better to compare the output of /opt/xilinx/xrt/bin/xbutil examine against $CONDA_PREFIX/bin/xbutil examine?
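A rough sketch of comparing the two outputs, assuming the examine layout matches the sample outputs in this thread (an "XRT" header line followed by a "Version : x.y.z" line). The function names and the saved-output file names are hypothetical.

```shell
# Print the user space XRT version from `xbutil examine` output on stdin.
# Only the "Version" line that follows the "XRT" header is used, because the
# "System Configuration" section has its own "Version" line (the kernel build).
xrt_userspace_version() {
    awk '
        $1 == "XRT" && NF == 1 { in_xrt = 1; next }
        in_xrt && $1 == "Version" { print $3; exit }
    '
}

# Succeeds when two saved `xbutil examine` outputs report the same,
# non-empty XRT version. $1 and $2 are paths to the saved outputs.
same_xrt_version() {
    xrt_ver_a="$(xrt_userspace_version < "$1")"
    xrt_ver_b="$(xrt_userspace_version < "$2")"
    [ -n "$xrt_ver_a" ] && [ "$xrt_ver_a" = "$xrt_ver_b" ]
}

# Example (not run here):
#   /opt/xilinx/xrt/bin/xbutil examine > system.txt
#   "$CONDA_PREFIX/bin/xbutil" examine > conda.txt
#   same_xrt_version system.txt conda.txt || echo "XRT versions differ" >&2
```

Comparing only the version, not the hash, is a deliberate simplification here; the hashes could be compared the same way if a stricter match turns out to be needed.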
What if the user installs the conda package first, and doesn't have the drivers? Then we can't automatically perform the check. I think it's best to thoroughly document this. Is there a common place where known issues can be tracked in conda feedstocks? I guess here?
P.S. you are right about xbutil examine, here is sample output from my test machine:
$ conda activate xrt
$ xbutil examine
System Configuration
OS Name : Linux
Release : 3.10.0-1160.el7.x86_64
Version : #1 SMP Tue Aug 18 14:50:17 EDT 2020
Machine : x86_64
CPU Cores : 12
Memory : 46556 MB
Distribution : Red Hat Enterprise Linux Workstation 7.9 (Maipo)
GLIBC : 2.17
Model : PowerEdge R740
XRT
Version : 2.13.466
Branch : HEAD
Hash : 0cbaaa84350936a8e05f8b30f0a14aae2a871181
Hash Date : 2022-07-07 17:13:56
XOCL : 2.13.466, f5505e402c2ca1ffe45eb6d3a9399b23a0dc8776
XCLMGMT : 2.13.466, f5505e402c2ca1ffe45eb6d3a9399b23a0dc8776
If the system drivers aren't running, then the XOCL and/or XCLMGMT lines will say unknown. Here's what it looks like on a host without an Alveo board on it:
[azureuser@tims-manager scripts]$ xbutil examine
System Configuration
OS Name : Linux
Release : 3.10.0-1160.45.1.el7.x86_64
Version : #1 SMP Wed Oct 13 17:20:51 UTC 2021
Machine : x86_64
CPU Cores : 8
Memory : 32011 MB
Distribution : CentOS Linux 7 (Core)
GLIBC : 2.17
Model : Virtual Machine
XRT
Version : 2.11.680
Branch : 2021.1
Hash : 47ba9cb43c1dd7fcdf742837caaf4a32b0170862
Hash Date : 2021-09-27 23:06:26
XOCL : unknown, unknown
XCLMGMT : unknown, unknown
Devices present
0 devices found
And on a cloud VM where only XOCL is supposed to be running:
-bash-4.2$ xbutil examine
System Configuration
OS Name : Linux
Release : 3.10.0-1160.45.1.el7.x86_64
Version : #1 SMP Wed Oct 13 17:20:51 UTC 2021
Machine : x86_64
CPU Cores : 10
Memory : 169090 MB
Distribution : CentOS Linux 7 (Core)
GLIBC : 2.17
Model : Virtual Machine
XRT
Version : 2.11.0
Branch : HEAD
Hash : 3a5eccaa8c1b5a372dcb6eed2023eb7e69718a6e
Hash Date : 2022-06-08 14:14:24
XOCL : 2.11.0, 3a5eccaa8c1b5a372dcb6eed2023eb7e69718a6e
XCLMGMT : unknown, unknown
Devices present
[0001:00:00.0] : xilinx_u250_gen3x16_xdma_shell_2_1
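Given the three outputs above, a check could first test whether a driver reports a version at all and skip the comparison when it says unknown. A small sketch, assuming the "NAME : version, hash" line layout shown in these samples; driver_reported is a hypothetical helper name.

```shell
# Succeeds when `xbutil examine` output on stdin reports a version for the
# given driver (XOCL or XCLMGMT) rather than "unknown". A missing line for
# the driver is treated the same as "unknown".
driver_reported() {
    awk -v name="$1" '
        $1 == name && $2 == ":" { found = ($3 !~ /^unknown/) }
        END { exit (found ? 0 : 1) }
    '
}

# Example (not run here):
#   if xbutil examine | driver_reported XOCL; then
#       echo "XOCL is loaded, version check can proceed"
#   else
#       echo "XOCL not reported, skipping version check" >&2
#   fi
```

On the cloud VM output above this would succeed for XOCL and fail for XCLMGMT, which matches the case where only XOCL is expected to be running.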
I see, okay, then I guess what you suggest is quite feasible, and we can plan for that.
Here is a start (not sure if we can count on such utilities being present all the time though):
$ xbutil examine | grep XOCL | awk '{print $3}'
2.13.466,
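The one-liner above leaves the trailing comma on the version. A possible variant that uses awk alone (awk is specified by POSIX, though its presence on every host is still an assumption) and strips the comma is sketched below; driver_version_and_hash is a hypothetical helper name.

```shell
# Print "<version> <hash>" for a driver line in `xbutil examine` output
# on stdin, for example:
#   XOCL : 2.13.466, f5505e402c2ca1ffe45eb6d3a9399b23a0dc8776
# $1 is the driver name (XOCL or XCLMGMT).
driver_version_and_hash() {
    awk -v name="$1" '
        $1 == name && $2 == ":" {
            sub(/,$/, "", $3)   # drop the comma that follows the version
            print $3, $4
            exit
        }
    '
}

# Example (not run here):
#   xbutil examine | driver_version_and_hash XOCL
```

Matching on the first field rather than grepping for XOCL anywhere in the line avoids picking up unrelated lines that happen to contain the driver name.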
Comment:
We only package the userspace bits in the conda package. How does someone know that they have the correct version of the system stuff installed?
Should they:
- xbutil examine (using xbutil from the conda package) and freak out if the SHAs don't match
- xbutil examine and only care that the versions match
- xbutil validate and if that passes using the conda version of xbutil and the system version, then they know everything should be okay?
I haven't seen documentation that describes in detail how the userspace portions of XRT expect to be able to talk to the system portions. I'm sure it is documented (either by XRT or just standard Linux kernel module docs).
@bryanloz-xilinx , I'm just opening this ticket so that we have something to point users to when someone eventually ends up with a version of XRT in their conda environment that doesn't play nicely with the system package they have installed.