Xilinx / mlir-aie

An MLIR-based toolchain for AMD AI Engine-enabled devices.

Support on setting up the environment and using MLIR-AIE on VCK5000 #1591

Open jbelot opened 6 days ago

jbelot commented 6 days ago

Hi,

This is my first issue, and I'm not sure this is the right place to post it, since it isn't really a specific problem. It's more of a request for documentation or clarification on how to install the environment for the VCK5000 platform and how to use it. This issue is related to these three others: https://github.com/Xilinx/mlir-aie/issues/77#, https://github.com/Xilinx/mlir-aie/issues/359 and https://github.com/Xilinx/mlir-air/issues/387.

My goal is to run a minimal example of MLIR-AIE generated code (potentially interfacing with the PL) on a VCK5000 platform. I was able to follow several tutorials except those requiring me to run the code on the platform (such as tutorial 2c).

Now, in order to run it on the VCK5000, I have to follow the build instructions. I have to admit that the installation is pretty painful: the instructions refer to several other pages and git repositories, which makes the whole process hard to follow and multiplies the opportunities for error.

First, there seems to be a typo in the build instructions, in the following script:

git clone https://github.com/stephenneuendorffer/aie-rt
cd aie-rt
git checkout phoenix_v2023.2
cd driver/src
make -f Makefile.Linux CFLAGS="-D__AIEAMDAIR__"
sudo cp -r ../include /opt/aiengine/
sudo cp libxaiengine.so* /opt/xaiengine/lib/
export LD_LIBRARY_PATH=/opt/xaiengine/lib:${LD_LIBRARY_PATH}

The script mentions both /opt/aiengine and /opt/xaiengine; shouldn't these be the same directory? Later on there is also a mention of /opt/xaienginev2; does that refer to the same directory as well?
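To illustrate what I mean, here is a minimal sketch of the install step with a single prefix used throughout. It assumes /opt/xaiengine is the intended directory, which I have not been able to confirm; XAIENGINE_PREFIX and install_xaiengine are names I made up for the example.

```shell
#!/bin/sh
# Hypothetical: one install prefix reused everywhere, so the include and lib
# paths cannot drift apart. Which path the toolchain really expects is an
# assumption here, not something confirmed by the build instructions.
XAIENGINE_PREFIX="${XAIENGINE_PREFIX:-/opt/xaiengine}"

# Run from aie-rt/driver/src after the make step has produced the library.
install_xaiengine() {
    sudo mkdir -p "$XAIENGINE_PREFIX/lib"
    sudo cp -r ../include "$XAIENGINE_PREFIX/"
    sudo cp libxaiengine.so* "$XAIENGINE_PREFIX/lib/"
}

# Prepend the library directory without leaving a trailing ':' when
# LD_LIBRARY_PATH was previously empty.
export LD_LIBRARY_PATH="$XAIENGINE_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

With this layout, switching to a different directory would only mean setting XAIENGINE_PREFIX before running the script.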

Are there any plans to simplify / clarify the building instructions for the VCK5000?

Moreover, the prerequisites for this installation seem a bit old: generating the ROCm AIR platform requires Vitis 2022.1, and its AMD drivers require Linux kernel version 5.11. My Linux kernel version is higher (6.5.0) and no longer seems to be compatible.

Is there now an easier way to target the VCK5000 platform with MLIR-AIE that avoids the obsolete dependencies, rather than having to install ROCm, then ROCr, then the platform and its associated drivers? Or are there any plans to update the drivers?

Assuming I manage to install the environment for the vck5000 platform, is there a minimal example such as those in the tutorials (potentially interfacing with the PL) that I could run?

Thank you for helping a soul in distress!

eddierichter-amd commented 5 days ago

Hi,

Thanks for the feedback, and apologies for this being frustrating. You brought up a couple of issues, so I'm going to tackle them one by one:

On the note of versions, unfortunately we are still on those versions of the kernel and Vitis. One thing I will note is that Vitis 2022.1 is only needed if you are building the FPGA platform; if you are simply using mlir-aie to program the AIEs, you can (and actually have to) use Vitis 2023.2. I am hoping to bump our FPGA platform as well to make the versioning consistent, but for now I would recommend just using 2023.2. We are also still using that kernel version, but will keep you updated on when we upgrade.

On the note of documentation, thanks for catching that typo. This issue inspired me to clean up some of the VCK5000 build script and documentation. I have opened the following PR: https://github.com/Xilinx/mlir-aie/pull/1597. It should make the build slightly easier, as it automatically builds aie-rt and the experimental ROCm runtime. One thing I do want to note is that you are still going to need to install a global version of ROCm 5.6, as well as build and insert the driver. There isn't a great way around either of those.

jbelot commented 4 days ago

Hi,

Thank you for your answer :)

To recap, I managed to install the global version of ROCm 5.6. Assuming that I only want to program the AIEs at first, my main issue is building the driver, which requires an older Linux kernel than the one I have. I guess the kernel version is the source of the error I get when running make in the driver directory:

make
/usr/src/linux-headers-`uname -r` M=$PWD
make[1]: Entering directory '/usr/src/linux-headers-6.5.0-41-generic'
warning: the compiler differs from the one used to build the kernel
  The kernel was built by: x86_64-linux-gnu-gcc-12 (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0
  You are using:           gcc-12 (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0
  CC [M]  ..../mlir-aie/ROCm-air-platforms/driver/amdair_chardev.o
In file included from ./include/linux/linkage.h:7,
                 from ./arch/x86/include/asm/cache.h:5,
                 from ./include/linux/cache.h:6,
                 from ./include/linux/time.h:5,
                 from ./include/linux/compat.h:10,
                 from ..../mlir-aie/ROCm-air-platforms/driver/amdair_chardev.c:4:
..../mlir-aie/ROCm-air-platforms/driver/amdair_chardev.c: In function ‘amdair_chardev_init’:
./include/linux/export.h:29:22: error: passing argument 1 of ‘class_create’ from incompatible pointer type [-Werror=incompatible-pointer-types]
   29 | #define THIS_MODULE (&__this_module)
      |                     ~^~~~~~~~~~~~~~~
      |                      |
      |                      struct module *

I tried to fix it manually by changing this line from:

    amdair_class = class_create(THIS_MODULE, amdair_dev_name());

into:

    amdair_class = class_create(amdair_dev_name());

The compilation succeeds, but when I load the driver it does not seem to work, as the device file /dev/amdair is not created.

If the problem does indeed come from my Linux kernel version, then I'm stuck until the drivers are updated, right?
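For reference, here is a small sketch for checking which class_create() form a running kernel expects. To my knowledge the module-owner argument was removed in Linux 6.4, so the one-argument call above should be the expected form on 6.5; please treat that version boundary as my assumption. kernel_at_least is a hypothetical helper name, not something from the driver.

```shell
#!/bin/sh
# kernel_at_least MAJOR MINOR [RELEASE]
# Succeeds if RELEASE (default: output of `uname -r`) is at least MAJOR.MINOR.
kernel_at_least() {
    want_major=$1
    want_minor=$2
    release=${3:-$(uname -r)}
    have_major=${release%%.*}      # "6.5.0-41-generic" -> "6"
    rest=${release#*.}             # "6.5.0-41-generic" -> "5.0-41-generic"
    have_minor=${rest%%[!0-9]*}    # "5.0-41-generic"   -> "5"
    [ "$have_major" -gt "$want_major" ] ||
        { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

# Assumption: class_create() lost its owner argument in Linux 6.4.
if kernel_at_least 6 4; then
    echo "this kernel expects class_create(name)"
else
    echo "this kernel expects class_create(owner, name)"
fi
```

If that boundary is right, the signature change alone would not explain the missing /dev/amdair, so other kernel API changes may be involved.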

But maybe this is more of an issue for the ROCm-air-platforms repo than for this one; tell me if I should create an issue there.

One last question: assuming all this has been sorted out, will I be able to run the tutorials (e.g. tutorial 2c) on the VCK5000 as they are, or will I have to adapt them?

eddierichter-amd commented 5 hours ago

Do you see any errors in dmesg? I haven't tried the driver with a more recent Linux kernel but have been wanting to try it out. I am guessing there are a couple more changes that are required. I agree it would be good to continue the conversation regarding the driver in the platform repo.

The tutorials were written assuming you have access to a Ryzen AI platform. We have put some work into the compiler to provide a similar aie2.py programming experience to the tutorials, but there are some differences: the size of the array, the availability of DMAs in each column (the Ryzen AI platform has DMAs in every column, whereas the VCK5000 only has DMAs in the columns listed at https://github.com/Xilinx/ROCm-air-platforms/blob/main/firmware/main.cpp#L75-L76, which should definitely be documented, thanks for pointing that out!), and the host code. I would take a look at https://github.com/Xilinx/mlir-aie/tree/main/programming_examples/basic/vector_vector_add, which has both a test.cpp, which runs on the Ryzen AI platform, and a test_vck5000.cpp, which runs on the VCK5000.

jbelot commented 2 hours ago

Thanks for your answers, this is clearer to me now.

The command sudo dmesg | grep amdair does not give any output. I have just created another issue in the platform repo, so we can discuss it there.
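An empty grep could mean either that the module logged nothing or that it is not loaded at all. Below is a rough sketch for telling the two apart. It assumes the module is named amdair (a guess based on the driver's file names); check_driver is a hypothetical helper, and its arguments are only there so the check can be pointed at other paths.

```shell
#!/bin/sh
# check_driver [MODULE] [DEVNODE] [MODULES_FILE]
# Reports whether MODULE appears in the loaded-module list and whether
# DEVNODE exists. The defaults are assumptions about this driver.
check_driver() {
    module=${1:-amdair}
    devnode=${2:-/dev/amdair}
    modules_file=${3:-/proc/modules}

    # Each line of /proc/modules starts with the module name and a space.
    if grep -q "^${module} " "$modules_file" 2>/dev/null; then
        echo "module ${module}: loaded"
    else
        echo "module ${module}: not loaded"
    fi

    if [ -e "$devnode" ]; then
        echo "${devnode}: present"
    else
        echo "${devnode}: missing"
    fi
}

check_driver
```

A loaded module with a missing device node would suggest that the init path did not finish creating the device without logging why, which seems worth including in the platform repo issue.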

Thanks for pointing out the vector_vector_add example; it will be useful for getting started once I have managed to install the platform!

I guess we can close this issue once you merge the PR that updates the documentation, since the remaining matter is on the platform repo side.

eddierichter-amd commented 41 minutes ago

Sounds good. I will follow up in the issue on the platform repo. Thanks for creating that!