META-DREAMER / Qwik-e-Classifier

Realtime Object Classification using Custom AI Hardware Accelerator
MIT License

OpenCL driver + webcam driver + hdmi #30

Closed nocduro closed 6 years ago

nocduro commented 6 years ago

I've been trying to get both of these drivers to work on Linux kernel version 4.15 from https://github.com/altera-opensource/linux-socfpga/tree/socfpga-4.15

I've compiled the kernel and the acl_drv.ko module, which loads successfully with insmod acl_drv.ko. After setting the environment variables for the 17.1 RTE from Intel, an aocl diagnose command passes.

However, when running an OpenCL program like the vector_add example from Intel, the program freezes, as if it is unable to communicate with the FPGA.

I've tried modifying the .dts file (device tree source; it tells the kernel where devices such as the CPU, memory, Ethernet, and timers are in memory), without much luck:

sudo apt install device-tree-compiler
dtc -I dts -O dtb -o socfpga.dtb [filename].dts
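
After booting with the rebuilt socfpga.dtb, one way to see whether the edited nodes actually reached the kernel is to look at the live device tree. The sketch below is only an illustration and not part of the Intel tooling: `list_dt_bridges` is a hypothetical helper name, it assumes the kernel exports the tree under /sys/firmware/devicetree/base, and it assumes the bridge nodes use compatible strings of the form altr,socfpga-*-bridge.

```shell
# Sketch (assumptions: live tree under /sys/firmware/devicetree/base, bridge
# nodes with "altr,socfpga-*-bridge" compatible strings).
# Prints "<node path> <compatible string>" for every matching node.
list_dt_bridges() (
    dt_root="${1:-/sys/firmware/devicetree/base}"
    [ -d "$dt_root" ] || { echo "no device tree at $dt_root" >&2; exit 1; }
    # Each "compatible" file holds NUL-separated strings; tr turns them into lines.
    find "$dt_root" -type f -name compatible | sort | while IFS= read -r f; do
        compat=$(tr '\000' '\n' < "$f" | grep -E '^altr,socfpga-.*-bridge$' | head -n 1)
        if [ -n "$compat" ]; then
            node=$(dirname "$f")
            echo "${node#"$dt_root"/} $compat"
        fi
    done
)
```

Calling `list_dt_bridges` with no argument on the board reads the live tree; passing a directory lets it run against a copy of the tree. An empty result would suggest the bridge nodes from the .dts never made it into the booted .dtb.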

One of the other things is that the Linux kernel is kind of in the middle of adding an FPGA manager, a generic interface which has changed between kernel versions 3.x and 4.x. Intel doesn't seem to have up-to-date tooling/guides for the newer Linux kernels (?)

Possible fixes

nocduro commented 6 years ago

Currently modifying a .dts file to follow: https://github.com/altera-opensource/linux-socfpga/blob/master/Documentation/devicetree/bindings/fpga/altera-hps2fpga-bridge.txt

Using register values from: https://www.altera.com/hps/cyclone-v/hps.html#topic/sfo1418687413697.html

edit: didn't work. trying 3.x kernel now

nocduro commented 6 years ago

Looks like Intel doesn't distribute software with known exploits, so they nuked all of the 3.x kernels from their repo because of the Meltdown / Spectre exploits... Luckily thinkoco has a fork in their repository that is still up.

I thought I almost got the 4.15 kernel to work, but all of my OpenCL programs 'freeze' when running. The weird thing is that aocl diagnose works perfectly! All it does is check that it can create OpenCL buffers and read from and write to the OpenCL device. The buffers work, but the OpenCL kernels (different from the Linux kernel; they are basically functions) do not work? They just sit there and never return.

The way FPGAs are handled in Linux changed at some point, so the reprogram binary provided with the Intel RTE tools is unable to disable/re-enable the FPGA bridges in 4.x. Bridges are how the HPS communicates with the FPGA, and there are 4 'built-in' bridges: hps2fpga, lwhps2fpga, fpga2hps, and fpga2sdram.

And these are listed under /sys/class/fpga_bridge/[br0 | br1 | br2 | br3]

There is also /sys/class/fpga_manager/fpga0:

$ cat /sys/class/fpga_manager/fpga0/name
Altera SOCFPGA FPGA Manager
$ cat /sys/class/fpga_manager/fpga0/state
operating
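
To see which brN entry corresponds to which bridge, the sysfs entries can be dumped in one go. This is a rough sketch under assumptions: `show_fpga_bridges` is a hypothetical helper, and it assumes each entry under /sys/class/fpga_bridge exposes `name` and `state` files, as the mainline FPGA bridge framework does; a vendor kernel may differ.

```shell
# Sketch (assumption: each bridge directory has "name" and "state" files).
# Prints "<entry> <bridge name> <state>" for every bridge found.
show_fpga_bridges() (
    class_dir="${1:-/sys/class/fpga_bridge}"
    [ -d "$class_dir" ] || { echo "no such directory: $class_dir" >&2; exit 1; }
    for br in "$class_dir"/*; do
        # Skip anything that does not look like a bridge entry.
        [ -r "$br/name" ] || continue
        name=$(cat "$br/name")
        state=$(cat "$br/state" 2>/dev/null || echo unknown)
        echo "$(basename "$br") $name $state"
    done
)
```

Running `show_fpga_bridges` on the board would show whether any bridge is reported as disabled, which is one possible explanation for a kernel launch that never returns.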

Here's some of the output from my test program when I don't set CL_CONTEXT_COMPILER_MODE_INTELFPGA env variable:

root@DE10_NANO:~/ocl# ./opencl triv2.aocx
binary file len is: 2602196
platform name: Intel(R) FPGA SDK for OpenCL(TM)
        Device(DeviceId(0xb54752a8))
platform found! Platform(PlatformId(0xb5475260)): "Intel(R) FPGA SDK for OpenCL(TM)"
Reprogramming device [0] with handle 1
sh: 1: cannot create /sys/class/fpga-bridge/fpga2hps/enable: Directory nonexistent
sh: 1: cannot create /sys/class/fpga-bridge/hps2fpga/enable: Directory nonexistent
sh: 1: cannot create /sys/class/fpga-bridge/lwhps2fpga/enable: Directory nonexistent
Couldn't open FPGA status from /sys/class/fpga/fpga0/status!
sh: 1: cannot create /sys/class/fpga-bridge/fpga2hps/enable: Directory nonexistent
sh: 1: cannot create /sys/class/fpga-bridge/hps2fpga/enable: Directory nonexistent
sh: 1: cannot create /sys/class/fpga-bridge/lwhps2fpga/enable: Directory nonexistent
Reprogram FAILED
mmd program_device:  Board reprogram failed
mem_rd_que: Device(DeviceId(0xb54752a8)) - OpenclVersion { ver: [1, 0] }

MMD FATAL: acl_mmd.cpp:59: can't find handle -1 -- aborting
opencl: acl_mmd.cpp:59: ACL_MMD_DEVICE* get_mmd_device(int): Assertion `0' failed.
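
The log above shows reprogram writing to the older /sys/class/fpga-bridge/<name>/enable and /sys/class/fpga/fpga0/status paths, while this kernel exposes /sys/class/fpga_bridge and /sys/class/fpga_manager instead. A small sketch to confirm that mismatch on a given image follows; `check_reprogram_paths` is a hypothetical helper, and the path list is taken only from the log and the sysfs listing in this comment.

```shell
# Sketch: report which of the sysfs paths mentioned in this thread exist.
# The first four are the legacy paths from the reprogram log; the last two
# are the entries seen on the 4.15 kernel.
check_reprogram_paths() (
    sys_root="${1:-/sys}"
    for p in \
        class/fpga-bridge/fpga2hps/enable \
        class/fpga-bridge/hps2fpga/enable \
        class/fpga-bridge/lwhps2fpga/enable \
        class/fpga/fpga0/status \
        class/fpga_manager/fpga0/state \
        class/fpga_bridge
    do
        if [ -e "$sys_root/$p" ]; then
            echo "present $sys_root/$p"
        else
            echo "missing $sys_root/$p"
        fi
    done
)
```

If the first four come back missing and the last two present, the failure is in where reprogram looks rather than in the FPGA manager itself.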

With CL_CONTEXT_COMPILER_MODE_INTELFPGA=1:

root@DE10_NANO:~/ocl# ./opencl triv2.aocx
binary file len is: 2602196
platform name: Intel(R) FPGA SDK for OpenCL(TM)
        Device(DeviceId(0xb54752a8))
platform found! Platform(PlatformId(0xb5475260)): "Intel(R) FPGA SDK for OpenCL(TM)"
mem_rd_que: Device(DeviceId(0xb54752a8)) - OpenclVersion { ver: [1, 0] }
buffer created
kernel created
kernel sent to opencl

And it just sits there.
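
When repeating this test across kernels and .rbf files, it can help to bound each run instead of waiting on a hung process by hand. A minimal sketch, assuming the coreutils `timeout` command is available on the image; `run_or_report_hang` is a hypothetical wrapper name.

```shell
# Sketch (assumption: coreutils "timeout" is installed).
# Runs a command with a time limit; timeout exits with 124 when the limit hits.
run_or_report_hang() {
    limit="$1"; shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "HANG: '$*' still running after ${limit}s" >&2
    fi
    return "$status"
}
```

For example, `run_or_report_hang 30 ./opencl triv2.aocx` would give the kernel launch 30 seconds. A process blocked inside the driver may not respond to the default signal, so this only detects the hang; it does not explain it.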

With CL_CONTEXT_COMPILER_MODE_INTELFPGA=3 (I think I have the correct .rbf file for this mode loaded):

root@DE10_NANO:~/ocl# ./opencl triv2.aocx
binary file len is: 2602196
platform name: Intel(R) FPGA SDK for OpenCL(TM)
        Device(DeviceId(0xb56752a8))
platform found! Platform(PlatformId(0xb5675260)): "Intel(R) FPGA SDK for OpenCL(TM)"
mem_rd_que: Device(DeviceId(0xb56752a8)) - OpenclVersion { ver: [1, 0] }
buffer created
kernel created
kernel sent to opencl

same as with =1 😕

aocl diagnose output:

root@DE10_NANO:~/ocl# aocl diagnose

Verified that the kernel mode driver is installed on the host machine.

Using platform: Intel(R) FPGA SDK for OpenCL(TM)
Board vendor name: Intel(R) Corporation
Board name: de10_nano_sharedonly_hdmi : Cyclone V SoC Development Kit

Buffer read/write test passed.

DIAGNOSTIC_PASSED

Call "aocl diagnose <device-names>" to run diagnose for specified devices
Call "aocl diagnose all" to run diagnose for all devices

clinfo output:

root@DE10_NANO:~/ocl# clinfo
Number of platforms                               1
  Platform Name                                   Intel(R) FPGA SDK for OpenCL(TM)
  Platform Vendor                                 Intel(R) Corporation
  Platform Version                                OpenCL 1.0 Intel(R) FPGA SDK for OpenCL(TM), Version 17.1
  Platform Profile                                EMBEDDED_PROFILE
  Platform Extensions                             cl_khr_byte_addressable_store cles_khr_int64 cl_intelfpga_live_object_tracking cl_intelfpga_compiler_mode cl_khr_icd cl_khr_3d_image_writes
  Platform Extensions function suffix             IntelFPGA

  Platform Name                                   Intel(R) FPGA SDK for OpenCL(TM)
Number of devices                                 1
  Device Name                                     de10_nano_sharedonly_hdmi : Cyclone V SoC Development Kit
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x1172
  Device Version                                  OpenCL 1.0 Intel(R) FPGA SDK for OpenCL(TM), Version 17.1
  Driver Version                                  17.1
  Device OpenCL C Version                         OpenCL C 1.0
  Device Type                                     Accelerator
  Device Profile                                  EMBEDDED_PROFILE
  Max compute units                               1
  Max clock frequency                             1000MHz
  Max work item dimensions                        3
  Max work item sizes                             2147483647x2147483647x2147483647
  Max work group size                             2147483647
  Preferred work group size multiple              <getWGsizes:518: create kernel : error -46>
  Preferred / native vector sizes
    char                                                 4 / 4
    short                                                2 / 2
    int                                                  1 / 1
    long                                                 1 / 1
    half                                                 0 / 0        (n/a)
    float                                                1 / 1
    double                                               0 / 0        (n/a)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (n/a)
  Address bits                                    64, Little-Endian
  Global memory size                              536870912 (512MiB)
  Error Correction support                        No
  Max memory allocation                           134217728 (128MiB)
  Unified memory for Host and Device              No
  Minimum alignment for any data type             1024 bytes
  Alignment of base address                       8192 bits (1024 bytes)
  Global Memory cache type                        Read-Only
  Global Memory cache size                        <printDeviceInfo:89: get CL_DEVICE_GLOBAL_MEM_CACHE_SIZE : error -30>
  Global Memory cache line                        0 bytes
  Image support                                   Yes
    Max number of samplers per kernel             32
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                128
  Local memory type                               Local
  Local memory size                               16384 (16KiB)
  Max constant buffer size                        134217728 (128MiB)
  Max number of constant args                     8
  Max size of kernel argument                     256
  Queue properties
    Out-of-order execution                        No
    Profiling                                     Yes
  Profiling timer resolution                      1ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  Device Available                                Yes
  Compiler Available                              No
  Device Extensions                               cl_khr_byte_addressable_store cles_khr_int64 cl_intelfpga_live_object_tracking cl_intelfpga_compiler_mode cl_khr_icd cl_khr_3d_image_writes

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Intel(R) FPGA SDK for OpenCL(TM)
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [IntelFPGA]
  clCreateContext(NULL, ...) [default]            Success [IntelFPGA]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  Success (1)
    Platform Name                                 Intel(R) FPGA SDK for OpenCL(TM)
    Device Name                                   de10_nano_sharedonly_hdmi : Cyclone V SoC Development Kit
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Intel(R) FPGA SDK for OpenCL(TM)
    Device Name                                   de10_nano_sharedonly_hdmi : Cyclone V SoC Development Kit

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.8
  ICD loader Profile                              OpenCL 1.2
        NOTE:   your OpenCL library declares to support OpenCL 1.2,
                but it seems to support up to OpenCL 2.1 too.

uname -a output:

root@DE10_NANO:~/ocl# uname -a
Linux DE10_NANO 4.15.0 #1 SMP Sun Mar 25 15:46:34 DST 2018 armv7l armv7l armv7l GNU/Linux

Boot sequence full output: https://gist.github.com/nocduro/87928861657963c7ddb3f46f4bbc4255

Interesting part (line 176):

[    1.103077] fpga_manager fpga0: Altera SOCFPGA FPGA Manager registered
[    1.110185] altera_hps2fpga_bridge sopc@0:fpgabridge@0: fpga bridge [hps2fpga] registered
[    1.118596] altera_hps2fpga_bridge sopc@0:fpgabridge@1: fpga bridge [lwhps2fpga] registered
[    1.127101] altera_hps2fpga_bridge sopc@0:fpgabridge@2: fpga bridge [fpga2hps] registered
[    1.135622] altera_fpga2sdram_bridge sopc@0:fpgabridge@3: fpga bridge [fpga2sdram] registered
[    1.144138] altera_fpga2sdram_bridge sopc@0:fpgabridge@3: driver initialized with handoff 000001ff

I'm a bit lost on what to do with the 4.x stuff. We are given the source for reprogram.c in the rte folder; do I fix it to work with the fpga manager? Is something set up wrong in the device tree that prevents the Linux kernel from communicating with the OpenCL kernel on the FPGA? Why does a zImage build of 4.9.78 hang at the Starting kernel ... part?

META-DREAMER commented 6 years ago

@nocduro Great work on dissecting this all. It almost seems like we are falling deeper into the rabbit hole right now. Do you see any signs of hope with all this? The thinkoco guy seemed to know what was up when it came to getting HDMI + OpenCL working, maybe he can give some more insight for webcam stuff as well?

nocduro commented 6 years ago

@hammadj I think there is a chance to get the 4.x linux kernel to work, but we'd probably have to build everything from source with the yocto/bitbake stuff. At this point it's probably easier just to get the webcam drivers installed on that image from thinkoco.

I would like to try the full yocto build with the 4.9.78-ltsi kernel from here, which was updated about a week ago; that way we could write an app note on doing that. It would also mean we are up to date with the "official" supported version of Linux for the Cyclone V stuff.

If I don't get that to work tonight, I'll just install the webcam drivers on the thinkoco 3.x kernel, hopefully that won't be too hard.

nocduro commented 6 years ago

Gave up on 4.x for now. Now running thinkoco's 3.18 Linux image, with my test program running. Now trying to get the PS3 Eye drivers installed; it looks like the camera uses the ov534 chipset/driver.

edit: this image wasn't compiled with the gspca drivers

root@DE10_NANO:~# cat /lib/modules/4.5.0-00185-g3bb556b/modules.builtin | grep media
kernel/drivers/media/usb/uvc/uvcvideo.ko
kernel/drivers/media/v4l2-core/videodev.ko
kernel/drivers/media/v4l2-core/v4l2-common.ko
kernel/drivers/media/v4l2-core/v4l2-dv-timings.ko
kernel/drivers/media/v4l2-core/videobuf2-core.ko
kernel/drivers/media/v4l2-core/videobuf2-v4l2.ko
kernel/drivers/media/v4l2-core/videobuf2-memops.ko
kernel/drivers/media/v4l2-core/videobuf2-vmalloc.ko

I've compiled just the drivers by cloning the 3.18 repo, running make menuconfig to select the required media drivers/usb support/ov534 support by following these sites:

http://wiki.tekkotsu.org/index.php/Sony_PlayStation_Eye_driver_install_instructions
https://askubuntu.com/questions/168279/how-do-i-build-a-single-in-tree-kernel-module

I've copied the drivers over, but the Linux image may be looking for them in the wrong place. I'll go to the lab tomorrow to try with the camera.
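
Module loading tools such as modprobe look under /lib/modules/$(uname -r), so a quick check is whether the copied .ko files ended up in the directory that matches the running kernel. The sketch below is a hedged illustration: `find_module` is a hypothetical helper, and the loose filename match is an assumption about how the module file is named.

```shell
# Sketch: look for a module file under the directory for a given kernel release.
# Defaults to the running kernel and /lib/modules; both can be overridden.
find_module() (
    name="$1"
    release="${2:-$(uname -r)}"
    root="${3:-/lib/modules}"
    dir="$root/$release"
    [ -d "$dir" ] || { echo "no module directory: $dir" >&2; exit 1; }
    # Loose match, since the file may carry a prefix or a compression suffix.
    find "$dir" -name "*$name*.ko*" -print
)
```

On the board, `find_module ov534` printing nothing would mean the files are not where the running kernel expects them. After copying modules into place, `depmod -a` refreshes the dependency index, and `modinfo -F vermagic <file>.ko` shows which kernel a module was built for.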

nocduro commented 6 years ago

Hmm, getting a segfault when loading the ov534 driver :(

META-DREAMER commented 6 years ago

Is there another camera we could use that would make the process easier?

nocduro commented 6 years ago

No, all the usb camera drivers are in the same folder, so it should be the same difficulty for any of them:

https://github.com/altera-opensource/linux-socfpga/tree/socfpga-4.9.78-ltsi/drivers/media/usb/gspca

Thinkoco had some recommendations on how to fix it; I'll try that tonight. (I tried last night, but messed up somewhere in the kernel configuration.)

nocduro commented 6 years ago

I've compiled a new 3.18 kernel with a modified .config file based off of the config_opencl_de10_nano from https://github.com/thinkoco/linux-socfpga/tree/socfpga-3.18

I've replaced the original zImage provided by thinkoco, and have tested it on the board. It boots and runs my OpenCL test program. Just need to test with a webcam.

steps to reproduce (extended from here):

modified kernel config file

# done on a fresh install of ubuntu 16.04 on a 24 vCPU instance
sudo apt update
sudo apt install u-boot-tools gcc-arm-linux-gnueabihf g++-arm-linux-gnueabihf libncurses5-dev make lsb uml-utilities git

# initial git clone is ~1.4 GB for everything
# could checkout single branch if wanted 
# git clone --single-branch -b socfpga-opencl_3.18 https://github.com/thinkoco/linux-socfpga.git
git clone https://github.com/thinkoco/linux-socfpga.git
cd linux-socfpga
git checkout -b socfpga-opencl_3.18 origin/socfpga-3.18
# copy my modified config (I disable localversion appending, and enabled some usb webcams)
# available from gist above (will add to repo later)
cp 3.18_usbcam_config .config

# setup compiler options
export ARCH=arm
export CROSS_COMPILE=arm-linux-gnueabihf-
export LOADADDR=0x8000
export LOCALVERSION=

# make with 24 threads
make -j24 zImage

# copy the compiled image
cp arch/arm/boot/zImage ~/3.18_usbcam_zImage
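
Before starting the build, it may be worth confirming that the webcam options really are set in the copied .config. A minimal sketch follows; `check_kconfig` is a hypothetical helper, and the option names in the usage note are the upstream Kconfig symbols for the gspca ov534 drivers, which I am assuming also apply to this 3.18 tree.

```shell
# Sketch: print the value (y, m, or "unset") of each requested option
# in a kernel .config file.
check_kconfig() (
    config="$1"; shift
    [ -r "$config" ] || { echo "cannot read $config" >&2; exit 1; }
    for opt in "$@"; do
        # Lines like "# CONFIG_FOO is not set" do not match and count as unset.
        val=$(sed -n "s/^$opt=//p" "$config" | head -n 1)
        echo "$opt=${val:-unset}"
    done
)
```

For example: `check_kconfig .config CONFIG_USB_GSPCA CONFIG_USB_GSPCA_OV534 CONFIG_USB_GSPCA_OV534_9`. A value of y builds the driver into the zImage, while m would require copying the module onto the board separately.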

nocduro commented 6 years ago

[    2.132814] usbcore: registered new interface driver ov534
[    2.138341] usbcore: registered new interface driver ov534_9

Got this message during boot, so it looks like everything should work! Still have to test with an actual webcam though

thinkoco commented 5 years ago

@nocduro Hi, regarding the newer Linux kernel, here may be a solution.