AICAN-Research / FAST-Pathology

⚡ Open-source software for deep learning-based digital pathology
BSD 2-Clause "Simplified" License

OpenCL exception on Mac #85

Closed cavenel closed 1 year ago

cavenel commented 1 year ago

Hi,

Not sure if it is an issue with FastPathology or a problem with the OpenCL installation, but I have a user who got this error message after following all the installation steps on macOS: image

And the terminal shows:

    OK
    parsing
    OK
    Done
    ERROR [0x7000030d5000] Build log, device 0
    <program source>:13:3: error: expected expression
            } else {
      ^

    ERROR [0x7000030d5000] Program build failure
    stopping pipeline
    done
    removing renderers..
    done
    stopping pipeline
    done
    removing renderers..
    done

Is there any way to check if OpenCL is installed properly on MacOS?

andreped commented 1 year ago

Not sure what is causing this, but @smistad can likely aid you further.

Assuming that OpenCL is properly set up, it could be that you have the same issue as I did on my MacBook. It is an older MacBook (non-Apple-silicon) with two GPUs: one AMD GPU and one integrated Intel GPU.

For whatever reason, when using the AMD GPU I got an error similar to the one above. After switching to the Intel GPU, the problem was resolved.

By default, the best GPU will be selected by the OS. In order to force the OS to use the Intel GPU, which should be compatible with FP, I used a program called gfxCardStatus. After installing it, you can choose to toggle between GPUs from the top bar. "Dynamic switching" should be enabled by default, but if you set "Integrated only" it should use the integrated GPU which should be compatible with FP. See example below. Note that you cannot have a monitor connected to the macbook when using the "Integrated only" option.

BTW: If I remember correctly, I was able to render the WSI using the AMD GPU, but running any analysis, even tissue segmentation, resulted in the same error as you had. Are you able to render the WSI?

Screenshot 2023-08-29 at 11 36 59
cavenel commented 1 year ago

I was now able to reproduce this on a MacBook Pro (Quad-Core Intel Core i7 / Intel Iris Pro 1536 MB). I first got exactly the same error message as above, with macOS 12.6.5. The error occurs when trying to run a pipeline.

I then updated the macOS version to 12.6.8 and now I get a different error, when trying to load any image: ERROR [0x10a68a600] OpenCL exception caught in Qt event handler clCreateProgramWithBinary(Invalid value)

image

I tried the gfxCardStatus idea, but I only have one GPU apparently.

Do you know how to make sure OpenCL is correctly installed? I haven't done any specific installation for it.

Thanks!

andreped commented 1 year ago

I did not have the same issue on my macbook, also running Monterey (v12 macOS).

As this seems OpenCL-related, @smistad can likely assist you further.

Also, I assume you get the same for all pipelines, including the simple tissue segmenter?

cavenel commented 1 year ago

Yes I can confirm that (when I could add images) I had the issue with all pipelines.

andreped commented 1 year ago

Not sure I can be of much help with OpenCL issues, but what do you get when you run clinfo in the terminal? You might need to install it first with brew install clinfo.
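Since clinfo prints a lot, here is a minimal Python sketch for boiling its output down to one line per device. It assumes the layout clinfo uses in this thread (top-level device fields indented by exactly two spaces); `summarize_clinfo` and `SAMPLE` are hypothetical names for illustration, and the sample text is an abbreviated excerpt, not real tool output in full.

```python
import re

# Top-level device fields are indented by exactly two spaces in clinfo output;
# the "NULL platform behavior" section indents deeper, so it is skipped.
FIELD = re.compile(r"^ {2}(Device Name|Device Type|Device Version)\s{2,}(.*?)\s*$")

def summarize_clinfo(text):
    """Return one dict per device with its name, type and OpenCL version."""
    devices = []
    for line in text.split("\n"):
        m = FIELD.match(line)
        if not m:
            continue
        key, value = m.groups()
        if key == "Device Name":
            devices.append({"name": value})  # a new device block starts here
        elif devices:
            devices[-1]["type" if key == "Device Type" else "version"] = value
    return devices

# Abbreviated excerpt in the same layout as a real clinfo report.
SAMPLE = (
    "  Device Name                                     Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz\n"
    "  Device Vendor                                   Intel\n"
    "  Device Version                                  OpenCL 1.2\n"
    "  Device Type                                     CPU\n"
    "  Device Name                                     Iris Pro\n"
    "  Device Version                                  OpenCL 1.2\n"
    "  Device Type                                     GPU\n"
    "NULL platform behavior\n"
    "    Device Name                                   Iris Pro\n"
)

for dev in summarize_clinfo(SAMPLE):
    print(dev["type"], "|", dev["name"], "|", dev["version"])
```

To use it on a real machine, paste the text printed by clinfo into `summarize_clinfo` instead of `SAMPLE`. A listed device only shows that the OpenCL runtime is present; it does not prove that every kernel will compile.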

cavenel commented 1 year ago

Here is the output of clinfo:

Number of platforms                               1
  Platform Name                                   Apple
  Platform Vendor                                 Apple
  Platform Version                                OpenCL 1.2 (Aug  8 2022 21:29:33)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event

  Platform Name                                   Apple
Number of devices                                 2
  Device Name                                     Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
  Device Vendor                                   Intel
  Device Vendor ID                                0xffffffff
  Device Version                                  OpenCL 1.2 
  Driver Version                                  1.1
  Device OpenCL C Version                         OpenCL C 1.2 
  Device Type                                     CPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               8
  Max clock frequency                             2200MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1x1
  Max work group size                             1024
  Preferred work group size multiple (kernel)     1
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 2 / 2       
    half                                                 0 / 0        (n/a)
    float                                                4 / 4       
    double                                               2 / 2        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              17179869184 (16GiB)
  Error Correction support                        No
  Max memory allocation                           4294967296 (4GiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        64
  Global Memory cache line size                   6291456 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            65536 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   1 bytes
    Pitch alignment for 2D image buffers          1 pixels
    Max 2D image size                             8192x8192 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                8
  Local memory type                               Global
  Local memory size                               32768 (32KiB)
  Max number of constant args                     8
  Max constant buffer size                        65536 (64KiB)
  Max size of kernel argument                     4096 (4KiB)
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            Yes
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                (n/a)
  Device Extensions                               cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_APPLE_fp64_basic_ops cl_APPLE_fixed_alpha_channel_orders cl_APPLE_biased_fixed_point_image_formats cl_APPLE_command_queue_priority

  Device Name                                     Iris Pro
  Device Vendor                                   Intel
  Device Vendor ID                                0x1024500
  Device Version                                  OpenCL 1.2 
  Driver Version                                  1.2(Jul  6 2023 23:52:47)
  Device OpenCL C Version                         OpenCL C 1.2 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               40
  Max clock frequency                             1200MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             512x512x512
  Max work group size                             512
  Preferred work group size multiple (kernel)     32
  Preferred / native vector sizes                 
    char                                                 1 / 1       
    short                                                1 / 1       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 0 / 0        (n/a)
    float                                                1 / 1       
    double                                               0 / 0        (n/a)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (n/a)
  Address bits                                    64, Little-Endian
  Global memory size                              1610612736 (1.5GiB)
  Error Correction support                        No
  Max memory allocation                           402653184 (384MiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Global Memory cache type                        None
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            25165824 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   4 bytes
    Pitch alignment for 2D image buffers          32 pixels
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                8
  Local memory type                               Local
  Local memory size                               65536 (64KiB)
  Max number of constant args                     8
  Max constant buffer size                        65536 (64KiB)
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      80ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                (n/a)
  Device Extensions                               cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_image2d_from_buffer cl_khr_gl_depth_images cl_khr_depth_images cl_khr_3d_image_writes 

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Apple
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [P0]
  clCreateContext(NULL, ...) [default]            Success [P0]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 Apple
    Device Name                                   Iris Pro
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  Success (1)
    Platform Name                                 Apple
    Device Name                                   Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Apple
    Device Name                                   Iris Pro
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  Invalid device type for platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (2)
    Platform Name                                 Apple
    Device Name                                   Iris Pro
    Device Name                                   Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
cavenel commented 1 year ago

I removed all FAST and fastpathology files from my home folder and restarted a new project. Now I am back to the original error.

It seems as if the error always comes after dilation:

INFO [0x7000050af000] Added renderer SegmentationRenderer with id segRenderer
OK
Done
INFO [0x7000053c1000] EXECUTING EmptyProcessObject because PO is modified.
INFO [0x7000053c1000] EXECUTING EmptyProcessObject because PO is modified.
INFO [0x7000053c1000] EXECUTING TissueSegmentation because PO is modified.
INFO [0x7000053c1000] EXECUTING EmptyProcessObject because PO is modified.
INFO [0x7000053c1000] EXECUTING ImageChannelConverter because PO is modified.
INFO [0x7000053c1000] EXECUTING EmptyProcessObject because PO is modified.
INFO [0x7000053c1000] EXECUTING Dilation because PO is modified.
ERROR [0x7000053c1000] Build log, device 0
<program source>:13:3: error: expected expression
        } else {
  ^

ERROR [0x7000053c1000] Program build failure
stopping pipeline
done
removing renderers..
done
stopping pipeline
done
removing renderers..
done

But I don't see any } else { in the dilation.cl file of FAST.

andreped commented 1 year ago

Just ran clinfo on my own macbook where everything works, and I was unable to see relevant differences, other than the fact that you have an Intel Iris integrated GPU, whereas I have an Intel HD Graphics 530.

I also tried to reinstall FP using the very latest release, but I could not reproduce this issue. Do you observe the same if you use the CMU-1.svs from the OpenSlide test data suite?

Also could you try running fastpathology with verbose enabled, unless that was already the case, like so:

/Applications/FastPathology.app/Contents/MacOS/bin/fastpathology --verbose
cavenel commented 1 year ago

Thanks for spending some time on this! It also fails on CMU-1.svs directly loaded from fastpathology. The last output was indeed with verbose on.

andreped commented 1 year ago

Very strange. I have no idea what is causing this. Debugging is also challenging as I cannot reproduce the issue myself, but it is apparent that something is not working, as you have observed the same bug on two separate machines.

I will assign @smistad to this issue. He is likely off for the weekend, but may reply early next week.

andreped commented 1 year ago

Lastly, could you check that glxgears and xeyes render properly? You might need to brew install glxgears.

andreped commented 1 year ago

I just ran a small test where I deleted both the FAST/ and FastPathology/ directories created by FastPathology and reran the test, and by accident I had forgotten to set "Integrated only", which resulted in me managing to reproduce the bug you reported (<program source>:13:3: error: expected expression (...)).

After forcing the OS to use the integrated GPU instead, it worked fine. So I do not think there is necessarily a FAST-bug, but rather that for whatever reason your CPU-GPU setup is not working as it should. Perhaps there could be an OpenCL-OpenGL interop issue.

Could you show me the verbose output you get from launching FastPathology from the terminal (with --verbose, of course)? There might be some info there that could help me understand the issue.

cavenel commented 1 year ago

Unfortunately I am also out for the weekend with no access to the Mac computer. Will test on Monday! Thanks again

smistad commented 1 year ago

Hi

OpenCL is installed correctly; Apple has its own implementation, which is always installed. The error comes from Apple's OpenCL failing to compile some OpenCL code. Since other OpenCL implementations are able to compile this, it points to a bug in Apple's code (not the first time...).

I think it fails to compile erosion.cl https://github.com/smistad/FAST/blob/master/source/FAST/Algorithms/Morphology/Erosion.cl

Also I think this is the same issue we have seen on AMD Macs.

This can probably be resolved by rewriting the code it fails on. You can try editing the erosion.cl file yourself. It will try to recompile every time you run it. To fix it myself I need access to a Mac with this issue.

cavenel commented 1 year ago

Hi @smistad, Thanks, after some testing, I think it actually fails to compile ImageFill.cl https://github.com/smistad/FAST/blob/master/source/FAST/ImageFill.cl

Apparently, macOS doesn't like having } else { on one line. Splitting it across two lines fixed the issue:

}
else {

Don't ask me why 🤷

I will do some more testing to see if something else needs fixing.

smistad commented 1 year ago

The Apple OpenCL compiler is not completely stable. There are some } else { statements in Erosion.cl as well, which is used in TissueSegmentation. Does it not have issues with those?

Let us know if you find more of these compile issues and we can add the fixes to FAST

cavenel commented 1 year ago

After fixing the missing newline in ImageFill.cl, I didn't get any other compilation errors. I tested a couple of pipelines, and they worked well. But I am not sure they actually ran the Erosion.cl kernel. Would you have a minimal pipeline example running erosion?

(Edit: I see you wrote that TissueSegmentation uses Erosion.cl, but I had no issue with it. So it probably means the problem is only in ImageFill.cl for some reason.)

cavenel commented 1 year ago

Actually, the space between } and else in ImageFill.cl is not a regular space but the character U+00A0 (no-break space):

image

That explains why MacOS is failing.
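For anyone who wants to check other kernel files for the same kind of invisible character, here is a minimal Python sketch that reports every non-ASCII character with its position. The function name `find_non_ascii` and the `kernel` snippet are made up for illustration; the snippet is not the actual content of ImageFill.cl.

```python
import unicodedata

def find_non_ascii(source):
    """Yield (line, column, code point, Unicode name) for each non-ASCII character."""
    for lineno, line in enumerate(source.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                yield lineno, col, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")

# Hypothetical kernel fragment with a no-break space between "}" and "else".
kernel = "if (x) {\n  a = 1;\n}\u00a0else {\n  a = 2;\n}\n"

for hit in find_non_ascii(kernel):
    print(hit)  # prints (3, 2, 'U+00A0', 'NO-BREAK SPACE')

# One possible repair: replace the no-break space with a regular space.
cleaned = kernel.replace("\u00a0", " ")
```

To scan a real file, read it with `open(path, encoding="utf-8").read()` and pass the text to `find_non_ascii`. Legitimate non-ASCII characters in comments will also be reported, so the hits still need a quick manual look.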

smistad commented 1 year ago

Doh! :facepalm:

I added a fix for it to FAST just now: https://github.com/smistad/FAST/commit/167f5cb6b3118fd73169817f0a0650bf654837b0 Will rebuild FastPathology with the fix soon.

Thanks for figuring this out @cavenel

@andreped Could you test if this was the cause of the crashes on AMD Macs as well?

smistad commented 1 year ago

This build should contain the fix: https://github.com/AICAN-Research/FAST-Pathology/actions/runs/6072324757

andreped commented 1 year ago

> @andreped Could you test if this was the cause of the crashes on AMD Macs as well?

Works wonders on both the AMD GPU and the integrated Intel GPU now, @smistad :] Great job, and good catch, @cavenel!

Scale bar is also working fine!

Screenshot 2023-09-04 at 12 56 15
andreped commented 1 year ago

@cavenel Could you verify that the latest release of FP (v1.1.2) works out-of-the-box on your macbook? https://github.com/AICAN-Research/FAST-Pathology/releases/tag/v1.1.2

cavenel commented 1 year ago

Yes, I can confirm it now works out-of-the-box with version 1.1.2! Thanks for the fast fix! Will you also add the arm64 asset?

smistad commented 1 year ago

Yes, I will compile it for arm64 as well, but I have to do it manually still since github doesn't offer arm64 macOS runners yet

smistad commented 1 year ago

@cavenel I just uploaded a macOS arm64 package to the release. I haven't been able to test it myself, so let me know if it doesn't work for some reason.

cavenel commented 1 year ago

Not sure yet. I never tested on M1 before, but now I get stuck at launch; it cannot find libomp.dylib:

dyld[26049]: Library not loaded: /opt/homebrew/opt/libomp@14.0.6/lib/libomp.dylib
  Referenced from: <2780B264-9867-3136-9A51-CEE6AE3D1F61> /Applications/FastPathology.app/Contents/MacOS/lib/libFAST.4.7.1.dylib
  Reason: tried: '/opt/homebrew/opt/libomp@14.0.6/lib/libomp.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/opt/homebrew/opt/libomp@14.0.6/lib/libomp.dylib' (no such file), '/opt/homebrew/opt/libomp@14.0.6/lib/libomp.dylib' (no such file), '/usr/local/lib/libomp.dylib' (no such file), '/usr/lib/libomp.dylib' (no such file, not in dyld cache)
Abort trap: 6

I did install libomp from homebrew:

libomp 16.0.6 is already installed and up-to-date.

and libomp.dylib is in /opt/homebrew/opt/libomp/lib/:

bash-3.2$ ls -al /opt/homebrew/opt/libomp/lib/
total 3144
drwxr-xr-x  4 cavenel  admin     128 Sep  7 11:54 .
drwxr-xr-x  6 cavenel  admin     192 Sep  7 11:54 ..
-r--r--r--  1 cavenel  admin  880472 Jun 11 00:58 libomp.a
-r--r--r--  1 cavenel  admin  726480 Sep  7 11:54 libomp.dylib

I wonder if that's because fastpathology is looking for v14.0.6 specifically and homebrew installed 16.0.6 instead.
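The dyld message itself states which path the binary was linked against, so that guess can be checked mechanically. Below is a rough Python sketch that pulls the requested path, the version pinned in it, and the fallback paths out of such a message. `parse_dyld_error` is a hypothetical helper, and `MESSAGE` is an abbreviated copy of the error above.

```python
import re

def parse_dyld_error(message):
    """Extract the requested library path, its pinned version and the paths dyld tried."""
    wanted = re.search(r"Library not loaded: (\S+)", message)
    tried = re.findall(r"'([^']+)' \(no such file", message)
    # A Homebrew-style versioned path looks like ".../libomp@14.0.6/lib/...".
    pin = re.search(r"@([\d.]+)/", wanted.group(1)) if wanted else None
    return {
        "wanted": wanted.group(1) if wanted else None,
        "pinned_version": pin.group(1) if pin else None,
        "tried": list(dict.fromkeys(tried)),  # drop duplicates, keep order
    }

# Abbreviated copy of the dyld error from this thread.
MESSAGE = (
    "dyld[26049]: Library not loaded: /opt/homebrew/opt/libomp@14.0.6/lib/libomp.dylib\n"
    "  Reason: tried: '/opt/homebrew/opt/libomp@14.0.6/lib/libomp.dylib' (no such file), "
    "'/usr/local/lib/libomp.dylib' (no such file), "
    "'/usr/lib/libomp.dylib' (no such file, not in dyld cache)\n"
)

info = parse_dyld_error(MESSAGE)
print(info["wanted"])
print(info["pinned_version"])
```

Here the requested path contains `libomp@14.0.6`, while Homebrew installed the library under the unversioned `libomp` directory, so the paths simply do not match.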

andreped commented 1 year ago

> Not sure yet. I never tested on M1 before, but now I get stuck at launch; it cannot find libomp.dylib:

@cavenel This is related to issue https://github.com/AICAN-Research/FAST-Pathology/issues/73

Basically, we do not yet bundle these dependencies as part of FastPathology. They are installed separately, so if they get updated, FP might no longer work.

@smistad should know which versions of these dependencies you should install, and then reinstalling (downgrading/upgrading) should resolve the issue. There are likely other deps that could have issues, but for this specific dependency, you could try running:

brew install libomp@14.0.6

If this exact formula is not available, try another version 14.x.y.

cavenel commented 1 year ago

Unfortunately, libomp 16.0.6 seems to be the only available version in brew for arm64. But it is unrelated to this specific issue here, so we can probably move to #73!

smistad commented 1 year ago

Yes, brew is very annoying: it suddenly updates packages and then does not support installing older versions. But the plan is to package all the dependencies on macOS to avoid having to use brew at all. This is what we do on Windows and Ubuntu, and it is on my todo list for macOS.

smistad commented 1 year ago

> Unfortunately, libomp 16.0.6 seems to be the only available version in brew for arm64. But it is unrelated to this specific issue here, so we can probably move to #73!

The arm64 build I uploaded was built with libomp 16.0.6. The x86_64 build was built with v14 of libomp.

andreped commented 1 year ago

> The arm64 build I uploaded was built with libomp 16.0.6. The x86_64 build was built with v14 of libomp.

But then the error message @cavenel gets for M1 doesn't make sense, as it looks like FP expects libomp v14.

dyld[26049]: Library not loaded: /opt/homebrew/opt/libomp@14.0.6/lib/libomp.dylib
  Referenced from: <2780B264-9867-3136-9A51-CEE6AE3D1F61> /Applications/FastPathology.app/Contents/MacOS/lib/libFAST.4.7.1.dylib
  Reason: tried: '/opt/homebrew/opt/libomp@14.0.6/lib/libomp.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/opt/homebrew/opt/libomp@14.0.6/lib/libomp.dylib' (no such file), '/opt/homebrew/opt/libomp@14.0.6/lib/libomp.dylib' (no such file), '/usr/local/lib/libomp.dylib' (no such file), '/usr/lib/libomp.dylib' (no such file, not in dyld cache)
Abort trap: 6
smistad commented 1 year ago

You are right, my bad, it's version 14... We need to start bundling these dependencies to get out of this dependency hell on Mac.

cavenel commented 1 year ago

I fixed it for now with:

ln -s /opt/homebrew/opt/libomp /opt/homebrew/opt/libomp@14.0.6

Which is kind of ugly and might crash if the two versions differ too much. But at least fastpathology is starting, and I was able to load models and images from tests, and run pipelines (segment nuclei and tissue). So I would say that the fix for OpenCL is ok!
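For reference, the same workaround can be written as a small Python sketch with a couple of guards, so that it refuses to run when the installed formula is missing and leaves an existing path alone. `alias_formula` is a hypothetical helper, and the demo runs in a temporary directory rather than in /opt/homebrew/opt.

```python
import os
import tempfile

def alias_formula(opt_dir, installed, wanted):
    """Create opt_dir/wanted as a symlink to opt_dir/installed, like `ln -s`."""
    target = os.path.join(opt_dir, installed)
    link = os.path.join(opt_dir, wanted)
    if not os.path.isdir(target):
        raise FileNotFoundError(target)  # nothing to point the alias at
    if os.path.lexists(link):
        return link  # never overwrite an existing file, directory or link
    os.symlink(target, link)
    return link

# Demo in a throwaway directory that mimics the Homebrew layout.
with tempfile.TemporaryDirectory() as opt:
    os.makedirs(os.path.join(opt, "libomp", "lib"))
    open(os.path.join(opt, "libomp", "lib", "libomp.dylib"), "w").close()
    link = alias_formula(opt, "libomp", "libomp@14.0.6")
    resolved = os.path.exists(os.path.join(link, "lib", "libomp.dylib"))
    print(os.path.basename(link), "resolves to the library:", resolved)
```

This only makes the path resolve. It does nothing about binary compatibility between the two libomp versions, so the caveat above still applies.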

cavenel commented 1 year ago

PS: I have another project using OpenSlide and Qt 6, and I would really like to have an installer for Mac, as we already have one for Windows. So if you manage to bundle all dependencies and make a dmg out of it, that would be really interesting for me too :-D