BOINC / boinc

Open-source software for volunteer computing and grid computing.
https://boinc.berkeley.edu
GNU Lesser General Public License v3.0

make Mali GPUs usable (Odroid XU4) #1686

Open sirzooro opened 7 years ago

sirzooro commented 7 years ago

I have an Odroid XU4, which has an OpenCL-capable GPU. It is detected by the BOINC client:

5           2016-10-24 21:09:05 OpenCL: Mali-T628 0: Mali-T628 (driver version 1.2, device version OpenCL 1.2 v1.r9p0-05rel0.816303d14b549c8bed2bad5983436ff4, 1991MB, 1991MB available, 3 GFLOPS peak) 
6           2016-10-24 21:09:05 OpenCL: Mali-T628 0: Mali-T628 (driver version 1.2, device version OpenCL 1.2 v1.r9p0-05rel0.816303d14b549c8bed2bad5983436ff4, 1991MB, 1991MB available, 1 GFLOPS peak) 

I also see the following entries in the schedrequest*.xml files:

    <coprocs>
<coproc>
   <type>Mali-T628</type>
   <count>1</count>
   <req_secs>0.000000</req_secs>
   <req_instances>0.000000</req_instances>
   <estimated_delay>0.000000</estimated_delay>
   <coproc_opencl>
      <name>Mali-T628</name>
      <vendor>ARM</vendor>
      <vendor_id>102760464</vendor_id>
      <available>1</available>
      <half_fp_config>63</half_fp_config>
      <single_fp_config>63</single_fp_config>
      <double_fp_config>63</double_fp_config>
      <endian_little>1</endian_little>
      <execution_capabilities>1</execution_capabilities>
      <extensions>cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_gl_sharing cl_khr_icd cl_khr_egl_event cl_khr_egl_image cl_khr_image2d_from_buffer cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory</extensions>
      <global_mem_size>2086998016</global_mem_size>
      <local_mem_size>32768</local_mem_size>
      <max_clock_frequency>600</max_clock_frequency>
      <max_compute_units>4</max_compute_units>
      <nv_compute_capability_major>0</nv_compute_capability_major>
      <nv_compute_capability_minor>0</nv_compute_capability_minor>
      <amd_simd_per_compute_unit>0</amd_simd_per_compute_unit>
      <amd_simd_width>0</amd_simd_width>
      <amd_simd_instruction_width>0</amd_simd_instruction_width>
      <opencl_platform_version>OpenCL 1.2 v1.r14p0-01rel0.0fe2d25ca074016740f8ab3fb451b151</opencl_platform_version>
      <opencl_device_version>OpenCL 1.2 v1.r14p0-01rel0.0fe2d25ca074016740f8ab3fb451b151</opencl_device_version>
      <opencl_driver_version>1.2</opencl_driver_version>
   </coproc_opencl>
</coproc>
<coproc>
   <type>Mali-T628</type>
   <count>1</count>
   <req_secs>0.000000</req_secs>
   <req_instances>0.000000</req_instances>
   <estimated_delay>0.000000</estimated_delay>
   <coproc_opencl>
      <name>Mali-T628</name>
      <vendor>ARM</vendor>
      <vendor_id>102760464</vendor_id>
      <available>1</available>
      <half_fp_config>63</half_fp_config>
      <single_fp_config>63</single_fp_config>
      <double_fp_config>63</double_fp_config>
      <endian_little>1</endian_little>
      <execution_capabilities>1</execution_capabilities>
      <extensions>cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_gl_sharing cl_khr_icd cl_khr_egl_event cl_khr_egl_image cl_khr_image2d_from_buffer cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory</extensions>
      <global_mem_size>2086998016</global_mem_size>
      <local_mem_size>32768</local_mem_size>
      <max_clock_frequency>600</max_clock_frequency>
      <max_compute_units>2</max_compute_units>
      <nv_compute_capability_major>0</nv_compute_capability_major>
      <nv_compute_capability_minor>0</nv_compute_capability_minor>
      <amd_simd_per_compute_unit>0</amd_simd_per_compute_unit>
      <amd_simd_width>0</amd_simd_width>
      <amd_simd_instruction_width>0</amd_simd_instruction_width>
      <opencl_platform_version>OpenCL 1.2 v1.r14p0-01rel0.0fe2d25ca074016740f8ab3fb451b151</opencl_platform_version>
      <opencl_device_version>OpenCL 1.2 v1.r14p0-01rel0.0fe2d25ca074016740f8ab3fb451b151</opencl_device_version>
      <opencl_driver_version>1.2</opencl_driver_version>
   </coproc_opencl>
</coproc>
    </coprocs>

However, this is not displayed on the Computers page of BOINC project websites. Please fix this.
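
To make the difference between the two `<coproc>` entries easier to see, here is a minimal Python sketch that parses a trimmed copy of the block quoted above. The function name `summarize_coprocs` and the dictionary keys are made up for illustration; this is not BOINC code, only a way to inspect a scheduler request by hand.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <coprocs> block quoted above; only the fields read below are kept.
SCHED_REQUEST_FRAGMENT = """
<coprocs>
<coproc>
   <type>Mali-T628</type>
   <count>1</count>
   <coproc_opencl>
      <name>Mali-T628</name>
      <global_mem_size>2086998016</global_mem_size>
      <max_clock_frequency>600</max_clock_frequency>
      <max_compute_units>4</max_compute_units>
   </coproc_opencl>
</coproc>
<coproc>
   <type>Mali-T628</type>
   <count>1</count>
   <coproc_opencl>
      <name>Mali-T628</name>
      <global_mem_size>2086998016</global_mem_size>
      <max_clock_frequency>600</max_clock_frequency>
      <max_compute_units>2</max_compute_units>
   </coproc_opencl>
</coproc>
</coprocs>
"""


def summarize_coprocs(xml_text):
    """Return one dict per <coproc> entry (hypothetical helper, not part of BOINC)."""
    root = ET.fromstring(xml_text)
    devices = []
    for coproc in root.findall("coproc"):
        opencl = coproc.find("coproc_opencl")
        devices.append({
            "type": coproc.findtext("type"),
            "count": int(coproc.findtext("count")),
            "compute_units": int(opencl.findtext("max_compute_units")),
            # OpenCL reports the maximum clock frequency in MHz.
            "clock_mhz": int(opencl.findtext("max_clock_frequency")),
            "mem_mb": int(opencl.findtext("global_mem_size")) // (1024 * 1024),
        })
    return devices


if __name__ == "__main__":
    for device in summarize_coprocs(SCHED_REQUEST_FRAGMENT):
        print(device)
```

Both entries carry the same type and name and differ only in `max_compute_units` (4 and 2), which matches the two-core-group layout discussed later in this thread.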

davidpanderson commented 7 years ago

Is the device attached to SETI@home? Please do so and send me the host ID.

sirzooro commented 7 years ago

Here you are: https://setiathome.berkeley.edu/show_host_detail.php?hostid=8122195 Here is another one, with newer software: https://setiathome.berkeley.edu/show_host_detail.php?hostid=8122211

More changes will probably be needed: I found that there is no dedicated plan class for this GPU, or at least I do not see one here: https://boinc.berkeley.edu/trac/wiki/AppPlan

ChristianBeer commented 7 years ago

Is there a science application that utilizes Mali-type GPUs? According to the Mali T-600 Developer Guide, one needs to retune existing OpenCL applications when using Mali GPUs. Mali GPUs also distinguish core groups: every core group has its own L2 cache and can have up to 4 cores (or compute_units). See: https://community.arm.com/thread/8050 Every core group is recognized as a discrete GPU by BOINC.

sirzooro commented 7 years ago

I was not able to find one on the Internet. Maybe there are some that run standalone (without BOINC); I did not search for those.

Yesterday I was able to compile MilkyWay@home for this GPU, but so far I have not been able to convince BOINC to give it any work. I will continue this evening and hope to make it work.

ChristianBeer commented 7 years ago

You should be able to get work through the anonymous platform feature of BOINC. It's possible that you need to specify the <app_version> in your app_info.xml with something like:

      <coproc>
          <type>Mali-T628</type>
          <count>1</count>
      </coproc>

sirzooro commented 7 years ago

I already tried this, but it did not work. I suspect that MilkyWay only checks whether the GPU type is Nvidia or AMD. Now I am going to pretend that my GPU is an Nvidia one; if that does not work either, I will configure the app as an ordinary CPU app. MilkyWay has a CPU app version too, so I should be able to get some work this way.

davidpanderson commented 7 years ago

BTW, I added support for showing the OpenCL GPU on the web page for the host. This will require projects to upgrade their server code. I'll deploy it on SETI@home in a couple of days. -- David

sirzooro commented 7 years ago

Thanks! It should help promote this new GPU across BOINC projects that already use other GPUs.

sirzooro commented 7 years ago

Any update on this? I checked my computers list at S@H and it still does not display GPUs for these Odroids.

davidpanderson commented 7 years ago

Oops! I forgot to deploy scheduler modes on S@h. Will be done by tomorrow. -- D

sirzooro commented 7 years ago

Thanks! I see that it is displayed now:

OpenCL GPU Mali-T628 (1990MB) driver: 1.02
OpenCL GPU Mali-T628 (1990MB) driver: 1.02

I wonder whether these two entries should stay as they are, or be collapsed into one with a [2] prefix, as you do for Nvidia GPUs.
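
As a rough illustration of the collapsing suggested above, here is a minimal Python sketch. The function name `collapse_gpu_lines` is hypothetical, and the real BOINC web code is not written this way; the sketch only shows the idea of merging identical description lines.

```python
from collections import Counter


def collapse_gpu_lines(lines):
    """Merge identical description lines, prefixing "[N] " when a line occurs N > 1 times."""
    counts = Counter(lines)  # keeps first-seen order on Python 3.7+
    collapsed = []
    for text, occurrences in counts.items():
        collapsed.append(f"[{occurrences}] {text}" if occurrences > 1 else text)
    return collapsed


if __name__ == "__main__":
    shown = [
        "OpenCL GPU Mali-T628 (1990MB) driver: 1.02",
        "OpenCL GPU Mali-T628 (1990MB) driver: 1.02",
    ]
    for line in collapse_gpu_lines(shown):
        print(line)
```

One caveat with this approach: the two lines are textually identical, but the underlying devices are not (4 versus 2 compute units in the scheduler request), so a collapsed entry would hide that difference.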

Please also add the OpenCL version; it is displayed for my Nvidia cards. This would help project owners determine which OpenCL version is supported by the majority of GPUs on various platforms.

sirzooro commented 6 years ago

Something is wrong with the generic support for OpenCL devices. I was able to run Xansons on this GPU. However, BOINC uses only one of the GPUs, while it should use both of them. in cc_config.xml set to 1 did not help. I also tried stopping all CPU tasks to make sure that they do not get in the way, but this did not work either. I tested this on BOINC 7.6.31. It looks like something goes wrong when there are two OpenCL GPUs with the same name but different parameters.

sirzooro commented 6 years ago

One more issue: it looks like the server ignores both the configured job limit per GPU and the work buffer size reported by the client in the scheduler request. As a result, the server sends far too many WUs to the client; see the link below. This may be related to the issue I reported here a few days ago: both server and client probably think that the second instance of the Mali GPU is idle and try to give it work. That may explain why the client asks for more work, but the server should still enforce the job limit. http://xansons4cod.com/xansons4cod/forum_thread.php?id=81&postid=525#525

Edit: the linked post mentions another issue: peak_flops for the GPU is set to 0 on the server instead of the real value. This should be fixed too.

Edit2: another scenario is possible: the client requests tasks for the second Mali and the server agrees, but the WUs sent end up in the first Mali's queue. This may lead to an endless fetch loop.

I also recalled one more issue: the public stats pages on the server show stats for Nvidia, AMD and Intel GPUs only. There is no page for Mali or for other vendors that make GPUs for ARM. For now, all of them could be presented on one new page, "Other GPUs". Later they could get separate per-vendor pages as needed.
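
The "Other GPUs" page proposed above amounts to a fallback bucket. Here is a minimal Python sketch of that idea; the function name, the page labels and the substring matching are all assumptions for illustration and do not reflect how the BOINC server actually builds its stats pages. Only the vendor string "ARM" comes from the scheduler request earlier in this thread.

```python
# Hypothetical mapping from a lowercase substring of the vendor string to a stats page.
KNOWN_VENDOR_PAGES = {
    "nvidia": "NVIDIA",
    "advanced micro devices": "AMD",
    "amd": "AMD",
    "intel": "Intel",
}


def stats_page_for(vendor):
    """Pick a stats page for a GPU vendor string, falling back to "Other GPUs"."""
    lowered = vendor.lower()
    for needle, page in KNOWN_VENDOR_PAGES.items():
        if needle in lowered:
            return page
    return "Other GPUs"


if __name__ == "__main__":
    # "ARM" is the <vendor> value reported for the Mali-T628 above.
    for vendor in ["ARM", "NVIDIA Corporation", "Intel(R) Corporation"]:
        print(vendor, "->", stats_page_for(vendor))
```

A fallback like this lets new vendors appear in the stats without code changes; a vendor can be promoted to its own page later by adding one entry to the mapping.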

sirzooro commented 6 years ago

I have noticed a small cosmetic issue: in the task list, the app name is displayed as "MilkyWay@Home Anonymous platform (CPU)". It should say GPU. I did not check whether it is displayed properly for Nvidia/AMD/Intel GPU apps installed via the anonymous platform.

The same name is displayed in the WU details. I also saw that Device peak FLOPS for the WU is 0.00 GFLOPS.

sirzooro commented 6 years ago

I have noticed one thing. When a project has an app_info.xml that configures an app for Mali-T628, the client requests tasks for one GPU instance. When a project does not use app_info.xml, the client requests tasks for both GPU instances. Here are example lines from the event log:

65  XANSONS for COD 2017-10-18 01:16:58 Requesting new tasks for Mali-T628  
69  yoyo@home   2017-10-18 01:17:04 Requesting new tasks for Mali-T628 and Mali-T628    

This case should probably be handled in a similar way to a computer that has two different GPUs from the same vendor (e.g. Nvidia 970 and 1070).
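
To compare the two cases across a longer event log, a small Python sketch like the following could count how many resources each request names. The helper name `requested_resources` is hypothetical, and the sketch assumes resource names are joined with " and " exactly as in the two lines quoted above; other message layouts are not handled.

```python
import re

# Captures everything after the fixed phrase, ignoring trailing whitespace.
REQUEST_RE = re.compile(r"Requesting new tasks for (.+?)\s*$")


def requested_resources(log_line):
    """Return the resource names in a 'Requesting new tasks for ...' event log line."""
    match = REQUEST_RE.search(log_line)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(" and ")]


if __name__ == "__main__":
    lines = [
        "65  XANSONS for COD 2017-10-18 01:16:58 Requesting new tasks for Mali-T628  ",
        "69  yoyo@home   2017-10-18 01:17:04 Requesting new tasks for Mali-T628 and Mali-T628    ",
    ]
    for line in lines:
        print(len(requested_resources(line)), requested_resources(line))
```

With the two quoted lines, the first request names one resource and the second names two, which is the difference described above.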

AenBleidd commented 5 years ago

@sirzooro, is this still an issue?

DummyPayload commented 4 years ago

Is this issue solved? Or is there any progress on it?

I own some Mali-equipped devices (with the Rockchip RK3399, which includes a Mali-T860MP4/Mali-T864). I got OpenCL working on these boards (NanoPi M4 and M4v2 in my case), but I don't get any work from any of the ARM projects (except yoyo and rakesearch, which I haven't tested). The GPU is detected, and the boinctui messages say that it is searching for WUs for both CPU and GPU on GPU-supporting projects, but it never receives jobs for the GPU.

Has anyone got it to work on a non-NV/AMD/Intel GPU?

AenBleidd commented 4 years ago

@markonmoto2, there is no project that has an application for GPUs on ARM devices.

adamradocz commented 4 years ago

I think we should close this issue, as there is no issue. The Client detects Mali GPUs perfectly.

sirzooro commented 4 years ago

Detection works properly, but there were other issues. See my comments here starting from https://github.com/BOINC/boinc/issues/1686#issuecomment-330078458 . The most important one was that BOINC detected two GPUs but used only one.

At the moment I do not run GPU tasks on the Odroid, but I can run some extra tests if necessary.

cristipurdel commented 4 years ago

If they run on Odroid, it would be nice if you posted how you managed it, so that it can be replicated here as well.

sirzooro commented 4 years ago

Here are the details of what I did to run MilkyWay on the GPU on Odroid; please take a look: https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4180

I see that last year someone suggested to use 1 param. This may explain why BOINC used only one GPU instance instead of both; this needs to be confirmed.

cristipurdel commented 4 years ago

@sirzooro Could you write a detailed description of how you were able to run on the GPU, so that BOINC can hopefully incorporate your setup and eventually run it on an Android phone as well?

sirzooro commented 4 years ago

OK. I do not remember all details, but it was something like this:

Here is the full app_info.xml that I used:

<app_info>
  <app>
    <name>milkyway</name>
    <user_friendly_name>MilkyWay@Home</user_friendly_name>
    <non_cpu_intensive>0</non_cpu_intensive>
  </app>
  <file>
    <name>milkyway_separation</name>
    <executable/>
  </file>
  <app_version>
    <app_name>milkyway</app_name>
    <version_num>146</version_num>
    <platform>arm-unknown-linux-gnueabihf</platform>
    <avg_ncpus>0.0300000</avg_ncpus>
    <max_ncpus>0.0300000</max_ncpus>
    <plan_class>opencl_nvidia_101</plan_class>
    <api_version>7.6.33</api_version>
    <file_ref>
      <file_name>milkyway_separation</file_name>
      <main_program/>
    </file_ref>
    <coproc>
      <type>Mali-T628</type>
      <count>1.0</count>
    </coproc>
    <gpu_ram>268435456.000000</gpu_ram>
    <dont_throttle/>
  </app_version>
</app_info>
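
For anyone adapting this file to another board, a quick sanity check is to compare the `<coproc><type>` value against the device type the client reports (for example in the scheduler request near the top of this thread). The following Python sketch does that on a trimmed copy of the file; the helper names `coproc_requirements` and `unmatched_types` are hypothetical, and the sketch makes no claim about how the BOINC client itself matches these values.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the app_info.xml above; only the elements read below are kept.
APP_INFO = """
<app_info>
  <app_version>
    <app_name>milkyway</app_name>
    <version_num>146</version_num>
    <plan_class>opencl_nvidia_101</plan_class>
    <coproc>
      <type>Mali-T628</type>
      <count>1.0</count>
    </coproc>
  </app_version>
</app_info>
"""


def coproc_requirements(app_info_xml):
    """Return (app_name, plan_class, coproc_type, coproc_count) per GPU <app_version>."""
    root = ET.fromstring(app_info_xml)
    result = []
    for version in root.findall("app_version"):
        coproc = version.find("coproc")
        if coproc is None:
            continue  # app version without a coprocessor requirement
        result.append((
            version.findtext("app_name"),
            version.findtext("plan_class"),
            coproc.findtext("type"),
            float(coproc.findtext("count")),
        ))
    return result


def unmatched_types(app_info_xml, detected_types):
    """List coproc types named in app_info.xml that are not among the detected types."""
    return [
        coproc_type
        for _, _, coproc_type, _ in coproc_requirements(app_info_xml)
        if coproc_type not in detected_types
    ]


if __name__ == "__main__":
    print(coproc_requirements(APP_INFO))
    # Device types as they appear in the client's scheduler request.
    print(unmatched_types(APP_INFO, {"Mali-T628"}))
```

On a different board (for example one reporting Mali-T860), `unmatched_types` would return the stale `Mali-T628` entry, a hint that the `<type>` in app_info.xml needs updating.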

AenBleidd commented 8 months ago

There have been several changes in the client since then. This issue needs to be verified with the new release.