HSAFoundation / HSA-Drivers-Linux-AMD

These drivers have been superseded by the ROCm Platform, now hosted in the Radeon Open Compute GitHub repository:
https://github.com/RadeonOpenCompute

dGPU support for HSA? #19

Closed LifeIsStrange closed 8 years ago

LifeIsStrange commented 8 years ago

Hello, just a curious question from an end user and AMD shareholder: will the upcoming Polaris dGPUs + AM4 + Zen support HSA? Or will this be reserved only for APUs/ARM SoCs?

I have seen this comment on phoronix.com by an AMD dev (Bridgman):

"CI is the first generation with support for user queues, HW scheduling and AQL, but there's a limit on MEC microcode store size so at the moment we can't fit support for PM4 (what graphics uses), AQL (what HSA uses) and HW scheduling (what HSA also uses) in a single image. Carrizo has two MEC blocks so we were able to configure one for AQL+HWS and the other for AQL+PM4, but IIRC the dGPUs only have a single MEC block so that approach won't work there.

VI doesn't have those limits, and it also adds the ability to interrupt the execution of long-running shader threads and roll the context out to memory so you can context-switch even if individual shaders are running for seconds or minutes. We call that Compute Wave Save/Restore (CWSR) and the latest KFD release (part of the Boltzmann stack under GPUOpen) includes CWSR support for Carrizo.

I'm tempted to add support for Hawaii even if it means we don't have HW scheduling (although I need to check if PCIE atomics work on it) -- if so then Bonaire support would more-or-less come along for the ride -- but going back to SI gets hard because it doesn't have any of the uncore support required for HSA. Ask me again in a month if it makes sense to define an HSA subset that could run on SI, I should have an answer by then.

For hardware earlier than SI the ISA changes completely so probably doesn't make sense at all."

So VI and later GCN architectures support HSA, but do they support it fully, or only a subset? Also, do you have performance/efficiency numbers for HSA versus OpenCL/SYCL? And finally, can you give me some new use cases that HSA enables but OpenCL does not? For example, would HSA be useful for browsers? Sorry to take your time, and sorry for my bad English.

briansp2020 commented 8 years ago

First of all, I'm just another AMD user and shareholder like you, so take this as an observation from another guy.

You should check out http://gpuopen.com/professional-compute/ The latest release is all about dGPU support. Fiji (Fury Nano) seems to be the focus at the moment. You can get the latest HSA kernel driver/runtime library from https://github.com/RadeonOpenCompute. It will be a while before you see HSA support for consumer applications such as a browser or anything that runs on your desktop. At the moment, AMD seems to be focusing on HPC applications and deep learning frameworks.

johnbridgman commented 8 years ago

Hi LifeIsStrange,

The post you quoted was in response to someone asking about the possibility of support on OLDER dGPUs. The developer preview stack we released as part of GPUOpen is aimed at the Fury dGPU, although it also includes code paths for the Tonga (R9 285/380/380X) dGPU. I expect the same stack support will be carried forward to future dGPUs, although it's too early to make specific commitments about future products.

The main difference between running the stack on a dGPU vs an APU is that the APU provides cache coherency between GPU and CPU memory accesses when accessing "fine-grained shared memory". On a dGPU we provide similar functionality by allocating a fine-grained buffer in system memory, but with lower performance because the GPU is not able to fully cache the memory. On the other hand, dGPUs are still able to allocate and use coarse-grained memory on both device and host, which takes full advantage of the Fury's fast HBM.
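For readers who want to see that distinction in code, here is a minimal sketch against the public HSA runtime headers (hsa.h); it only enumerates each agent's global memory regions and reports whether they are fine-grained or coarse-grained. Error handling and agent filtering are omitted, and the regions actually reported will vary by platform:

```c
#include <hsa.h>
#include <stdio.h>

/* Report whether each global region of an agent is fine-grained
 * (CPU/GPU coherent) or coarse-grained (device-local, uncached by the host). */
static hsa_status_t print_region(hsa_region_t region, void *data) {
    hsa_region_segment_t segment;
    hsa_region_get_info(region, HSA_REGION_INFO_SEGMENT, &segment);
    if (segment != HSA_REGION_SEGMENT_GLOBAL)
        return HSA_STATUS_SUCCESS;

    uint32_t flags = 0;
    hsa_region_get_info(region, HSA_REGION_INFO_GLOBAL_FLAGS, &flags);
    printf("global region: %s\n",
           (flags & HSA_REGION_GLOBAL_FLAG_FINE_GRAINED)
               ? "fine-grained (coherent, lower bandwidth on a dGPU)"
               : "coarse-grained (can use the full device memory bandwidth)");
    return HSA_STATUS_SUCCESS;
}

static hsa_status_t per_agent(hsa_agent_t agent, void *data) {
    return hsa_agent_iterate_regions(agent, print_region, data);
}

int main(void) {
    hsa_init();
    hsa_iterate_agents(per_agent, NULL);
    hsa_shut_down();
    return 0;
}
```

Allocation would then go through hsa_memory_allocate() on whichever region offers the coherence/performance trade-off the application needs.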

In terms of Boltzmann/HSA vs OpenCL, keep in mind that OpenCL is just another language that can be run over HSA, and OpenCL can run either over HSA or over a graphics driver stack, so there are really two different parts to your question:

In a typical OpenCL implementation every new dispatch of a compute kernel requires a round-trip through the kernel driver, while the Boltzmann stack allows userspace programs to submit work to, and receive results from, the GPU without having to call down into the kernel drivers.
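As an illustration of what "no round-trip through the kernel driver" means in practice, here is a hedged sketch of a user-mode dispatch using the public HSA runtime API: the AQL packet is assembled directly in the queue's ring buffer and the doorbell signal is rung from user space. The kernel_object and kernarg arguments are placeholders for values a real program would obtain from the HSA finalizer/loader, and error handling is omitted:

```c
#include <hsa.h>
#include <stdint.h>
#include <string.h>

/* Sketch of a user-mode dispatch: build an AQL kernel-dispatch packet in
 * the queue's ring buffer and ring the doorbell; no syscall on the dispatch
 * path.  `kernel_object` and `kernarg` are placeholders a real program
 * obtains from the HSA finalizer/loader. */
void dispatch(hsa_agent_t gpu, uint64_t kernel_object, void *kernarg) {
    hsa_queue_t *queue = NULL;
    hsa_queue_create(gpu, 4096, HSA_QUEUE_TYPE_SINGLE,
                     NULL, NULL, UINT32_MAX, UINT32_MAX, &queue);

    hsa_signal_t done;
    hsa_signal_create(1, 0, NULL, &done);

    /* Claim a packet slot by bumping the write index (a user-space atomic). */
    uint64_t index = hsa_queue_add_write_index_relaxed(queue, 1);
    hsa_kernel_dispatch_packet_t *pkt =
        (hsa_kernel_dispatch_packet_t *)queue->base_address +
        (index % queue->size);

    /* Fill the packet body; the header is written last to publish it. */
    memset((char *)pkt + sizeof(pkt->header), 0,
           sizeof(*pkt) - sizeof(pkt->header));
    pkt->setup = 1 << HSA_KERNEL_DISPATCH_PACKET_SETUP_DIMENSIONS;
    pkt->workgroup_size_x = 64;
    pkt->workgroup_size_y = 1;
    pkt->workgroup_size_z = 1;
    pkt->grid_size_x = 1024;
    pkt->grid_size_y = 1;
    pkt->grid_size_z = 1;
    pkt->kernel_object = kernel_object;
    pkt->kernarg_address = kernarg;
    pkt->completion_signal = done;

    uint16_t header =
        (HSA_PACKET_TYPE_KERNEL_DISPATCH << HSA_PACKET_HEADER_TYPE) |
        (HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_ACQUIRE_FENCE_SCOPE) |
        (HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_RELEASE_FENCE_SCOPE);
    __atomic_store_n(&pkt->header, header, __ATOMIC_RELEASE);

    /* Ring the doorbell: the hardware picks the packet up directly. */
    hsa_signal_store_relaxed(queue->doorbell_signal, index);

    /* Completion is signalled in user space as well; just wait on it. */
    hsa_signal_wait_acquire(done, HSA_SIGNAL_CONDITION_EQ, 0,
                            UINT64_MAX, HSA_WAIT_STATE_BLOCKED);

    hsa_signal_destroy(done);
    hsa_queue_destroy(queue);
}
```

Because the completion signal is also updated without a system call, both the submit and the wait sides of a dispatch can stay in user space on hardware with user-mode queues.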

Anyway, short answer: "yes, the stack supports dGPUs already".

johnbridgman commented 8 years ago

BTW the driver & runtime code are now being published in a new set of repositories. As bsp2020 said, the best thing is to go to http://gpuopen.com/professional-compute/ and explore the different components. Starting from the bottom, at minimum you want to look at ROCK, ROCR, HCCompiler and HIP.

ROCK and ROCR are extended versions of the HSA drivers & runtime, while HCCompiler and HIP are new. There are also a number of other components described, including a variety of tools & libraries which run on top of the Boltzmann stack.

LifeIsStrange commented 8 years ago

@briansp2020 Firstly, thanks a lot for your links! You say "It will be a while before you see HSA support for consumer applications such as a browser or anything that runs on your desktop. At the moment, AMD seems to be focusing on HPC applications and deep learning frameworks." Why? Maybe AMD will not contribute to desktop applications itself, but that's not a reason: developers from around the world could use HSA, and from what I understand current OpenCL software can easily be ported to HSA (and AMD contributes a lot to OpenCL desktop software). Secondly, all upcoming ARM SoCs will support HSA, so this will become a standard on the smartphone side.

LifeIsStrange commented 8 years ago

@johnbridgman Thanks a lot for your amazing answer; this is a lot of great news!

Will HSA be supported on all CPUs, or only on Zen/ARM?

You say "The main difference between running the stack on a dGPU vs an APU is that the APU provides cache coherency between GPU and CPU memory accesses when accessing "fine-grained shared memory". On a dGPU we provide similar functionality by allocating a fine-grained buffer in system memory, but with lower performance because the GPU is not able to fully cache the memory."

OK, but I have opened this issue: https://github.com/HSAFoundation/HSA-Runtime-AMD/issues/16 which asks whether HSA will support Heterogeneous Memory Management. Could that help with this dGPU limitation?

And sorry to take your time again, but could you open an "ideas repo" on GitHub where people around the world could give suggestions/ideas to AMD? This could help drive innovation for free!

Some examples of ideas: https://github.com/VerticalResearchGroup/miaow Reuse this work for Polaris?

Contribute again to Bullet (the physics engine) so it uses OpenCL/SYCL (which would give a true competitor to PhysX and perform better on AMD GPUs)

Migrate the GPUOpen effects from Direct3D to Vulkan, and heavily push their use on consoles (and consequently on PC, because games are ported from consoles to PC), and maybe push HSA use on consoles?

Develop async compute and OpenCL/SYCL use in Servo, which is the browser engine of the future (https://github.com/glennw/webrender) (if AMD is faster in browsers, this could be a game changer)

Create a ShadowPlay equivalent

http://wccftech.com/intel-rumored-to-lose-one-of-its-biggest-processor-clients-can-you-guess-who/ Try to negotiate with Google to use Seattle or K12

Do more communication, for example promoting the fact that AMD GPUs stay useful longer than NVIDIA GPUs. Communicate more about async compute shaders.

Affirm that you have no backdoors, denounce NVIDIA's anti-competitive practices, etc.

Maybe push TrueAudio in Audacity/Ardour

gstoner commented 8 years ago

For AMD, all discussion of product direction and futures for shareholders should go through investor relations. This repository is for technical questions and issues related to the products on GitHub.

Thank you.

Gregory Stoner, Sr. Director, Radeon Open Compute