AvtechScientific / ASL

Advanced Simulation Library - hardware accelerated multiphysics simulation platform.
http://asl.org.il
GNU Affero General Public License v3.0
217 stars 55 forks source link

--device parameter on the command line not recognized #34

Open cyclaero opened 7 years ago

cyclaero commented 7 years ago

I am testing ASL on a Mac mini late 2014 with Intel Iris graphics hardware running macOS 10.12.6.

./asl-flow --devices outputs:

flow 1.0

Default computation device:
platform = Apple�
device = Intel(R) Core(TM) i5-4278U CPU @ 2.60GHz�

List of all available platforms and their devices:
Platform: Apple�
Number of devices: 2
    Intel(R) Core(TM) i5-4278U CPU @ 2.60GHz�
    Iris�

Now, trying to invoke the flow example with ./asl-flow --device Iris gives:

ASL WARNING: Requested combination of platform(Apple�) and device(Iris) not found! Using:
platform = Apple�
device = Intel(R) Core(TM) i5-4278U CPU @ 2.60GHz�.
Data initialization... Finished
Numerics initialization... Finished
Computing...Finished
Computation statistic:
Real Time = 26.8742; Processor Time = 98.7167; Processor Load = 367.329%

I found out, that the strings of the platform/device information contain trailing '\0' chars, while the parameter strings as delivered by boost do not, and the comparison (device == getDeviceName(queues[i])) on line 106 in aclHardware.cxx fails, since "Iris\0" is different from "Iris".

I changed the lines 501/502 in aslParametersManager.cxx to:

            string pf = vm["platform"].as<string>(); if (pf.back() != '\0') pf += '\0';
            string dv = vm["device"  ].as<string>(); if (dv.back() != '\0') dv += '\0';
            acl::hardware.setDefaultQueue(pf, dv);

And with that in place I can have ASL simulation run by the GPU.

Data initialization... Finished
Numerics initialization... Finished
Computing...Finished
Computation statistic:
Real Time = 13.3571; Processor Time = 1.27338; Processor Load = 9.53341%

EDIT: Now, seeing this issue message, I guess it would be better to remove the trailing '\0' chars from the strings of the platform/device information, since, quite obviously these nul's made it into the diagnostic output of the simulation tools.

BTW: The locomotive example with the default parameters 0.08/1.0/10001 crashes when run on the GPU, while it finishes on the CPU. If I change dx to 0.09, then it finishes on the GPU as well after 1200 s, while it took 20 times more on the CPU at a load of 400 %.

AvtechScientific commented 7 years ago

Those trailing '\0' chars are delivered by buggy OpenCL drivers. The problem is that a bad driver produces them and a good driver doesn't - with the same hardware and on the same OS. So you have to check whether they exist and if yes - strip them. We are not sure, however, that we should correct issues caused by bad drivers by making our code more clumsy.

cyclaero commented 7 years ago

Those trailing '\0' chars are delivered by buggy OpenCL drivers.

It is quite hard for me to understand how this could happen, since any OpenCL driver is supposed to return a C string, i.e. a C array of chars terminated by a nul char, isn´t it?

Do OpenCL drivers on Linux really provide non-terminated arrays of chars? Isn´t it rather the C++ side which gives rise for a lot of confusion about subtle differences between character arrays and strings. My impression comes from searching Google about "C++ string trailing nul". Seems that the confusion started with the switch from C++11 to C++14. On macOS 10.12.6 the code is compiled with the latest clang++ (which is C++11) are you working with a C++14-compiler?

cyclaero commented 7 years ago

I changed line 1128 of cl.hpp to:

        param->assign(value.begin(), (value.back() != '\0') ? value.end() : value.end()-1);

This takes care of a trailing nuls in case any buggy OpenCL driver insists on delivering it. Doesn't look too clumsy, does it?

The alternative is not to advertise the macOS as a working platform anymore, because without taking care of the nuls in the one or the other form, ASL would be useless on the Mac.

AvtechScientific commented 7 years ago

Hi @cyclaero,

an elegant solution, indeed. And most importantly - cl.hpp is not actually ASL, but rather an C++wrapper to OpenCL - so it is the right place to take care of the issue. Do you want to make a Pull Request with this fix (and maybe your first solved issue as well)? We'll need to test them on a non Mac platform though...

Thank you!

cyclaero commented 7 years ago

I had a closer look on the upstream C++-API headers on https://github.com/KhronosGroup/OpenCL-CLHPP. And it turned out, that the issue has been resolved in cl2.hpp on exactly the same place which I already suggested, however with a less C'ish and more C++'ish solution. So, I decided to go for back porting the upstream cl2 fix.

I cloned your repository to my GitHub account, I updated cl.hpp to the upstream version 1.2.9 and then back ported 2 fixes from cl2.hpp to the present cl.hpp for the specialized GetInfoHelper for string params, namely fix the trailing '\0' issue and the empty string result issue. Chances are that this would simply work on the vast majority of platforms since these fixes contain only code snippets which came from the Khronos Group.

see: https://github.com/cyclaero/ASL/commit/a38f1d4a5968d099d8296aa9df7a2c5f02eb9a08

If this approach is OK for you, then I would submit a pull request, if not, then please inform.