allendaicool / thrust

Automatically exported from code.google.com/p/thrust
Apache License 2.0

fill_n/copy_n fails in various contexts #275

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
Please post a short code sample which reproduces the problem:

Unfortunately, the problem cannot be reproduced with a short sample; it only 
manifests in relatively complex code. I did shrink it down to about 135 lines 
in total; please see the attached files. bugs.cpp contains the main code. 
soa.h overrides the default Array implementation in the code to provide an 
SoA (structure-of-arrays) memory layout. tuple_io.h is from Boost, adapted to 
work with Thrust.
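
For readers without the attachments, the output format used in this report can be sketched roughly as follows. This is only an illustration built on std::tuple rather than thrust::tuple, and it is not the attached tuple_io.h; the names print_value, tuple_printer, and format_tuple are hypothetical.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <tuple>

// Print a non-tuple element with its normal stream operator.
template <typename T>
void print_value(std::ostream& os, const T& value) {
    os << value;
}

// Forward declaration so that nested tuples are printed recursively.
template <typename... Ts>
void print_value(std::ostream& os, const std::tuple<Ts...>& t);

// Compile-time loop over the tuple elements I, I+1, ..., N-1.
template <std::size_t I, std::size_t N>
struct tuple_printer {
    template <typename Tuple>
    static void print(std::ostream& os, const Tuple& t) {
        if (I != 0) os << ' ';  // space-separated elements
        print_value(os, std::get<I>(t));
        tuple_printer<I + 1, N>::print(os, t);
    }
};

// End of the loop: nothing left to print.
template <std::size_t N>
struct tuple_printer<N, N> {
    template <typename Tuple>
    static void print(std::ostream&, const Tuple&) {}
};

// Print a tuple as "(e0 e1 ... eN-1)".
template <typename... Ts>
void print_value(std::ostream& os, const std::tuple<Ts...>& t) {
    os << '(';
    tuple_printer<0, sizeof...(Ts)>::print(os, t);
    os << ')';
}

// Convenience wrapper (hypothetical) that returns the formatted text.
template <typename Tuple>
std::string format_tuple(const Tuple& t) {
    std::ostringstream os;
    print_value(os, t);
    return os.str();
}
```

Called as format_tuple(std::make_tuple(5.0, std::make_tuple(0.0, 0.0, 0.0), 0.0)), this sketch produces a string in the same parenthesized, space-separated style as the output quoted in this report.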

What is the expected output?

starting
(5 (0 0 0) 0)
(0 (0 25 0) 0)
copying
getting values
printing
(5 (0 0 0) 0)
(0 (0 25 0) 0)

What do you see instead?
It either works, crashes, doesn't compile, or gives garbage output, depending 
on the configuration.

fill_n, without SOA
     gcc 4.2, 4.5 (mac): compiles and gives correct results
     icpc 11.1 (mac): compiles and gives correct results
     nvcc 3.2rc2 (centos): does not compile (and yes, I compiled with nvcc): thrust/detail/device/cuda/for_each.inl:93: error: invalid application of ‘sizeof’ to incomplete type ‘thrust::detail::STATIC_ASSERTION_FAILURE<false>’

fill_n, with SOA
     gcc 4.2, 4.5 (mac): does not compile
     icpc 11.1 (mac): does not compile
     nvcc 3.2rc2 (centos): hangs after printing "copying", then crashes with: terminate called after throwing an instance of 'std::bad_alloc'   what():  St9bad_alloc   Abort

copy_n, without SOA
     gcc 4.2, 4.5 (mac): garbage results: (2.12201e-314 (6.95322e-310 6.95322e-310 6.95322e-310) 2.12201e-314), (6.95322e-310 (6.95322e-310 6.95322e-310 2.12201e-314) 6.95322e-310)
     icpc 11.1 (mac): compiles and gives correct results
     nvcc 3.2rc2 (centos): does not compile (and yes, I compiled with nvcc): thrust/detail/device/cuda/for_each.inl:93: error: invalid application of ‘sizeof’ to incomplete type ‘thrust::detail::STATIC_ASSERTION_FAILURE<false>’

copy_n, with SOA
     gcc 4.2, 4.5 (mac): garbage results: (2.12201e-314 (6.95322e-310 6.95322e-310 6.95322e-310) 2.12201e-314), (6.95322e-310 (6.95322e-310 6.95322e-310 2.12201e-314) 6.95322e-310)
     icpc 11.1 (mac): compiles and gives correct results
     nvcc 3.2rc2 (centos): hangs after printing "copying", then crashes with: terminate called after throwing an instance of 'std::bad_alloc'   what():  St9bad_alloc   Abort

What version of Thrust are you using? 

changeset:   1316:759868ac247e
user:        jaredhoberock
date:        Tue Nov 02 15:15:01 2010 -0700
summary:     Use copy_n in fill_n implementation

with my personal addition of tuple_io.h  (attached)

Note that I have also seen this problem before, using Thrust 1.2.1 with 
CUDA 3.1.
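
For context, the changeset summary above says that fill_n is implemented in terms of copy_n. The general idea can be sketched with the standard library alone, assuming a simple iterator that yields the same value on every dereference. The names constant_iterator and fill_n_via_copy_n are hypothetical stand-ins; this is not Thrust's actual implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical stand-in for a constant iterator: every dereference yields
// the same value. The value is stored by value inside the iterator.
template <typename T>
class constant_iterator {
public:
    typedef std::input_iterator_tag iterator_category;
    typedef T                       value_type;
    typedef std::ptrdiff_t          difference_type;
    typedef const T*                pointer;
    typedef T                       reference;

    explicit constant_iterator(const T& value) : value_(value) {}

    // Dereferencing returns a copy of the stored value.
    T operator*() const { return value_; }

    // Advancing does nothing: the sequence is conceptually infinite.
    constant_iterator& operator++() { return *this; }
    constant_iterator operator++(int) { return *this; }

    bool operator==(const constant_iterator& other) const {
        return value_ == other.value_;
    }
    bool operator!=(const constant_iterator& other) const {
        return !(*this == other);
    }

private:
    T value_;
};

// fill_n expressed as "copy n elements from a constant sequence".
template <typename OutputIt, typename Size, typename T>
OutputIt fill_n_via_copy_n(OutputIt first, Size n, const T& value) {
    return std::copy_n(constant_iterator<T>(value), n, first);
}
```

For example, calling fill_n_via_copy_n(v.begin(), 3, 2.5) on a std::vector<double> with at least three elements sets the first three elements to 2.5 and returns an iterator to the fourth position.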

Which version of nvcc? Which host compiler?  On what operating system?

I considered 4 configurations:
* Mac, Apple gcc 4.2 + OpenMP backend
* Mac, MacPorts gcc 4.5 + OpenMP backend
* Mac, Intel icpc 11.1 + OpenMP backend
* CentOS, nvcc 3.2rc2 + CUDA backend

The attached code can be switched between these cases by commenting out or 
uncommenting TEST_SOA and TEST_FILL_N.
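
As an illustration of what such a toggle might look like, here is a minimal sketch that assumes TEST_SOA selects between an array-of-structures container and a structure-of-arrays container. Element, ArrayAoS, and ArraySoA are hypothetical names chosen to match the shape of the tuples in the expected output; the real soa.h is in the attachment and may differ.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical element type: a scalar, a 3-vector, and another scalar.
struct Element {
    double a;
    double v[3];
    double b;
};

// Array of structures: one contiguous vector of whole elements.
struct ArrayAoS {
    std::vector<Element> data;
    explicit ArrayAoS(std::size_t n) : data(n) {}  // elements start zeroed
    std::size_t size() const { return data.size(); }
    Element get(std::size_t i) const { return data[i]; }
    void set(std::size_t i, const Element& e) { data[i] = e; }
};

// Structure of arrays: one vector per field, all of the same length.
struct ArraySoA {
    std::vector<double> a, v0, v1, v2, b;
    explicit ArraySoA(std::size_t n) : a(n), v0(n), v1(n), v2(n), b(n) {}
    std::size_t size() const { return a.size(); }
    Element get(std::size_t i) const {
        // Gather the i-th entry of every field into one element.
        Element e = {a[i], {v0[i], v1[i], v2[i]}, b[i]};
        return e;
    }
    void set(std::size_t i, const Element& e) {
        // Scatter the element's fields into the per-field vectors.
        a[i] = e.a;
        v0[i] = e.v[0];
        v1[i] = e.v[1];
        v2[i] = e.v[2];
        b[i] = e.b;
    }
};

// Compile-time switch: define TEST_SOA to select the SoA layout.
#ifdef TEST_SOA
typedef ArraySoA Array;
#else
typedef ArrayAoS Array;
#endif
```

Both containers expose the same get/set interface, so the rest of a test program can use Array without caring which layout was selected.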

A summary of what happens in each case is given above, under "What do you see instead?".

Original issue reported on code.google.com by andrew.c...@gmail.com on 10 Nov 2010 at 2:58

GoogleCodeExporter commented 9 years ago
Sorry, my mistake: I was compiling a .cpp file with nvcc. I symlinked it to 
bugs.cu, recompiled, and reran the tests with CUDA.

fill_n, without SOA
     nvcc 3.2rc2 (centos): compiles and gives correct results

fill_n, with SOA
     nvcc 3.2rc2 (centos): does not compile: thrust/detail/device/cuda/for_each.inl(72): error: no instance of overloaded function "thrust::detail::generate_functor<Generator>::operator() .... " matches the argument list

copy_n, without SOA
     nvcc 3.2rc2 (centos): compiles and gives correct results

copy_n, with SOA
     nvcc 3.2rc2 (centos): compiles and gives correct results

Original comment by andrew.c...@gmail.com on 10 Nov 2010 at 3:31

GoogleCodeExporter commented 9 years ago
What about this one?

fill_n, with SOA
    nvcc 3.2rc2 (centos): does not compile   thrust/detail/device/cuda/for_each.inl(72): error: no instance of overloaded function "thrust::detail::generate_functor<Generator>::operator() .... " matches the argument list

Were you able to compile or not?

Original comment by jaredhoberock on 10 Nov 2010 at 3:36

GoogleCodeExporter commented 9 years ago
Hi Jared, it didn't compile. The compiler error message is right there in 
what you just quoted.

Original comment by andrew.c...@gmail.com on 10 Nov 2010 at 3:41

GoogleCodeExporter commented 9 years ago
OK, thanks, we'll have a look.

Original comment by jaredhoberock on 10 Nov 2010 at 3:44

GoogleCodeExporter commented 9 years ago
Thanks!

Original comment by andrew.c...@gmail.com on 10 Nov 2010 at 3:50

GoogleCodeExporter commented 9 years ago
This issue was closed by revision 70f1c9c123.

Original comment by jaredhoberock on 10 Nov 2010 at 10:01

GoogleCodeExporter commented 9 years ago
Thank you for fixing the compile error. fill_n now compiles and works in all 
the cases that I tested.

However, copy_n + SOA still does not work with OpenMP unless I compile with 
optimization. The output below is from a Mac with MacPorts g++-4.5; Apple 
g++-4.2 behaves the same way. I also tested with g++-4.5 on 64-bit CentOS and 
got garbage results there too, unless I compiled with optimization. Intel icpc 
is fine on both Mac and CentOS.

humphrey% g++-mp-4.5 bugs.cpp -fopenmp -I ../lcpprivate/thrust -DTHRUST_DEVICE_BACKEND=THRUST_DEVICE_BACKEND_OMP -o bugs
humphrey% ./bugs
starting
(5 (0 0 0) 0)
(0 (0 25 0) 0)
copying
getting values
printing
(2.12201e-314 (6.95322e-310 6.95322e-310 6.95322e-310) 2.12201e-314)
(6.95322e-310 (6.95322e-310 6.95322e-310 2.12201e-314) 6.95322e-310)

humphrey% g++-mp-4.5 -O3 bugs.cpp -fopenmp -I ../lcpprivate/thrust -DTHRUST_DEVICE_BACKEND=THRUST_DEVICE_BACKEND_OMP -o bugs
humphrey% ./bugs
starting
(5 (0 0 0) 0)
(0 (0 25 0) 0)
copying
getting values
printing
(5 (0 0 0) 0)
(0 (0 25 0) 0)

Original comment by andrew.c...@gmail.com on 10 Nov 2010 at 10:40

GoogleCodeExporter commented 9 years ago
OK, so it's not just SOA: even without SOA I still get garbage output when 
using copy_n + make_constant_iterator, unless I compile with optimization. 
CUDA is fine in all cases, by the way.

Original comment by andrew.c...@gmail.com on 10 Nov 2010 at 10:48

GoogleCodeExporter commented 9 years ago
Andrew,

Please let me know if the changes in my clone [1] fix these remaining 
issues.

[1] https://jaredhoberock-thrust-no-referenced-state.googlecode.com/hg/

Original comment by jaredhoberock on 10 Nov 2010 at 10:57

GoogleCodeExporter commented 9 years ago
That did the trick!   Thank you!

Original comment by andrew.c...@gmail.com on 10 Nov 2010 at 11:02

GoogleCodeExporter commented 9 years ago
This issue was closed by revision ac4def6859.

Original comment by jaredhoberock on 10 Nov 2010 at 11:46