Currently the NVTT library has to be compiled for a specific instruction
set. The SSE2 code path is 40% faster than the SSE code path, and the path
without SSE is much slower: 3-4 times slower. It would be nice to detect the
available CPU capabilities at run time and select the best code path
automatically.
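
A minimal sketch of what run-time selection could look like, not NVTT's actual code. It assumes GCC or Clang on x86, where the `__builtin_cpu_supports` builtin is available; every other compiler or architecture falls back to the scalar path. The names `CodePath`, `detectCodePath`, `selectCompressor` and the three `compress*` stubs are hypothetical placeholders for the real kernels.

```cpp
// Hypothetical names throughout: NVTT does not define any of these.
enum class CodePath { Scalar, SSE, SSE2 };

// Query the CPU at run time and report the best supported path.
// __builtin_cpu_supports is a GCC/Clang builtin for x86 targets;
// anything else conservatively reports the scalar path.
inline CodePath detectCodePath()
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2")) return CodePath::SSE2;
    if (__builtin_cpu_supports("sse"))  return CodePath::SSE;
#endif
    return CodePath::Scalar;
}

// Placeholder kernels. They only return a label here; real kernels
// would contain the scalar, SSE and SSE2 implementations.
inline const char* compressScalar() { return "scalar"; }
inline const char* compressSSE()    { return "sse"; }
inline const char* compressSSE2()   { return "sse2"; }

using CompressFn = const char* (*)();

// Map a detected path to the matching kernel.
inline CompressFn selectCompressor(CodePath path)
{
    switch (path) {
        case CodePath::SSE2:   return &compressSSE2;
        case CodePath::SSE:    return &compressSSE;
        case CodePath::Scalar: break;
    }
    return &compressScalar;
}
```

A caller would resolve the function pointer once at startup, for example `static const CompressFn compress = selectCompressor(detectCodePath());`, and call through it afterwards. In a real build each kernel would probably need its own translation unit compiled with the matching flag (`-msse`, `-msse2`), so that the rest of the library stays free of instructions the CPU may not support.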
Original issue reported on code.google.com by cast...@gmail.com on 12 Dec 2007 at 10:27