gnuradio / volk

The Vector Optimized Library of Kernels
http://libvolk.org
GNU Lesser General Public License v3.0

volk_32f_x2_add_32f_neonpipeline doesn't support in-place ops #58

Closed: trondeau closed this issue 8 years ago

trondeau commented 8 years ago

This is how we use it in GNU Radio's add_ff_impl.cc, passing the output buffer as the first input as well:

for(size_t i = 1; i < input_items.size(); i++)
    volk_32f_x2_add_32f(out, out, (const float*)input_items[i], noi); // out is both an input and the output (in place)

This segfaults when the "neonpipeline" implementation is selected. The "neon" and "neonasm" implementations work fine, though.
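A minimal plain-C sketch of that accumulation pattern, showing why in-place use is expected to work. The loop starting at i = 1 implies out already holds input 0; the sketch assumes a memcpy seeds it. add_32f_ref and sum_inputs are hypothetical names, and add_32f_ref is a scalar stand-in for volk_32f_x2_add_32f, not the VOLK code. Because each element is read before it is written, the scalar version is safe when out aliases the first input; any SIMD implementation has to preserve that property.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical scalar stand-in for volk_32f_x2_add_32f: c[i] = a[i] + b[i].
 * a[i] and b[i] are read before c[i] is written, so c == a is safe. */
static void add_32f_ref(float* c, const float* a, const float* b, unsigned int n)
{
    for (unsigned int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

/* Hypothetical helper mirroring the add_ff loop: seed out with input 0
 * (assumed memcpy), then accumulate the remaining inputs in place. */
static void sum_inputs(float* out, const float* const* inputs,
                       size_t ninputs, unsigned int noi)
{
    memcpy(out, inputs[0], noi * sizeof(float));
    for (size_t i = 1; i < ninputs; i++)
        add_32f_ref(out, out, inputs[i], noi); /* out aliases the first input */
}
```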

n-west commented 8 years ago

I could not reproduce this with a minimal test program. I'll try again after I fix GNU Radio on my E310.

#include <volk/volk.h>

int main()
{
    const unsigned int N = 4096;
    size_t alignment = volk_get_alignment();
    float* increasing = (float*)volk_malloc(sizeof(float)*N, alignment);
    float* ones = (float*)volk_malloc(sizeof(float)*N, alignment);
    for(unsigned int ii = 0; ii < N; ++ii){
        increasing[ii] = (float)ii;
        ones[ii] = 1.f;
    }
    // In place: the first input buffer is also the output buffer
    volk_32f_x2_add_32f_manual(increasing, increasing, ones, N, "neonpipeline");
    volk_free(increasing);
    volk_free(ones);
    return 0;
}
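One way to turn a program like the one above into a pass/fail check is to run a kernel out of place, then in place on the same inputs, and compare the results. This is a hypothetical sketch (check_in_place and add_kernel_fn are not VOLK names). It uses plain malloc, so aligned implementations would need volk_malloc instead, and a kernel that segfaults will still crash rather than return 0.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical typedef matching the volk_32f_x2_add_32f argument order. */
typedef void (*add_kernel_fn)(float* c, const float* a, const float* b,
                              unsigned int n);

/* Hypothetical check: returns 1 if kern produces identical results out of
 * place (separate c) and in place (c == a) on the same inputs, else 0. */
static int check_in_place(add_kernel_fn kern, unsigned int n)
{
    float* a = malloc(n * sizeof(float));
    float* b = malloc(n * sizeof(float));
    float* expected = malloc(n * sizeof(float));
    int ok = (a != NULL && b != NULL && expected != NULL);
    if (ok) {
        for (unsigned int i = 0; i < n; ++i) {
            a[i] = (float)i;
            b[i] = 1.0f;
        }
        kern(expected, a, b, n); /* out of place: reference result */
        kern(a, a, b, n);        /* in place: a is overwritten */
        ok = memcmp(a, expected, n * sizeof(float)) == 0;
    }
    free(a);
    free(b);
    free(expected);
    return ok;
}
```

To exercise the implementation from this issue, a small wrapper that calls volk_32f_x2_add_32f_manual(c, a, b, n, "neonpipeline") could be passed as kern.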
n-west commented 8 years ago

@trondeau, can you provide an example? I cannot reproduce it with volk_config set to use neonpipeline and the following flowgraph:

from gnuradio import gr, blocks
from time import sleep

adder = blocks.add_ff()
sauce = blocks.null_source(gr.sizeof_float)
sink = blocks.null_sink(gr.sizeof_float)

tb = gr.top_block()

tb.connect((sauce, 0), (adder, 0))
tb.connect((sauce, 0), (adder, 1))
tb.connect((adder, 0), (sink, 0))

tb.start()

sleep(5)

tb.stop()
root@ettus-e300:~# gnuradio-config-info -v
3.7.8
root@ettus-e300:~# volk-config-info -v
1.1
trondeau commented 8 years ago

Don't use a null_source for the input. Your example worked for me, but it segfaults when I change the source to analog.sig_source_f(1, analog.GR_SIN_WAVE, 0.01, 1).

volk_config entries:
volk_32f_x2_add_32f neonpipeline neonpipeline --> fails
volk_32f_x2_add_32f u_neon u_neon --> works

n-west commented 8 years ago

Fixed