Open adriendelsalle opened 1 year ago
@adriendelsalle Thanks for the kind words!
Indeed you are correct, the data layout was designed with thread/process-based BFEs in mind.
I suppose that, as the adoption of AVX512 increases, the availability of gather/scatter instructions would at least alleviate the issue.
As an alternative, we could think about extending the BFE API to let the user signal how the data is stored (i.e., row-major vs column-major). Of course, we would need to ensure that such an extension does not break existing uses of the BFE API, which could be tricky.
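To make the row-major vs column-major distinction concrete, here is a minimal sketch of what such a layout flag could mean in terms of indexing. The names `batch_layout` and `flat_index` are hypothetical and are not part of the pagmo API; the sketch only assumes a batch of `b` decision vectors of dimension `n` stored in one flat array.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tag, NOT part of pagmo: how a batch is stored in a flat array.
enum class batch_layout {
    row_major,   // decision vectors stored one after another
    column_major // all first components, then all second components, ...
};

// Flat index of component j of decision vector i, for a batch of
// b decision vectors of dimension n.
inline std::size_t flat_index(batch_layout layout, std::size_t i, std::size_t j,
                              std::size_t b, std::size_t n)
{
    assert(i < b && j < n);
    return layout == batch_layout::row_major ? i * n + j : j * b + i;
}
```

With `b = 4` and `n = 3`, component 2 of decision vector 1 sits at index 5 in the row-major case and at index 9 in the column-major case.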
I love the idea of trying to signal/flag the layout.
I'll have to take a deeper look at the project to see how I could make a relevant PR for that. It will probably take a few weeks before I can find time to really investigate further, but if you're fine with contributions I can give it a try!
@adriendelsalle of course it would be great if you wanted to take a stab at this.
Feel free to ping me if you need assistance (here or on the gitter channel https://gitter.im/pagmo2/Lobby )
Description
This project looks really nice, thanks for that! I have a really simple question regarding the compatibility of `batch_fitness` and SIMD computation, due to the layout of the input/decision vector (and the output/fitness one).
From the docs:
Is it really possible to do vectorized operations when the input is concatenated as described, without having to allocate a new vector and reorder it internally before calling SIMD intrinsics (which would probably lower the benefits of vectorization, or even make it slower than a naive sequential implementation)? I was expecting contiguous storage of each input element: for a batch of size `b`, the first component of the decision vector occupies the index range `[0, b)`, etc.
I understand this layout is handy for a multithreaded `BFE`, which can work concurrently on different portions of the input vector. Is it really compatible with SIMD?
Thanks for your help, and sorry if I missed something :)
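The reordering step in question can be sketched as follows. This is only an illustration under stated assumptions: the input holds `b` decision vectors of dimension `n` stored one after another, and the helper names `to_component_major` and `sphere_batch_soa` are made up for this example rather than taken from pagmo. The toy fitness is a plain sum of squares.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reorder a batch from "one decision vector after another" to
// "one component after another" (component j occupies [j*b, (j+1)*b)).
// This is the extra allocation and copy the question is worried about.
std::vector<double> to_component_major(const std::vector<double> &dvs, std::size_t n)
{
    assert(n > 0 && dvs.size() % n == 0);
    const std::size_t b = dvs.size() / n; // batch size
    std::vector<double> out(dvs.size());
    for (std::size_t i = 0; i < b; ++i)
        for (std::size_t j = 0; j < n; ++j)
            out[j * b + i] = dvs[i * n + j];
    return out;
}

// Toy fitness (sum of squares) evaluated on the component-major layout.
// The inner loop walks contiguous memory across the batch, which is the
// access pattern compilers can usually auto-vectorize.
std::vector<double> sphere_batch_soa(const std::vector<double> &soa, std::size_t n)
{
    assert(n > 0 && soa.size() % n == 0);
    const std::size_t b = soa.size() / n;
    std::vector<double> fvs(b, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
        const double *col = soa.data() + j * b; // all values of component j
        for (std::size_t i = 0; i < b; ++i)
            fvs[i] += col[i] * col[i];
    }
    return fvs;
}
```

For the batch `{1, 2, 3, 4, 5, 6}` with `n = 3`, the reordered storage is `{1, 4, 2, 5, 3, 6}` and the two fitness values are 14 and 77. Whether the transpose pays off depends on how expensive the fitness function is relative to the copy.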