Open mohawk2 opened 2 years ago
PDL should also allow for custom allocators so that they can be used for situations where different allocators can be more efficient or integrate better with an existing external library.
For example, aligned memory (e.g., through C11's `aligned_alloc`, Windows-specific `_aligned_malloc`) can give a significant performance boost because it allows for using particular SSE/AVX instructions[*].
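As a rough illustration of the aligned-allocation idea, here is a minimal sketch of a portable wrapper around the two calls named above. The helper names `nd_aligned_alloc`, `nd_aligned_free` and `nd_is_aligned` are hypothetical and not part of PDL; the sketch assumes a C11 libc on non-Windows platforms and a power-of-two alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#if defined(_WIN32)
#include <malloc.h>   /* _aligned_malloc, _aligned_free */
#endif

/* Hypothetical helper (not a PDL function): allocate at least `nbytes`
 * bytes whose address is a multiple of `align` (a power of two). */
static void *nd_aligned_alloc(size_t align, size_t nbytes)
{
    if (align == 0 || (align & (align - 1)) != 0)
        return NULL;                      /* not a power of two */
    if (nbytes > SIZE_MAX - (align - 1))
        return NULL;                      /* rounding up would overflow */
    /* C11 originally required the size to be a multiple of the alignment,
     * so round up to stay portable across libc versions. */
    size_t rounded = (nbytes + align - 1) / align * align;
    if (rounded == 0)
        rounded = align;
#if defined(_WIN32)
    return _aligned_malloc(rounded, align);
#else
    return aligned_alloc(align, rounded);
#endif
}

/* Memory from _aligned_malloc must go to _aligned_free, not free(). */
static void nd_aligned_free(void *p)
{
#if defined(_WIN32)
    _aligned_free(p);
#else
    free(p);
#endif
}

/* True if `p` sits on an `align`-byte boundary. */
static int nd_is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p % align) == 0;
}
```

A 32-byte alignment (`nd_aligned_alloc(32, nbytes)`) would be the natural choice for 256-bit AVX loads and stores. The paired free function matters: on Windows, mixing `_aligned_malloc` with plain `free` is an error, which is one reason the allocator would need to travel with the ndarray.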
As discussed in IRC, I would like to be able to set this at runtime per-ndarray (internal C interface).
A related question is providing a high-level way to indicate that the output of operations between ndarrays (either another aligned-alloc-N-bytes ndarray or a regular-malloc ndarray) should be into a PDL that uses a specific allocator (my goal is that aligned-alloc-N-bytes ndarray would output to another aligned-alloc-N-bytes ndarray).
[*] Taking advantage of SSE/AVX for particular PDL ops is also something to look into and is likely a whole big project on its own.
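To make the per-ndarray idea more concrete, here is a minimal sketch in which each array carries a pointer to its allocator and the output of an operation inherits the allocator of its first input. Everything here (`toy_allocator`, `toy_nd`, `toy_nd_add`, and so on) is a hypothetical toy, not the PDL C API, and the aligned allocator uses the portable over-allocate-and-adjust technique rather than `aligned_alloc`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator interface; ctx carries allocator-specific state. */
typedef struct toy_allocator {
    void *(*alloc)(size_t nbytes, void *ctx);
    void  (*release)(void *ptr, void *ctx);
    void  *ctx;
} toy_allocator;

/* Hypothetical stand-in for an ndarray: data plus the allocator that owns it. */
typedef struct toy_nd {
    double *data;
    size_t nvals;
    const toy_allocator *allocator;
} toy_nd;

static void *plain_alloc(size_t n, void *ctx) { (void)ctx; return malloc(n); }
static void plain_release(void *p, void *ctx) { (void)ctx; free(p); }
static const toy_allocator toy_plain = { plain_alloc, plain_release, NULL };

/* Aligned allocation on top of malloc: over-allocate, round the address up,
 * and stash the raw pointer just before the aligned block.
 * ctx points at a size_t holding a power-of-two alignment. */
static void *aligned_alloc_cb(size_t n, void *ctx)
{
    size_t align = *(const size_t *)ctx;
    if (n > SIZE_MAX - align - sizeof(void *)) return NULL;
    unsigned char *raw = malloc(n + align + sizeof(void *));
    if (!raw) return NULL;
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + align - 1) & ~(uintptr_t)(align - 1);
    unsigned char *p = raw + (aligned - (uintptr_t)raw);
    memcpy(p - sizeof(void *), &raw, sizeof raw);   /* remember raw pointer */
    return p;
}

static void aligned_release_cb(void *ptr, void *ctx)
{
    (void)ctx;
    if (!ptr) return;
    void *raw;
    memcpy(&raw, (unsigned char *)ptr - sizeof(void *), sizeof raw);
    free(raw);
}

static size_t toy_align32 = 32;
static const toy_allocator toy_aligned32 =
    { aligned_alloc_cb, aligned_release_cb, &toy_align32 };

/* Allocate storage for nvals doubles using the given allocator. */
static int toy_nd_init(toy_nd *nd, size_t nvals, const toy_allocator *a)
{
    if (nvals == 0 || nvals > SIZE_MAX / sizeof(double)) return -1;
    nd->data = a->alloc(nvals * sizeof(double), a->ctx);
    if (!nd->data) return -1;
    nd->nvals = nvals;
    nd->allocator = a;
    return 0;
}

static void toy_nd_free(toy_nd *nd)
{
    nd->allocator->release(nd->data, nd->allocator->ctx);
    nd->data = NULL;
}

/* out = a + b elementwise; the output inherits a's allocator. */
static int toy_nd_add(const toy_nd *a, const toy_nd *b, toy_nd *out)
{
    if (a->nvals != b->nvals) return -1;
    if (toy_nd_init(out, a->nvals, a->allocator) != 0) return -1;
    for (size_t i = 0; i < a->nvals; i++)
        out->data[i] = a->data[i] + b->data[i];
    return 0;
}
```

"Inherit from the first input" is only one possible policy for mixed aligned/regular operands; the high-level question raised above is exactly which policy to expose and how.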
Would a way to get close(?) to this with current PDL be to make a PDL subclass (adjusting your suggested name to PDL::Aligned) which allocated memory in a suitable alignment, and set `PDL_DONTTOUCHDATA`? An ndarray of the appropriate size could be constructed using current code and passed as the output ndarray(s) of given operations.
An alternative approach might be just to use `MALLOCDBG` in the PDL config so that all memory is allocated with an alignment, in some way, or other means of globally setting allocate/free.
SSE/AVX utilisation might be better captured on #349. Notes should include pointers (ha!) on how to do so from the C level.
Tasks:
- `PDL_CORE_LIST` into `PDL_PERL_LIST` and `PDL_API_LIST` - for compatibility reasons this will need to remain called "pdlcore.h" so may as well still call the .c the same, and probably merge `pdlperl.h` back in; the C API stuff would need to move to `pdl.h`
- `def_dims`
- `PDL_Value` union type for the `PDL_Anyval.value`
- `pdl.value` entry to be used for ndarrays whose `data` segment is less than that number of bytes (which for small types could be several elements) obviously including scalar/single-value, using the alloc/default macro
- `realloc` rather than `pdl_grow`/`pdl_makescratchhash`
- `pdl_impl_vtable *` `pdl.impl` pointer, and void * `pdl.impl_data` that would replace the current `datasv`, `sv`, `hdrsv` (probably with a struct for the Perl implementation to use), implementing a Perl version
- `croak` (etc)-using code into that vtable (especially `pp_indterm`)
- `pdl.hdrsv` from the C code, make the Perl object routinely be a hashref with a `PDL` and a `hdr` element so any copying could be done at Perl level

This is somewhat connected to the #349 ideas on making a broadcastloop vtable, but only somewhat.