bespoke-silicon-group / bsg_replicant

BSG Replicant: Cosimulation and Emulation Infrastructure for HammerBlade
BSD 3-Clause "New" or "Revised" License

CUDA Memcpy Byte Transfer Support #817

Closed. natewise closed this issue 1 year ago.

natewise commented 1 year ago

Hello!

I have a CUDA example in which I would like the host to exchange data with the device at byte granularity. Currently I can't do this directly. Copying a single byte back to the host fails with:

ERROR:   kernel: hb_mc_manycore_read_mem: Input 'sz' = 1: only multiples of 4 are supported
ERROR:   hb_mc_manycore_eva_read_internal: Failed to copy data from host to NPA
ERROR:   'hb_mc_manycore_eva_read(device->mc, &default_map, &pod->mesh->origin, &daddr, haddr, bytes)' failed: Not implemented
ERROR:   'hb_mc_device_memcpy(device, dst, src, sizeof(T), HB_MC_MEMCPY_TO_HOST)' failed: Not implemented
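Until the runtime supports sub-word sizes, a host-side wrapper can emulate a byte-granular read on top of a word-only one. This is a minimal sketch, not the bsg_replicant API. `word_read`, `byte_read`, `mock_device_mem`, and `mock_fill_pattern` are hypothetical names, and the device memory is mocked by a host array. The sketch assumes that both the size and the address must be multiples of 4. The log above only confirms the size restriction, so address alignment is an assumption made to be safe. The wrapper widens the requested range to the enclosing word-aligned range, reads whole words, and copies out only the bytes that were asked for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for device memory (a host array, for illustration only). */
#define MOCK_MEM_BYTES 64
static uint8_t mock_device_mem[MOCK_MEM_BYTES];

/* Fill the mock memory so that byte i holds the value i. */
static void mock_fill_pattern(void)
{
    for (size_t i = 0; i < MOCK_MEM_BYTES; i++)
        mock_device_mem[i] = (uint8_t)i;
}

/* Hypothetical word-only read: rejects any address or size that is not a
 * multiple of 4, mimicking "only multiples of 4 are supported".
 * Returns 0 on success, -1 on failure. */
static int word_read(size_t addr, void *dst, size_t sz)
{
    if (addr % 4 != 0 || sz % 4 != 0)
        return -1;
    if (addr + sz > MOCK_MEM_BYTES)
        return -1;
    memcpy(dst, &mock_device_mem[addr], sz);
    return 0;
}

/* Byte-granular read layered on word_read: widen [addr, addr + sz) to the
 * enclosing word-aligned range, fetch whole words, then copy out only the
 * requested bytes. */
static int byte_read(size_t addr, void *dst, size_t sz)
{
    if (sz == 0)
        return 0;
    size_t start = addr - addr % 4;        /* round down to a word boundary */
    size_t end = (addr + sz + 3) / 4 * 4;  /* round up to a word boundary */
    assert(start % 4 == 0 && (end - start) % 4 == 0);

    uint8_t *words = malloc(end - start);
    if (words == NULL)
        return -1;
    int rc = word_read(start, words, end - start);
    if (rc == 0)
        memcpy(dst, words + (addr - start), sz);
    free(words);
    return rc;
}
```

For reads, fetching up to 3 extra bytes on each side is harmless as long as the widened range stays inside mapped device memory. The wrapper reports failure when it does not.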

I understand that transferring byte-wise will be roughly 4x slower overall. In my use case, though, extra processing is required anyway to pack and unpack the bytes into and out of 32-bit words, so forcing word-aligned transfers just adds complexity to my program.
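The write direction needs a read-modify-write, because writing a whole word would otherwise clobber the neighboring bytes. This is a minimal sketch under the same assumptions as a word-only interface: size and address are multiples of 4. It is not the bsg_replicant API. `word_read`, `word_write`, `byte_write`, `mock_device_mem`, and `mock_fill_pattern` are hypothetical names, and device memory is mocked by a host array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for device memory (a host array, for illustration only). */
#define MOCK_MEM_BYTES 64
static uint8_t mock_device_mem[MOCK_MEM_BYTES];

/* Fill the mock memory so that byte i holds the value i. */
static void mock_fill_pattern(void)
{
    for (size_t i = 0; i < MOCK_MEM_BYTES; i++)
        mock_device_mem[i] = (uint8_t)i;
}

/* Hypothetical word-only accessors: both reject any address or size that is
 * not a multiple of 4. Return 0 on success, -1 on failure. */
static int word_read(size_t addr, void *dst, size_t sz)
{
    if (addr % 4 != 0 || sz % 4 != 0 || addr + sz > MOCK_MEM_BYTES)
        return -1;
    memcpy(dst, &mock_device_mem[addr], sz);
    return 0;
}

static int word_write(size_t addr, const void *src, size_t sz)
{
    if (addr % 4 != 0 || sz % 4 != 0 || addr + sz > MOCK_MEM_BYTES)
        return -1;
    memcpy(&mock_device_mem[addr], src, sz);
    return 0;
}

/* Byte-granular write via read-modify-write: read the enclosing words,
 * patch in the new bytes, then write the whole words back. */
static int byte_write(size_t addr, const void *src, size_t sz)
{
    if (sz == 0)
        return 0;
    size_t start = addr - addr % 4;        /* round down to a word boundary */
    size_t end = (addr + sz + 3) / 4 * 4;  /* round up to a word boundary */
    assert(start % 4 == 0 && (end - start) % 4 == 0);

    uint8_t *words = malloc(end - start);
    if (words == NULL)
        return -1;
    /* Reading first preserves the bytes outside [addr, addr + sz). */
    int rc = word_read(start, words, end - start);
    if (rc == 0) {
        memcpy(words + (addr - start), src, sz);
        rc = word_write(start, words, end - start);
    }
    free(words);
    return rc;
}
```

This read-modify-write is not atomic. If device code writes other bytes of the same word between the read and the write-back, those updates can be lost. The sketch is therefore only safe when the host has exclusive access to the affected words, for example between kernel launches.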

Thanks!