Celebrandil / CudaSift

A CUDA implementation of SIFT for NVidia GPUs (1.2 ms on a GTX 1060)
MIT License

Convert from cudaMallocPitch to cudaMalloc #72

Open: mnicely opened this issue 4 years ago

mnicely commented 4 years ago

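Would it be possible to convert the cudaMallocPitch calls to cudaMalloc? I understand why cudaMallocPitch was chosen: it pads each row so that every row starts on an aligned address, which helps memory coalescing. However, the penalty for unpadded rows is much less noticeable on today's GPUs with their larger caches.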
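As a rough illustration of the trade-off, the helpers below compute a padded pitch and an element's byte offset. This is a host-side C++ model, not CudaSift code. The 512-byte alignment is an assumption for the example, because cudaMallocPitch chooses the real value per device. A dense cudaMalloc buffer is simply the case where the pitch equals the row width in bytes.

```cpp
#include <cstddef>

// Illustrative alignment in bytes (an assumption). cudaMallocPitch picks the
// actual alignment for the device it runs on.
constexpr std::size_t kAssumedAlignment = 512;

// Round a row width in bytes up to the next multiple of the alignment,
// roughly what a pitched allocation does so every row starts aligned.
std::size_t paddedPitch(std::size_t widthBytes,
                        std::size_t alignment = kAssumedAlignment) {
    return (widthBytes + alignment - 1) / alignment * alignment;
}

// Byte offset of float element (x, y) in a 2D image whose rows are
// pitchBytes apart. For a dense cudaMalloc buffer, pass
// pitchBytes = width * sizeof(float).
std::size_t byteOffset(std::size_t x, std::size_t y, std::size_t pitchBytes) {
    return y * pitchBytes + x * sizeof(float);
}
```

For a 1000-pixel row of floats (4000 bytes), `paddedPitch(4000)` returns 4096 under the assumed alignment. Each row then carries 96 bytes of padding that a dense buffer does not have, and row 1 starts at byte 4096 instead of byte 4000.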

The main motivation for this enhancement is better interoperability with DALI. DALI allocates its batch buffers with cudaMalloc because it is agnostic to what it loads: the data may be something other than an image.

Currently, each image must be copied from its cudaMalloc buffer into a separate buffer allocated with cudaMallocPitch. If CudaSift accepted cudaMalloc buffers, it could operate on the images in place, saving both the extra allocation and the copy.
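The difference can be sketched on the host in C++. This models the data movement only and is not CudaSift or DALI code; the function names are hypothetical. `copyDenseToPitched` mirrors the row-by-row copy that `cudaMemcpy2D` performs between buffers with different pitches. `readPixel` shows that a pitch-aware reader can use the dense buffer directly if you pass `pitch = width * sizeof(float)`.

```cpp
#include <cstddef>
#include <cstring>

// Staging copy: rows of a densely packed float image (as in a cudaMalloc
// buffer) are copied one at a time into a padded buffer whose rows are
// dstPitchBytes apart. The padded buffer costs height * dstPitchBytes bytes.
void copyDenseToPitched(const float* src, std::size_t width, std::size_t height,
                        unsigned char* dst, std::size_t dstPitchBytes) {
    const std::size_t rowBytes = width * sizeof(float);
    const unsigned char* srcBytes = reinterpret_cast<const unsigned char*>(src);
    for (std::size_t y = 0; y < height; ++y) {
        std::memcpy(dst + y * dstPitchBytes, srcBytes + y * rowBytes, rowBytes);
    }
}

// Pitch-aware read of pixel (x, y). Works on either buffer: pass the padded
// pitch for the staged copy, or width * sizeof(float) for the dense original.
float readPixel(const unsigned char* base, std::size_t x, std::size_t y,
                std::size_t pitchBytes) {
    float value;
    std::memcpy(&value, base + y * pitchBytes + x * sizeof(float), sizeof(float));
    return value;
}
```

If CudaSift's kernels took the pitch as a parameter in this way, a DALI buffer could presumably be passed in unchanged, with no staging copy. Whether that holds for every kernel is an assumption that would need checking against the CudaSift source.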