mehmetoguzderin closed this 2 years ago
The PR looks good overall; I just have some minor feedback on whitespace changes, which, if needed, should go in a separate PR. Other than that, I'm curious to know the reasoning behind the removal of oroMemcpy and oroGetLastError.
@jammm Memcpy and GetLastError do not exist among the cu* driver API functions (there is a cuMemcpy, but it copies from a unified address to a unified address and takes no kind enum), though I might have overlooked them. I suspect that, except for the Auto variant of the enum, Memcpy could be polyfilled trivially, but GetLastError might require additional state tracking.
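A minimal sketch of that trivial polyfill, assuming the driver entry points have already been resolved into a function-pointer table by the dynamic loader. `DriverMemcpyTable`, `MemcpyKind`, and `polyMemcpy` are hypothetical names, not Orochi's API, and the typedefs at the top only stand in for `cuda.h`. The explicit directions map onto `cuMemcpyHtoD`, `cuMemcpyDtoH`, and `cuMemcpyDtoD`; the Auto case can only go through `cuMemcpy`, which relies on unified addressing, so it is the one variant that is not a plain forward.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-ins for cuda.h types (assumption); a real build would include cuda.h.
using CUresult = int;
using CUdeviceptr = std::uintptr_t;
constexpr CUresult CUDA_SUCCESS = 0;
constexpr CUresult CUDA_ERROR_INVALID_VALUE = 1;

// Hypothetical table filled by the dynamic loader; the signatures mirror
// cuMemcpyHtoD, cuMemcpyDtoH, cuMemcpyDtoD and cuMemcpy.
struct DriverMemcpyTable {
    CUresult (*memcpyHtoD)(CUdeviceptr dst, const void* src, std::size_t bytes);
    CUresult (*memcpyDtoH)(void* dst, CUdeviceptr src, std::size_t bytes);
    CUresult (*memcpyDtoD)(CUdeviceptr dst, CUdeviceptr src, std::size_t bytes);
    CUresult (*memcpyUnified)(CUdeviceptr dst, CUdeviceptr src, std::size_t bytes);
};

// Hypothetical runtime-style copy direction.
enum class MemcpyKind { HostToHost, HostToDevice, DeviceToHost, DeviceToDevice, Auto };

// Runtime-style memcpy polyfilled on top of driver entry points.
CUresult polyMemcpy(const DriverMemcpyTable& drv, void* dst, const void* src,
                    std::size_t bytes, MemcpyKind kind) {
    const auto dstDev = reinterpret_cast<CUdeviceptr>(dst);
    const auto srcDev = reinterpret_cast<CUdeviceptr>(src);
    switch (kind) {
    case MemcpyKind::HostToHost:
        std::memcpy(dst, src, bytes);  // no driver call needed
        return CUDA_SUCCESS;
    case MemcpyKind::HostToDevice:
        return drv.memcpyHtoD(dstDev, src, bytes);
    case MemcpyKind::DeviceToHost:
        return drv.memcpyDtoH(dst, srcDev, bytes);
    case MemcpyKind::DeviceToDevice:
        return drv.memcpyDtoD(dstDev, srcDev, bytes);
    case MemcpyKind::Auto:
        // Direction is left to the driver; only valid when both pointers
        // are visible through unified addressing.
        return drv.memcpyUnified(dstDev, srcDev, bytes);
    }
    return CUDA_ERROR_INVALID_VALUE;  // unknown kind value
}
```

Host-to-host copies never need the driver, so the sketch handles them with `std::memcpy` directly.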
`__ORO_FUNC2` is used to resolve GetLastError and Memcpy to cudaGetLastError and cudaMemcpy, respectively, which I believe should be fine, as the runtime and driver APIs can be used interchangeably for the most part. So I think we should keep them. @takahiroharada, what do you think?
@jammm I think the aim is to remove the dependency on the runtime API and use the more common driver API instead, whose presence is more reliably guaranteed across platforms.
I think it's better to allow Orochi to work with both the runtime and driver APIs, as it helps app developers quickly switch to Orochi from HIP/CUDA apps written against the driver and/or runtime API. There are apps that rely on these runtime APIs, so keeping them would help with cross-vendor support ^^
@jammm I'd assume that for such cases emulation could help more, as the methods in question seem to be enumerable: for the time being, just the last error and the generic memcpy. (The general gist of projects that do dynamic loading, depending of course on their description, is to build something that can be reliably distributed as a user-space binary that does not need any developer tooling.)
May I know what you mean by emulation and enumeration? If by enumeration you mean querying the DLL for the subset of APIs available on the current platform, that's possible, but we don't have such a feature right now. Hence, for the time being, it'd be great if we could keep Memcpy and GetLastError, as they're used by other applications and removing them would break those applications.
@jammm Enumerable in the sense that there are only two methods, and both can be polyfilled (see the earlier comment about the Auto variant in the Memcpy case; for the last error, we could track the state ourselves where the driver API does not). I certainly did not mean the query version.
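A minimal sketch of that state tracking, assuming every polyfilled entry point is routed through a wrapper. `trackedCall`, `polyGetLastError`, and `polyPeekAtLastError` are hypothetical names, and the typedefs stand in for `cuda.h`. The error slot is `thread_local` because `cudaGetLastError` reports errors per host thread and resets the value to success when read; `cudaPeekAtLastError` reads it without resetting. Sticky errors are not modelled.

```cpp
#include <utility>

// Stand-ins for cuda.h (assumption); a real build would include cuda.h.
using CUresult = int;
constexpr CUresult CUDA_SUCCESS = 0;

namespace {
// Most recent failure seen on this host thread.
thread_local CUresult tlsLastError = CUDA_SUCCESS;
}  // namespace

// Forward a driver call and remember its result if it failed.
// A later successful call does not clear a recorded failure.
template <typename Fn, typename... Args>
CUresult trackedCall(Fn fn, Args&&... args) {
    const CUresult result = fn(std::forward<Args>(args)...);
    if (result != CUDA_SUCCESS) {
        tlsLastError = result;
    }
    return result;
}

// Report the last recorded failure and reset it, like cudaGetLastError.
CUresult polyGetLastError() {
    const CUresult last = tlsLastError;
    tlsLastError = CUDA_SUCCESS;
    return last;
}

// Report the last recorded failure without resetting it.
CUresult polyPeekAtLastError() { return tlsLastError; }
```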
This PR allows Orochi to run in WSL2 with CUDA (tested on Ubuntu 22) and in other Linux configurations by dropping the CUDA runtime dependency. It does so by splitting CUDA support into Driver and RTC parts and removing a couple of utilities/values that have no direct driver-API counterpart.
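A minimal sketch of what a driver-only loader might look like, assuming the caller picks the library name (for example `libcuda.so.1` on Linux/WSL2 or `nvcuda.dll` on Windows). Only driver-API symbols (`cuInit`, `cuDriverGetVersion`) are resolved, so nothing requires `libcudart` to be installed. `DriverApi` and `loadDriverApi` are hypothetical names, and `CUresult` is a stand-in for the enum in `cuda.h`.

```cpp
#ifdef _WIN32
#include <windows.h>
using LibHandle = HMODULE;
static LibHandle openLib(const char* name) { return LoadLibraryA(name); }
static void* findSym(LibHandle h, const char* s) {
    return reinterpret_cast<void*>(GetProcAddress(h, s));
}
static void closeLib(LibHandle h) { FreeLibrary(h); }
#else
#include <dlfcn.h>
using LibHandle = void*;
static LibHandle openLib(const char* name) { return dlopen(name, RTLD_NOW | RTLD_LOCAL); }
static void* findSym(LibHandle h, const char* s) { return dlsym(h, s); }
static void closeLib(LibHandle h) { dlclose(h); }
#endif

using CUresult = int;  // stand-in for the enum in cuda.h (assumption)

// Hypothetical table holding driver-API entry points only.
struct DriverApi {
    CUresult (*cuInit)(unsigned int flags) = nullptr;
    CUresult (*cuDriverGetVersion)(int* driverVersion) = nullptr;
};

// Look up one symbol and cast it to the expected function-pointer type.
template <typename Fn>
static bool resolve(LibHandle lib, const char* name, Fn& out) {
    out = reinterpret_cast<Fn>(findSym(lib, name));
    return out != nullptr;
}

// Fills `api` only if the library opens and every symbol resolves;
// otherwise `api` is left untouched and the library is closed again.
bool loadDriverApi(const char* libName, DriverApi& api) {
    LibHandle lib = openLib(libName);
    if (!lib) return false;
    DriverApi tmp;
    if (!resolve(lib, "cuInit", tmp.cuInit) ||
        !resolve(lib, "cuDriverGetVersion", tmp.cuDriverGetVersion)) {
        closeLib(lib);
        return false;
    }
    api = tmp;  // the library stays loaded for the rest of the process
    return true;
}
```

On glibc older than 2.34, link with `-ldl`.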