Closed michael-kenzel closed 2 years ago
As an additional note: Since this fix required emitting longer, unformatted strings, a raw .write() function was added to the Stream in order to avoid having to go through .fmt(). In the process of doing so, I also updated the existing code to use .write() instead of .fmt() in places where it was outputting strings without any formatting. I also changed some functions that were using std::string to simply return the contents of string literals to use std::string_view instead. Some of these changes were rolled back in the last commit after feedback on Discord.
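To illustrate the std::string to std::string_view change mentioned above, here is a minimal sketch. The enum and both function names are hypothetical and are not taken from the codebase; the point is only that a function which hands back a string literal does not need to construct a std::string.

```cpp
#include <string>
#include <string_view>

// Hypothetical enum, used only for this illustration.
enum class AddrSpace { Generic, Global, Shared };

// Before: every call constructs a std::string just to return a literal.
inline std::string addr_space_prefix_old(AddrSpace as) {
    switch (as) {
        case AddrSpace::Global: return "__global ";
        case AddrSpace::Shared: return "__local ";
        default:                return "";
    }
}

// After: the returned view refers to the literal's static storage,
// so nothing is copied and the function can be constexpr.
constexpr std::string_view addr_space_prefix(AddrSpace as) {
    switch (as) {
        case AddrSpace::Global: return "__global ";
        case AddrSpace::Shared: return "__local ";
        default:                return "";
    }
}
```

This is only safe because string literals live for the whole program; a function that builds its result at runtime would still need to return std::string.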
Regarding fmt, we could perhaps have a fast-path with a template specialization selected when there aren't any additional arguments. I'm not a C++ template guru, but that sounds doable.
That would certainly be doable, pretty simple to do actually. However, at least to me, .fmt implies that the first string parameter is a template string for formatting, which implies stuff like you need to escape { and }. If we just added a specialization for .fmt to do a raw print in case no other arguments are provided, then that would mean the meaning of the first parameter changes depending on whether other arguments are present or not, which, imo, is not ideal. So I think this is better served by having a separate method rather than having two versions of the same method with different semantics…
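The semantic difference argued above can be sketched as follows. This Stream is a hypothetical stand-in, not the project's actual class; in particular, the "{}" placeholder and the "{{" / "}}" escape rules are assumptions chosen for illustration.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string_view>

// Hypothetical stand-in for the project's Stream; not the real implementation.
class Stream {
public:
    explicit Stream(std::ostream& os) : os_(os) {}

    // Raw output: the text is copied verbatim, braces included.
    Stream& write(std::string_view text) {
        os_ << text;
        return *this;
    }

    // Formatted output, no arguments left: "{{" and "}}" still collapse
    // to single braces because the string is a format template.
    Stream& fmt(std::string_view s) {
        for (std::size_t i = 0; i < s.size(); ++i) {
            os_ << s[i];
            if (is_escaped_brace(s, i)) ++i;  // skip the second brace
        }
        return *this;
    }

    // Formatted output: the first "{}" is replaced by the first argument,
    // then the rest of the template is processed with the remaining ones.
    template <class T, class... Rest>
    Stream& fmt(std::string_view s, const T& first, const Rest&... rest) {
        for (std::size_t i = 0; i < s.size(); ++i) {
            if (is_escaped_brace(s, i)) {
                os_ << s[i];
                ++i;
            } else if (s[i] == '{' && i + 1 < s.size() && s[i + 1] == '}') {
                os_ << first;
                return fmt(s.substr(i + 2), rest...);
            } else {
                os_ << s[i];
            }
        }
        return *this;
    }

private:
    // True if s[i] starts a doubled brace ("{{" or "}}").
    static bool is_escaped_brace(std::string_view s, std::size_t i) {
        return (s[i] == '{' || s[i] == '}') && i + 1 < s.size() && s[i + 1] == s[i];
    }

    std::ostream& os_;
};
```

With this sketch, write("{{ x }}") emits the text unchanged, while fmt("{{ x }}") emits "{ x }". If fmt silently switched to raw output when no arguments are given, the same string would mean different things depending on the call, which is the objection raised above.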
Why do you not use our own types when emitting the code, that is, i32 and u32 vs. s32 and u32?
__STDC_IEC_559__ is not defined by the compiler; shouldn't we remove it?
https://stackoverflow.com/questions/31181897/status-of-stdc-iec-559-with-modern-c-compilers
f16 is not defined for __CUDACC_VER_MAJOR__ <= 8.
I would also argue for removing the write function. It duplicates logic we already have and is only used in some places. It is not clear to the user why or when to use write over fmt.
I have revisited this and prepared another version of this PR based on the latest state of the codebase that should hopefully address all the previously raised issues.
The C backend currently performs an unconditional mapping of the fixed-width PrimTypes to fundamental types in C. However, the exact sizes of fundamental types in C are implementation-defined and generally depend on the ABI of the target platform. For example, the C backend in its current form will always map u64 to long. This would result in potentially incorrect codegen on Windows since long is a 32-bit type in the Windows ABI. While long is a 64-bit type under some circumstances in CUDA, it isn't a 64-bit type under all circumstances (potentially depends on the host compiler). CUDA generally uses long long as its 64-bit integer type. In addition to the potential problems noted before, the use of long here results in code that uses 64-bit integer atomics failing to compile since CUDA only has overloads for unsigned long long for 64-bit atomics, none for unsigned long.
After some discussion on Discord, it was decided that the best way to fix this would be to emit a preamble of typedefs to perform the mapping to the appropriate fundamental types depending on the target C dialect. For plain standard C, we emit typedefs that map to the fixed-width integer types. We also only map f32 and f64 if __STDC_IEC_559__ is defined since there are otherwise no guarantees that float and double are the appropriate IEEE 754 floating point types. For OpenCL and CUDA, we perform the mapping according to the guaranteed sizes as noted in the respective specification/documentation.
This pull request delivers a proposed implementation of this fix.
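The idea of a dialect-dependent typedef preamble described above might look roughly like the following. This is a sketch only: the Dialect enum, the function name, and the exact set and spelling of the emitted typedef names are assumptions made for illustration, not the code from this pull request. OpenCL's double is left out here because it depends on device support.

```cpp
#include <string>

// Hypothetical dialect selector; the real backend may model this differently.
enum class Dialect { C99, OpenCL, CUDA };

// Returns a preamble that maps fixed-width names to types whose size is
// guaranteed in the given dialect (typedef names are illustrative).
inline std::string type_preamble(Dialect d) {
    switch (d) {
        case Dialect::C99:
            // Plain C: rely on <stdint.h>; only map floats when IEEE 754
            // conformance is advertised via __STDC_IEC_559__.
            return "#include <stdint.h>\n"
                   "typedef int32_t i32;\n"
                   "typedef uint32_t u32;\n"
                   "typedef int64_t i64;\n"
                   "typedef uint64_t u64;\n"
                   "#ifdef __STDC_IEC_559__\n"
                   "typedef float f32;\n"
                   "typedef double f64;\n"
                   "#endif\n";
        case Dialect::OpenCL:
            // OpenCL C fixes int at 32 bits and long at 64 bits.
            return "typedef int i32;\n"
                   "typedef uint u32;\n"
                   "typedef long i64;\n"
                   "typedef ulong u64;\n"
                   "typedef float f32;\n";
        case Dialect::CUDA:
            // CUDA: use long long for 64 bits so that the 64-bit atomic
            // overloads (declared for unsigned long long) are selected.
            return "typedef int i32;\n"
                   "typedef unsigned int u32;\n"
                   "typedef long long i64;\n"
                   "typedef unsigned long long u64;\n"
                   "typedef float f32;\n"
                   "typedef double f64;\n";
    }
    return {};
}
```

The rest of the emitted code would then refer only to the fixed-width names, so the choice between long and long long is made in exactly one place per dialect.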