Closed VRichardJP closed 1 year ago
reinterpret_cast
usage in autoware.universeFor example:
sockaddr_in client;
int new_sock = accept(sock, reinterpret_cast<sockaddr *>(&client), &len);
Valid. Exactly what reinterpret_cast
has been designed for.
Note that it is recommended to use static_cast
in the case it is only necessary to cast from/to void*
.
For example:
file.write(reinterpret_cast<const char *>(plan_->data()), plan_->size());
Valid. It is possible to cast into char*
, unsigned char*
, std::byte*
and their const
variants.
Writing the object bytes is legal, but not portable (e.g. because endianess, data type size, etc).
For example:
reinterpret_cast<float32_t *>(v.data())[i] = dis(gen)
This is UB and dangerous. I know everyone does that and "it works" because most compilers are kind enough, but it's a tic-toc bomb you leave behind.
First, why is that not valid C++ code? C++ has many complex rules regarding what makes a valid "object" and using reinterpret_cast<T*>
on a buffer is not one of them. reinterpret_cast
is basically a way to say to the compiler "trust me, someone else has created a valid object here before".
In practice however, this UB is known to work perfectly fine under the following conditions:
T
is trivially copiable (such as int
, float
, struct { int, char }
)T
(e.g. float
must be aligned on 4 bytes)Condition 1 is not a problem in autoware code base, as I have ever only seen reinterpret_cast
with simple types and structs.
Condition 3 is normally not a problem in this context. To make the compiler break the program because of strict aliasing, you really have to want it.
Condition 2 is the important one. If the buffer is not aligned properly, the compiler may definitely break the program. Luckily (or unluckily depending how you see it) modern x86 and ARM64 can access badly aligned data (just slightly slower). So the program may still run fine when optimization level is low. However if you start to push the compiler for more advanced optimizations (e.g. -msse
, -march=native
), then your program will either compute rubbish and simply fail. A good illustration here.
If 1, 2 and 3 are met, the reinterpret_cast
is officially an undefined behavior but works in practice. However, if you can't be 100% sure the data is properly aligned (for example if the buffer is created by another library), then it is better to use std::memcpy
. With -O2
or -O3
the memcpy
has no performance impact over the unsafe reinterpret_cast
.
This is just a worse version of 3. Even if the types looks the same, there is no reason not to copy the data properly, which can't break the program and will be optimized out by the compiler.
Here is a list of all reinterpret_cast
use in universe, categorized by pattern (from 1 to 4). For type 3, I make the difference between A and B: A when the cast is "safe" (typically alignment is correct) and B when it smells:
perception/traffic_light_ssd_fine_detector/src/nodelet.cpp
240: reinterpret_cast<const float *>(pixels.datastart), // Type 3A: cv::Mat data is aligned on 16-bytes
241: reinterpret_cast<const float *>(pixels.dataend)); // Type 3A: cv::Mat data is aligned on 16-bytes
perception/traffic_light_ssd_fine_detector/lib/src/trt_ssd.cpp
164: file.write(reinterpret_cast<const char *>(plan_->data()), plan_->size()); // Type 2
perception/lidar_apollo_instance_segmentation/src/detector.cpp
72: outfile.write(reinterpret_cast<char *>(plan->data()), plan->size()); // Type 2
perception/lidar_apollo_instance_segmentation/lib/include/Utils.hpp
122: *reinterpret_cast<T *>(buffer) = val; // Type 3B: unaligned write -> should use memcpy instead
129: val = *reinterpret_cast<const T *>(buffer); // Type 3B: unaligned read -> should use memcpy instead
perception/traffic_light_ssd_fine_detector/lib/include/cuda_utils.hpp
60: CHECK_CUDA_ERROR(::cudaMalloc(reinterpret_cast<void **>(&p), sizeof(U) * n)); // Type 1
70: CHECK_CUDA_ERROR(::cudaMalloc(reinterpret_cast<void **>(&p), sizeof(T))); // Type 1
perception/traffic_light_classifier/utils/trt_common.cpp
99: reinterpret_cast<const void *>(engine_str.data()), engine_str.size())); // Type 1
perception/traffic_light_classifier/utils/trt_common.hpp
94: ::cudaMalloc(reinterpret_cast<void **>(&p), sizeof(U) * n); // Type 1
104: ::cudaMalloc(reinterpret_cast<void **>(&p), sizeof(T)); // Type 1
perception/bytetrack/lib/include/lapjv.h
52: if ((x = reinterpret_cast<t *>(malloc(sizeof(t) * (n)))) == 0) { \ // Speciale case. Looks like a C code (t*)malloc(...) rewritten for C++. static_cast<T*>(malloc()) is a valid C++ object construct, but reinterpret_cast is not. I doubt any compiler would break this, but it could simplified to (x = new t[(n)]). No need for any cast
perception/lidar_apollo_segmentation_tvm/test/main.cpp
55: reinterpret_cast<float32_t *>(v.data())[i] = dis(gen); // Type 3B: v.data() is not aligned to float32_t
perception/compare_map_segmentation/include/compare_map_segmentation/multi_voxel_grid_map_update.hpp
175: const std::uint8_t * pt_data = reinterpret_cast<const std::uint8_t *>(&(*input_)[index]); // Type 2. If you want to be pedantic uint8_t is not official way to access object internal bytes (but in practice uint8_t == unsigned char).
perception/tensorrt_yolo/lib/src/trt_yolo.cpp
276: file.write(reinterpret_cast<const char *>(plan_->data()), plan_->size()); // Type 2
perception/tensorrt_yolo/lib/include/cuda_utils.hpp
80: CHECK_CUDA_ERROR(::cudaMalloc(reinterpret_cast<void **>(&p), sizeof(U) * n)); // Type 1
88: CHECK_CUDA_ERROR(::cudaMalloc(reinterpret_cast<void **>(&p), sizeof(T))); // Type 1
113: T * ptr = reinterpret_cast<T *>(workspace); // Type 3A: workspace is aligned for type T
114: workspace = reinterpret_cast<void *>(reinterpret_cast<uintptr_t>(workspace) + size); // Type 1 and some legal pointer arithmetic
perception/tensorrt_yolo/lib/include/calibrator.hpp
187: output.write(reinterpret_cast<const char *>(cache), length); // Type 2
perception/tensorrt_yolo/lib/src/plugins/nms.cu
149: int * num_selected = reinterpret_cast<int *>(indices_sorted); // Unecessary cast, indices_sorted is already of type int*
perception/tensorrt_yolo/lib/src/plugins/yolo_layer_plugin.cpp
90: *reinterpret_cast<T *>(buffer) = val; // Type 3B: unaligned write -> should use memcpy instead
97: val = *reinterpret_cast<const T *>(buffer); // Type 3B: unaligned write -> should use memcpy instead
160: char * d = reinterpret_cast<char *>(buffer); // Type 1
perception/tensorrt_yolo/lib/src/plugins/nms_plugin.cpp
65: *reinterpret_cast<T *>(buffer) = val; // Type 3B: unaligned write -> should use memcpy instead
72: val = *reinterpret_cast<const T *>(buffer); // Type 3B: unaligned write -> should use memcpy instead
system/bluetooth_monitor/src/bluetooth_monitor.cpp
75: ret = connect(socket_, reinterpret_cast<sockaddr *>(&address), sizeof(address)); // Type 1
system/bluetooth_monitor/service/l2ping_service.cpp
49: socket_, SOL_SOCKET, SO_REUSEADDR, reinterpret_cast<char *>(&opt), (socklen_t)sizeof(opt)); // Type 1
86: int new_sock = accept(socket_, reinterpret_cast<sockaddr *>(&client), &len); // Type 1
system/system_monitor/src/net_monitor/nl80211.cpp
37: auto * rate = reinterpret_cast<float *>(arg); // Type 1.
124: ret = nl_cb_set(cb_, NL_CB_VALID, NL_CB_CUSTOM, callback, reinterpret_cast<void *>(&bitrate_)); // Type 1
system/system_monitor/reader/msr_reader/msr_reader.cpp
104: sock, SOL_SOCKET, SO_REUSEADDR, reinterpret_cast<char *>(&opt), (socklen_t)sizeof(opt)); // Type 1
137: int new_sock = accept(sock, reinterpret_cast<sockaddr *>(&client), &len); // Type 1
system/system_monitor/reader/hdd_reader/hdd_reader.cpp
214: info->serial_ = std::string(reinterpret_cast<char *>(data) + 20, 20); // Type 2
219: info->model_ = std::string(reinterpret_cast<char *>(data) + 54, 40); // Type 2
388: info->total_data_written_ = *(reinterpret_cast<uint64_t *>(&data[48])); // Type 3B: data is not aligned
392: info->power_on_hours_ = *(reinterpret_cast<uint64_t *>(&data[128])); // Type 3B: data is not aligned
536: sock, SOL_SOCKET, SO_REUSEADDR, reinterpret_cast<char *>(&opt), (socklen_t)sizeof(opt)); // Type 1
568: int new_sock = accept(sock, reinterpret_cast<sockaddr *>(&client), &len); // Type 1
system/system_monitor/test/src/hdd_monitor/test_hdd_monitor.cpp
185: ThreadTestMode * mode = reinterpret_cast<ThreadTestMode *>(args); // Type 1
197: sock, SOL_SOCKET, SO_REUSEADDR, reinterpret_cast<char *>(&opt), (socklen_t)sizeof(opt)); // Type 1
226: int new_sock = accept(sock, reinterpret_cast<sockaddr *>(&client), &len); // Type 1
system/system_monitor/test/src/cpu_monitor/test_intel_cpu_monitor.cpp
179: ThreadTestMode * mode = reinterpret_cast<ThreadTestMode *>(args); // is originally a ThreadTestMode
191: sock, SOL_SOCKET, SO_REUSEADDR, reinterpret_cast<char *>(&opt), (socklen_t)sizeof(opt)); // Type 1
220: int new_sock = accept(sock, reinterpret_cast<sockaddr *>(&client), &len); // Type 1
perception/lidar_centerpoint/lib/network/tensorrt_wrapper.cpp
151: file.write(reinterpret_cast<const char *>(plan_->data()), plan_->size()); // Type 2
162: reinterpret_cast<const void *>(engine_str.data()), engine_str.size())); // Type 1
perception/lidar_centerpoint/include/lidar_centerpoint/cuda_utils.hpp
80: CHECK_CUDA_ERROR(::cudaMalloc(reinterpret_cast<void **>(&p), sizeof(U) * n)); // Type 1
88: CHECK_CUDA_ERROR(::cudaMalloc(reinterpret_cast<void **>(&p), sizeof(T))); // Type 1
113: T * ptr = reinterpret_cast<T *>(workspace); // Type 3A: workspace is aligned for type T
114: workspace = reinterpret_cast<void *>(reinterpret_cast<uintptr_t>(workspace) + size); // Type 1 and some legal pointer arithmetic
common/cuda_utils/include/cuda_utils/cuda_unique_ptr.hpp
43: CHECK_CUDA_ERROR(::cudaMalloc(reinterpret_cast<void **>(&p), sizeof(U) * n)); // Type 1
51: CHECK_CUDA_ERROR(::cudaMalloc(reinterpret_cast<void **>(&p), sizeof(T))); // Type 1
68: CHECK_CUDA_ERROR(::cudaHostAlloc(reinterpret_cast<void **>(&p), sizeof(U) * n, flag)); // Type 1
76: CHECK_CUDA_ERROR(::cudaHostAlloc(reinterpret_cast<void **>(&p), sizeof(T), flag)); // Type 1
common/tensorrt_common/src/tensorrt_common.cpp
99: reinterpret_cast<const void *>(engine_str.data()), engine_str.size())); // Type 1
194: file.write(reinterpret_cast<const char *>(data->data()), data->size()); // Type 2
196: file.write(reinterpret_cast<const char *>(plan->data()), plan->size()); // Type 2
sensing/pointcloud_preprocessor/src/crop_box_filter/crop_box_filter_nodelet.cpp
131: *reinterpret_cast<const float *>(&input->data[global_offset + x_offset]), // Type 3B: unknown data alignment
132: *reinterpret_cast<const float *>(&input->data[global_offset + y_offset]), // Type 3B: unknown data alignment
133: *reinterpret_cast<const float *>(&input->data[global_offset + z_offset]), 1); // Type 3B: unknown data alignment
152: *reinterpret_cast<float *>(&output.data[output_size + x_offset]) = point[0]; // Type 3B: unknown data alignment
153: *reinterpret_cast<float *>(&output.data[output_size + y_offset]) = point[1]; // Type 3B: unknown data alignment
154: *reinterpret_cast<float *>(&output.data[output_size + z_offset]) = point[2]; // Type 3B: unknown data alignment
sensing/pointcloud_preprocessor/src/outlier_filter/ring_outlier_filter_nodelet.cpp
61: input_ring_map[*reinterpret_cast<uint16_t *>(&input_ptr->data[idx + ring_offset])].push_back( // Type 3B: unknown data alignment
87: *reinterpret_cast<float *>(&input_ptr->data[current_idx + azimuth_offset]); // Type 3B: unknown data alignment
89: *reinterpret_cast<float *>(&input_ptr->data[next_idx + azimuth_offset]); // Type 3B: unknown data alignment
94: *reinterpret_cast<float *>(&input_ptr->data[current_idx + distance_offset]); // Type 3B: unknown data alignment
96: *reinterpret_cast<float *>(&input_ptr->data[next_idx + distance_offset]); // Type 3B: unknown data alignment
107: std::move(*reinterpret_cast<PointXYZI *>(&input_ptr->data[tmp_idx]))); // Type 4: unknown data alignment + most likely incompatible data type
118: std::move(*reinterpret_cast<PointXYZI *>(&input_ptr->data[tmp_idx]))); // Type 4: unknown data alignment + most likely incompatible data type
sensing/pointcloud_preprocessor/include/pointcloud_preprocessor/outlier_filter/ring_outlier_filter_nodelet.hpp
51: PointXYZI * front_pt = reinterpret_cast<PointXYZI *>(&input_ptr->data[tmp_indices.front()]); // Type 4: unknown data alignment + most likely incompatible data type
52: PointXYZI * back_pt = reinterpret_cast<PointXYZI *>(&input_ptr->data[tmp_indices.back()]); // Type 4: unknown data alignment + most likely incompatible data type
sensing/pointcloud_preprocessor/include/pointcloud_preprocessor/passthrough_filter/passthrough_uint16.hpp
430: reinterpret_cast<const std::uint8_t *>(&input_->points[(*indices_)[iii]]); // Type 2. If you want to be pedantic uint8_t is not official way to access object internal bytes (but in practice uint8_t == unsigned char).
sensing/tier4_pcl_extensions/include/tier4_pcl_extensions/voxel_grid_nearest_centroid_impl.hpp
175: const std::uint8_t * pt_data = reinterpret_cast<const std::uint8_t *>(&input_->points[cp]); // Type 2. If you want to be pedantic uint8_t is not official way to access object internal bytes (but in practice uint8_t == unsigned char).
221: &rgb, reinterpret_cast<const char *>(&input_->points[cp]) + rgba_index, sizeof(int)); // Type 2
272: &rgb, reinterpret_cast<const char *>(&input_->points[cp]) + rgba_index, sizeof(int)); // Type 2
As explained, pattern 1 and 2 are valid, 3A is "UB but safe" so acceptable, however I think code with pattern 3B and 4 should be fixed. Even if the code "works" today, it may not be the same if new platforms, compiler or optimization flags are used.
Very interesting, thanks for the report.
With -O2 or -O3 the memcpy has no performance impact over the unsafe reinterpret_cast.
Do you have resources on this point? It is not obvious to me that it is true.
It is the common consensus I have found on the internet: For example: https://gist.github.com/shafik/848ae25ee209f698763cffee272a58f8#how-do-we-type-pun-correctly
At a sufficient optimization level any decent modern compiler generates identical code to the previously mentioned reinterpret_cast method or union method for type punning.
The article give this example+%3B%0A%0Avoid+func1(+double+d+)%0A%7B%0A++++std::int64_t+n%3B%0A++++std::memcpy(%26n,+%26d,+sizeof+d)%3B+//+OK%0A++++printf(+%22%25%22+PRId64+%22%5Cn%22,+n+)+%3B%0A%7D%0A%0Avoid+func2(+double+d+)%0A%7B%0A++++union+u1%0A++++%7B%0A++++++std::int64_t+n%3B%0A++++++double+d+%3B%0A++++%7D+%3B%0A++%0A++++u1+u+%3B%0A++++u.d+%3D+d+%3B%0A++++printf(+%22%25%22+PRId64+%22%5Cn%22,+u.n+)+%3B%0A%7D%0A%0Avoid+func3(+double+d+)%0A%7B%0A++++std::int64_t+n+%3B%0A++++n+%3D+reinterpret_cast%3Cstd::int64_t+%3E(%26d)+%3B%0A++++%0A++++printf(+%22%25%22+PRId64+%22%5Cn%22,+n+)+%3B%0A%7D%0A%0Aint+main()%0A%7B%0A++double+d+%3D+0.1%3B%0A++%0A++func1(+d+)+%3B%0A%7D'),l:'5',n:'0',o:'C%2B%2B+source+%231',t:'0')),k:50,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:clang_trunk,filters:(b:'0',binary:'1',commentOnly:'0',demangle:'0',directives:'0',execute:'1',intel:'1',trim:'1'),lang:c%2B%2B,libs:!(),options:'-std%3Dc%2B%2B17+-O2',source:1),l:'5',n:'0',o:'x86-64+clang+(trunk)+(Editor+%231,+Compiler+%231)+C%2B%2B',t:'0')),k:50,l:'4',n:'0',o:'',s:0,t:'0')),l:'2',n:'0',o:'',t:'0')),version:4)
This other article give a more advanced use case: http://pzemtsov.github.io/2016/11/06/bug-story-alignment-on-x86.html
These days, however, it doesn’t have to be implemented as a procedure call at all; the compiler can treat it as an intrinsic, and replace with a memory-to-register load. The GCC definitely does. It compiles the following code into the code similar to the original SSE one, except it uses movdqu instead of movdqa. This instruction allows misaligned data; its performance varies. On some processors it is much slower than movdqa even when the data is actually aligned. On others it runs at virtually the same speed.
Edit: In this PR, I have replaced a few reinterpret_cast by memcpy. I have disassembled reported the hottest spot in the function. This is a memcpy that have been translated to x86 movups instructions (move unpacked 512 bits). I would not be surprised if the reinterpret_cast i changed was initially compiled with a movaps (move aligned).
I don't think we are in need of performance to the point we have to care about such platform specific details. Program correctness is way more important than micro performance improvement.
Thanks!
I don't think we are in need of performance to the point we have to care about such platform specific details. Program correctness is way more important than micro performance improvement.
Absolutely, I wasn't trying to argue the rational for the change.
Checklist
Description
As pointed here, there are places in Autoware where
reinterpret_cast
is abusively used to "convert" data types.There are endless talks over the internet about how misleading
reinterpret_cast
is. Basically, dereferencing a reinterpreted pointerp
with*reinterpret_cast<T*>(p)
is only valid ifp
originally points to data of typeT
(e.g.p = new T()
), or aconst
/signed
/unsigned
variant, or ifT==char
(orunsigned char
orstd::byte
). Ifp
originally points to any other type, even if "it looks" the same, is straight UB.For example, copying
PointXYIZ
fromuint8_t[]
like here is UB:https://github.com/autowarefoundation/autoware.universe/blob/cb103ec9c8149fffcf2a665ca37b69fe465e0a61/sensing/pointcloud_preprocessor/src/outlier_filter/ring_outlier_filter_nodelet.cpp#L123
To create a valid
PointXYZI
from the buffer, it would be necessary at least to cast each field separately (e.g. likep.x = *reinterpret_cast<float*>(&data[p_offset + x_field_offset])
).Edit: according to this excellent writeup, reinterpret casting from a buffer violates strict aliasing rules (but I guess it is ok in the scope of an immediate copy?), and could run into misaligned memory issues (but all the data we manipulate is always aligned, right?...). The author suggests the only safe way to manipulate data is to
memcpy
(yew!). Hopefully, C++ makes it possible to wrap things and avoid the memcpy boilerplate (for example theconst_unaligned_pointer
described here: http://pzemtsov.github.io/2016/11/06/bug-story-alignment-on-x86.html)Similarly, even though
PointXYZIRADRT
andPointXYZI
"start the same", it is UB to usereinterpret_cast
to copy between the 2 structs. For example:It may "work" now... until the compiler decides otherwise.
Purpose
Remove all UB
reinterpret_cast
in Autoware code base.Possible approaches
Definition of done
No more yolo
reinterpret_cast
UB.