Don't use signed char for byte buffers

etremel commented 3 years ago

Throughout our codebase, we have been using arrays of char (or pointers to char) to represent byte buffers. However, it has long been the consensus in C++ programming that the signed char type should only be used for actual characters, and unsigned char should be used for bytes (see this question for example). Now that uint8_t is widely supported, it makes even less sense to keep using char for bytes, because uint8_t is a lot easier to type than unsigned char and we use the fixed-size data types everywhere else we care about the size of our data in memory (e.g. uint32_t, uint64_t).

Although it will be tedious, we should really take the time to go through and replace all our byte buffers with uint8_t.

KenBirman commented 3 years ago

While making this change it might be worthwhile to also mark SST memory using a weak atomic (in C++ atomics, we would use “relaxed consistency” for entries that just need sequential consistency – which is everything except guards – but use “acquire-release consistency” for columns used as guards, such as the suspected bit, wedged bit, Sagar’s keep-alive counter, the proposed, acked and commit counts, the SMC messages-available bit (interestingly, NOT the “this slot is ready to send” bit – that gets used locally but has no distributed semantic, so it isn’t actually a guard), and the final bit.

In contrast the SMC slots would use relaxed consistency, as would next view proposals and the nReceived vector.

Right now Mae has us using “volatile” annotations in C++ but on ARM, the issue arises that volatile only forces the compiler to go back to memory for objects each time they are referenced – it doesn’t actually tell the MMU that the underlying object shouldn’t be cached. As a result, volatile doesn’t have memory fencing semantics and we will eventually see bugs due to this. If we switch to weak atomic annotations we actually might get better performance (because the MMU will realize that the SMC slots in fact can be updated asynchronously, which could speed up RDMA writes), while also ensuring correctness. Volatile is actually automatic when you use the std::atomic annotations do we could then remove volatile annotations throughout.

From: Edward Tremel @. Sent: Friday, November 5, 2021 12:31 PM To: Derecho-Project/derecho @.> Cc: Subscribed @.***> Subject: [Derecho-Project/derecho] Don't use signed char for byte buffers (Issue #218)

Throughout our codebase, we have been using arrays of char (or pointers to char) to represent byte buffers. However, it has long been the consensus in C++ programming that the signed char type should only be used for actual characters, and unsigned char should be used for bytes (see this questionhttps://stackoverflow.com/questions/653336/should-a-buffer-of-bytes-be-signed-or-unsigned-char-buffer for example). Now that uint8_t is widely supported, it makes even less sense to keep using char for bytes, because uint8_t is a lot easier to type than unsigned char and we use the fixed-size data types everywhere else we care about the size of our data in memory (e.g. uint32_t, uint64_t).

Although it will be tedious, we should really take the time to go through and replace all our byte buffers with uint8_t.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/Derecho-Project/derecho/issues/218, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AFQMF33C25JBTN4TL4WTVZTUKQIFNANCNFSM5HOHI4LQ. Triage notifications on the go with GitHub Mobile for iOShttps://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Androidhttps://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

etremel commented 2 years ago

Fixed by ed8b2ea5337407578c8cd5eadbcefc8f70d59016

Derecho-Project / derecho

Don't use signed char for byte buffers #218