Quuxplusone / LLVMBugzillaTest


Bad alignment in convertUTF16ToUTF8String #18481

Open Quuxplusone opened 10 years ago

Quuxplusone commented 10 years ago
Bugzilla Link PR18482
Status NEW
Importance P normal
Reported by Jakob Stoklund Olesen (stoklund@2pi.dk)
Reported on 2014-01-14 21:29:50 -0800
Last modified on 2014-01-20 20:06:26 -0800
Version trunk
Hardware Sun All
CC geek4civic@gmail.com, llvm-bugs@lists.llvm.org, rafael@espindo.la, rnk@google.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also
lib/Support/ConvertUTFWrapper.cpp contains this code:

bool convertUTF16ToUTF8String(ArrayRef<char> SrcBytes, std::string &Out) {
  ...
  const UTF16 *Src = reinterpret_cast<const UTF16 *>(SrcBytes.begin());
  const UTF16 *SrcEnd = reinterpret_cast<const UTF16 *>(SrcBytes.end());
  ...
  if (Src[0] == UNI_UTF16_BYTE_ORDER_MARK_SWAPPED) {

The UTF16 type is normally 2-byte aligned, and there is no guarantee that the
ArrayRef points to aligned bytes.

This crashes the unit test ConvertUTFTest.ConvertUTF16LittleEndianToUTF8String
on SPARC.
Quuxplusone commented 10 years ago

Yep. What's the usual way to do unaligned loads for something on SPARC?

Quuxplusone commented 10 years ago

The code generator does the right thing if you give it an unaligned IR load. You can use a 2-byte memcpy or a pointer to a packed struct with a UTF16 member.
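
A minimal sketch of the 2-byte memcpy approach, for illustration only. The helper names `loadUnalignedU16` and `codeUnitAt` are hypothetical and not part of LLVM, and `std::uint16_t` stands in for the `UTF16` type; the value is read in host byte order, so byte-order-mark handling would still be needed on top of this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not an LLVM API): reads one 16-bit code unit in host
// byte order from an address that may not be 2-byte aligned.
inline std::uint16_t loadUnalignedU16(const char *P) {
  std::uint16_t V;
  // memcpy makes no alignment assumption about P, so the compiler has to
  // emit a load sequence that is safe on strict-alignment targets.
  std::memcpy(&V, P, sizeof(V));
  return V;
}

// Hypothetical helper: reads code unit number Index from a byte buffer of
// NumBytes bytes, replacing a direct Src[Index] on a casted pointer.
inline std::uint16_t codeUnitAt(const char *Bytes, std::size_t NumBytes,
                                std::size_t Index) {
  assert((Index + 1) * sizeof(std::uint16_t) <= NumBytes &&
         "code unit index out of range");
  return loadUnalignedU16(Bytes + Index * sizeof(std::uint16_t));
}
```

With helpers like these, a check such as `Src[0] == UNI_UTF16_BYTE_ORDER_MARK_SWAPPED` could be written against `codeUnitAt(SrcBytes.begin(), SrcBytes.size(), 0)` instead of dereferencing a `reinterpret_cast` pointer. Compilers typically fold a fixed-size memcpy like this into a single load on targets that allow unaligned access.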

You can probably reproduce on Intel if you compile with -fsanitize=alignment and make sure the test case is passing an unaligned string.
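
One way to make sure the input really is unaligned is to place the UTF-16 bytes one byte past a 2-byte-aligned address. The fixture below is a hypothetical sketch, not taken from the LLVM unit tests; `MisalignedUTF16` and `isAlignedFor16` are made-up names, and the misalignment guarantee assumes `alignof(char16_t)` is 2, as it is on common platforms.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical test fixture (assumption, not LLVM code): holds a short
// UTF-16LE string at an odd address.
struct MisalignedUTF16 {
  // Storage itself is aligned for char16_t, so Storage + 1 is misaligned
  // whenever alignof(char16_t) == 2.
  alignas(char16_t) char Storage[1 + 4];

  MisalignedUTF16() {
    // Byte order mark (FF FE) followed by 'A' (41 00) in UTF-16LE.
    const unsigned char LE[4] = {0xFF, 0xFE, 0x41, 0x00};
    Storage[0] = 0; // padding byte that shifts the payload by one
    std::memcpy(Storage + 1, LE, sizeof(LE));
  }

  const char *data() const { return Storage + 1; }
  std::size_t size() const { return 4; }
};

// Checks the address only; it never dereferences P as a 16-bit value.
inline bool isAlignedFor16(const void *P) {
  return reinterpret_cast<std::uintptr_t>(P) % alignof(char16_t) == 0;
}
```

A test could build an `ArrayRef<char>` from `data()` and `size()` and pass it to `convertUTF16ToUTF8String`; when built with -fsanitize=alignment, the misaligned load through the casted pointer should then be reported even on hardware that tolerates it.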