Open SadClouds opened 4 weeks ago
Thank you for your report.
Do you think you would be able to do a git bisect
to narrow down the commit that caused this issue?
The issue appears to be on this line (whichever git commit that is) tigervnc-1.14.0/common/rfb/EncodeManager.cxx:1081
SIGBUS is a misaligned load/store. I'm not a C++ programmer, but normally casting a pointer to a different type and dereferencing it, causes these types of issues.
That code is unfortunately very complex, so the crashing line is likely not where the issue originates. And we don't have any Sparc machines here, so we have a hard time tracking this down without someone helping to pinpoint where things went wrong.
Which is why I'm trying to help. The SPARC machine I have is very slow, it takes about half a day to build one VNC package, so bisecting many different versions and building them could take days and I don't have the capacity for this. The backtrace points exactly where the alignment issue occurred. I don't use C++ much, hence after looking at it quickly, it is not obvious to me why data access in this template is misaligned. It may be a better approach for developers to review the code around this line and suggest possible fixes. I'm happy to test them and report.
The problem originates on line 1075
buffer = (const T*)pb->getBuffer(r, &stride);
where getBuffer() returns a pointer, which is then cast to pointer to object T. The pointer returned is not aligned properly for object T. This looks like bad C++ programming.
I added some debug statements
template<class T>
inline bool EncodeManager::checkSolidTile(const Rect& r,
const T colourValue,
const PixelBuffer *pb)
{
int w, h;
const T* buffer;
int stride, pad;
w = r.width();
h = r.height();
buffer = (const T*)pb->getBuffer(r, &stride);
vlog.error("buffer=%p", buffer);
pad = stride - w;
while (h--) {
int w_ = w;
while (w_--) {
vlog.error("buffer=%p, h=%d, w=%d", buffer, h, w_);
if (*buffer != colourValue)
return false;
buffer++;
vlog.error("buffer++=%p", buffer);
}
buffer += pad;
}
return true;
}
Wed Oct 30 09:12:18 2024
VNCSConnST: Server default pixel format depth 24 (32bpp) big-endian rgb888
VNCSConnST: Client pixel format depth 24 (32bpp) little-endian rgb888
EncodeManager: buffer=0x46e00400
EncodeManager: buffer=0x46e00400, h=15, w=15
EncodeManager: buffer=0x46e00440
EncodeManager: buffer=0x46e00440, h=15, w=15
EncodeManager: buffer=0x46e00480
...
EncodeManager: buffer=0x47259080, h=15, w=15
EncodeManager: buffer=0x472590c0
EncodeManager: buffer=0x472590c0, h=15, w=8
EncodeManager: buffer=0x47259124
EncodeManager: buffer=0x47259124, h=15, w=15
(EE) Bus error at address 0x47259124
(EE)
Fatal server error:
(EE) Caught signal 10 (Bus error). Server aborting
(EE)
X connection to :1 broken (explicit kill or server shutdown).
Killing Xvnc process ID 7712
Xvnc process ID 7712 already killed
It seems the difference between successive buffer pointer addresses is 64 bytes, the last one which causes SIGBUS is 0x47259124 - 0x472590c0 = 100 bytes. Either way, looks like the buffer pointer may be pointing at a bogus address.
0x47259124 seems aligned well enough for 32-bit pixels. Any idea why it is still giving us SIGBUS?
Because pointers on sparc64 must be aligned on 8 byte boundaries
gdb -p $(pgrep Xvnc)
...
Program received signal SIGBUS, Bus error.
...
(gdb) print buffer
$9 = (unsigned char * const *) 0x47259124
(gdb) print sizeof(buffer)
$10 = 8
(gdb) print 0x47259124 % 8
$11 = 4
So
if (*buffer != colourValue)
is trying to dereference a pointer which is only aligned on 4 byte boundary. Hence this code is broken in the functions which allocate and return the address for that buffer. That pointer must be aligned on 8 byte boundary, i.e. buffer % 8 must be 0.
That is something I don't think we can easily support. The graphics stack expects packed pixels. And we need to be able to access individual pixels.
And are you sure about that requirement? Every hit I get states that sparc64 has the normal alignment requirements of alignment matching the type size.
I got my previous gdb statement a bit wrong. The buffer variable seems to be a pointer to pointer, this is where I think the problem is
(gdb) p colourValue
$14 = (unsigned char * const) 0xffffffffffffc72c ""
(gdb) p buffer
$15 = (unsigned char * const *) 0x47259124
(gdb) p *buffer
$16 = (unsigned char * const) 0x45444f0045444f <error: Cannot access memory at address 0x45444f0045444f>
(gdb) p 0x45444f0045444f % 8
$17 = 7
So *buffer
results in a data type unsigned char *
which is a pointer and must be aligned on 8 bytes. The line *buffer != colourValue
seems to compare two pointers, but *buffer
results misaligned pointer load, causing SIGBUS. On sparc64 pointers are 8 bytes hence the alignment issue.
You have the same issue on x86, but instead of SIGBUS, CPU will instead use misaligned load, which could be less efficient, i.e. instead of one load, it may issue two loads.
This is a regression since 1.13.1 below is a backtrace