emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler

wchar_t string literals compiled with -fshort-wchar generate alignment fault errors #18618

Open yosmo78 opened 1 year ago

yosmo78 commented 1 year ago

Version of emscripten/emsdk:
emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.30 (cfe2bdfe2692457cb5f5770672f6e5ccb3ffc2f2)
clang version 16.0.0 (https://github.com/llvm/llvm-project 800f0f1546b2352ba42a4777149afb13cb874fcd)
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: C:\emsdk\upstream\bin

Failing command line in full:

Here is a minimal version of the code that reproduces the error:

#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

const wchar_t *wszValue = L"";
const wchar_t *wszText = L"text";
const wchar_t *wszVal = L"Val";

class Test
{
public:
    const wchar_t *m_wszVal = L"";
    const wchar_t *m_wszTest = L"Test";
    const wchar_t *m_wszTest2 = L"Test2";
};

Test test;

int main()
{
    const wchar_t *str = L"str2";
    const wchar_t *str2 = L"";
    const wchar_t *str3 = L"str";

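    // `empty` was undeclared in the original snippet; L"" is assumed to be what was meant.
    // %zu is the conversion for size_t; %ls is the portable form of %S
    // (MSVC's wprintf treats %S as a narrow char string, the opposite of POSIX).
    wprintf(L"%zu %zu\n",sizeof(wchar_t),sizeof(L""));
    wprintf(L"%zu %zu %zu\n",wcslen(str),wcslen(str2),wcslen(str3));
    wprintf(L"%zu %zu %zu\n",wcslen(wszValue),wcslen(wszText),wcslen(wszVal));
    wprintf(L"%zu %zu %zu\n",wcslen(test.m_wszVal),wcslen(test.m_wszTest),wcslen(test.m_wszTest2));
    wprintf(L"%ls %ls %ls\n",str,str2,str3);
    wprintf(L"%ls %ls %ls\n",wszValue,wszText,wszVal);
    wprintf(L"%ls %ls %ls\n",test.m_wszVal,test.m_wszTest,test.m_wszTest2);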

    return EXIT_SUCCESS;
}

Basically, the compiler is placing these 2-byte wchar_t literals at unaligned memory locations, so reading from them crashes with alignment faults.
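One way to check this claim is to test each literal's address directly. Note that with -fshort-wchar a 2-byte-aligned literal is valid for the compiler's own 2-byte wchar_t; it can only fault when something reads it as 4-byte units. A minimal sketch, where is_aligned_for is a hypothetical diagnostic helper, not an Emscripten or libc function:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical diagnostic helper: returns true if `p` can be read as
// `width`-byte units without an unaligned access.
// `width` must be a power of two (checked by the assert).
bool is_aligned_for(const void* p, std::size_t width) {
    assert(width != 0 && (width & (width - 1)) == 0);
    return reinterpret_cast<std::uintptr_t>(p) % width == 0;
}
```

In the repro, printf("%d %d\n", is_aligned_for(wszText, 2), is_aligned_for(wszText, 4)) would report whether a literal meets 2-byte and 4-byte alignment. The narrow printf is used here because the wide functions are the ones under suspicion.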

For completeness, here is an index.html:

<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
    <canvas id="canvas" oncontextmenu="event.preventDefault()"></canvas>
    <script type='text/javascript'>
        var canv = document.getElementById('canvas');
        var Module = {
            canvas: canv
        };
    </script>
    <!-- Call the javascript glue code (index.js) as generated by Emscripten -->
    <script src="index.js"></script>
</body>
</html>

And a quick-and-dirty server, server.js:

var http = require('http')
var url = require('url')
var fs = require('fs')
var path = require('path')
var baseDirectory = __dirname   // or whatever base directory you want

var port = 8080

http.createServer(function (request, response) {
    try {
        var requestUrl = url.parse(request.url)

        // need to use path.normalize so people can't access directories underneath baseDirectory
        var fsPath = baseDirectory+path.normalize(requestUrl.pathname)

        var fileStream = fs.createReadStream(fsPath)
        fileStream.pipe(response)
        fileStream.on('open', function() {
            // COOP/COEP make the page cross-origin isolated, which SharedArrayBuffer (and thus pthreads) requires
            var headers = { 'Cross-Origin-Embedder-Policy': 'require-corp', 'Cross-Origin-Opener-Policy': 'same-origin' }
            if ( requestUrl.pathname.endsWith('.wasm') )
                headers['Content-type'] = 'application/wasm'
            else if ( requestUrl.pathname.endsWith('.js') )
                headers['Content-type'] = 'text/javascript'
            response.writeHead(200, headers)
        })
        fileStream.on('error',function(e) {
            response.writeHead(404)     // assume the file doesn't exist
            response.end()
        })
    } catch(e) {
        response.writeHead(500)
        response.end()     // end the response so browsers don't hang
        console.log(e.stack)
    }
}).listen(port)

To run it, start the server with node server.js and open http://127.0.0.1:8080/index.html in the browser.

sbc100 commented 1 year ago

The problem is likely that the standard library (and thus wprintf, etc.) is not compiled with -fshort-wchar, which means it still expects a 4-byte wchar_t.

I think if you want to make -fshort-wchar work, you would also need to rebuild any system libraries that deal with wchar_t.
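To make the mismatch concrete: with -fshort-wchar, L"text" occupies five 2-byte units, but a wcslen compiled for a 4-byte wchar_t scans for a 4-byte zero. The sketch below emulates both scans on a char16_t buffer (char16_t stands in for the short wchar_t so the code compiles without the flag). len_as_4byte_units and len_as_2byte_units are hypothetical names, not libc functions. The 4-byte scan reads through memcpy, so it reproduces the misreading but not the alignment trap itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical emulation of a wcslen built for a 4-byte wchar_t, run over
// memory that actually holds 2-byte code units. `bytes` is the readable size.
// Returns the number of 4-byte units before a 4-byte zero, or -1 if no
// such zero fits inside the buffer.
long len_as_4byte_units(const char16_t* s, std::size_t bytes) {
    assert(s != nullptr);
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s);
    for (std::size_t off = 0; off + 4 <= bytes; off += 4) {
        std::uint32_t unit;
        std::memcpy(&unit, p + off, sizeof unit);  // avoids an unaligned 4-byte load
        if (unit == 0) return static_cast<long>(off / 4);
    }
    return -1;  // a real wcslen would keep reading past the end here
}

// Counts 2-byte units up to the terminator, as a libc built with
// -fshort-wchar would.
std::size_t len_as_2byte_units(const char16_t* s) {
    assert(s != nullptr);
    std::size_t n = 0;
    while (s[n] != 0) ++n;
    return n;
}
```

For u"text" (10 bytes), the 2-byte count is 4, while the 4-byte scan never finds a 4-byte zero inside the buffer and returns -1. Even u"" (2 bytes) has no room for a single 4-byte read.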

sbc100 commented 1 year ago

Indeed, this seems to be true of -fshort-wchar in general, not something specific to Emscripten: https://stackoverflow.com/a/15287634/2770641

yosmo78 commented 1 year ago

@sbc100 are there any resources you know of for recompiling the Emscripten standard library?

sbc100 commented 1 year ago

It's not something that is easy to do, I'm afraid. Can you explain why you want to go to all that effort?

If you do want to go to all that effort, your best bet is probably to modify emcc.py so that -fshort-wchar is one of the default compile flags (see get_cflags in emcc.py), then run emcc --clear-cache to remove all the existing libraries. After that, just rebuild your program and Emscripten should automatically rebuild all the system libraries.
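After the libraries are rebuilt, a small startup smoke test can check that libc agrees with the compiler about wchar_t. A sketch, where wchar_runtime_matches_compiler and check_wchar_runtime are hypothetical names: it compares wcslen and swprintf results against a length and text the compiler already knows. With a libc that still assumes a 4-byte wchar_t, these calls may trap (for example with an alignment fault) rather than return false, which is itself a sign the rebuild did not take effect.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Hypothetical smoke test: true if libc's wide-string functions agree with
// the compiler's view of wchar_t literals in this translation unit.
bool wchar_runtime_matches_compiler() {
    static const wchar_t text[] = L"text";
    // Length as the compiler sees it: array elements minus the terminator.
    const std::size_t compiler_len = sizeof(text) / sizeof(text[0]) - 1;
    if (std::wcslen(text) != compiler_len) return false;

    // Exercise the printf machinery, which also parses a wide format string.
    wchar_t buf[8];
    if (std::swprintf(buf, 8, L"%ls-%d", text, 42) < 0) return false;
    return std::wcscmp(buf, L"text-42") == 0;
}

// Call early in main(); compiled out under NDEBUG like any assert.
void check_wchar_runtime() {
    assert(wchar_runtime_matches_compiler() && "libc and compiler disagree on wchar_t");
}
```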

yosmo78 commented 1 year ago

@sbc100 the goal was to be compatible with the Windows wchar_t, since that is what we are porting from: a massive amount of code and data. The data is the most important part, along with the interaction between our wasm system and our native Windows code. Unifying wchar_t with our preexisting codebase was the primary motivation.
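For the data-interchange goal specifically, one alternative that avoids rebuilding anything (not something proposed in this thread) is to keep Windows-compatible text in char16_t, which is 16 bits on mainstream compilers including Clang targeting wasm32, whatever the size of wchar_t, and to convert explicitly where data crosses to or from the Windows side. A minimal sketch, assuming the Windows data is UTF-16LE (the in-memory layout of wchar_t on Windows); from_utf16le and to_utf16le are hypothetical helpers:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decodes UTF-16LE bytes (e.g. a Windows wchar_t
// buffer) into char16_t code units, independent of sizeof(wchar_t).
std::u16string from_utf16le(const std::vector<unsigned char>& bytes) {
    assert(bytes.size() % 2 == 0 && "UTF-16LE data must have an even byte count");
    std::u16string out;
    out.reserve(bytes.size() / 2);
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        // Little-endian: low byte first, then the high byte.
        out.push_back(static_cast<char16_t>(bytes[i] | (bytes[i + 1] << 8)));
    }
    return out;
}

// Hypothetical helper: the inverse, producing bytes a Windows program
// can read directly as wchar_t.
std::vector<unsigned char> to_utf16le(const std::u16string& s) {
    std::vector<unsigned char> out;
    out.reserve(s.size() * 2);
    for (char16_t c : s) {
        out.push_back(static_cast<unsigned char>(c & 0xFF));
        out.push_back(static_cast<unsigned char>((c >> 8) & 0xFF));
    }
    return out;
}
```

In source code, u"Test" literals would take the place of L"Test". The trade-off is that the wcs* functions no longer apply, so length and comparison go through std::u16string instead.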

sbc100 commented 1 year ago

OK, let me know if patching -fshort-wchar into get_cflags works for you.