cnjinhao / nana

a modern C++ GUI library
https://nana.acemind.cn
Boost Software License 1.0
2.34k stars 335 forks source link

crash on Windows upon startup #445

Open Xeverous opened 5 years ago

Xeverous commented 5 years ago

Hello, I'm trying to run any of the examples but all crash on startup.

Minimal code:

#include <nana/gui/wvl.hpp>
#include <nana/gui/widgets/label.hpp>
int main()
{
    using namespace nana;
    form    fm;
    label   lb(fm, rectangle(fm.size()));
    lb.caption("Hello, World");
    fm.show();
    exec();
}

GDB backtrace:

#0  0x00007ffd430fe912 in ntdll!RtlCallEnclaveReturn ()
   from C:\Windows\SYSTEM32\ntdll.dll
#1  0x00007ffd4306bbce in ntdll!RtlRaiseException ()
   from C:\Windows\SYSTEM32\ntdll.dll
#2  0x00007ffd3f9aa388 in RaiseException ()
   from C:\Windows\System32\KernelBase.dll
#3  0x000000006144d711 in ?? ()
   from [...] libgcc_s_seh-1.dll
#4  0x000000006fd070c5 in ?? ()
   from [...] libstdc++-6.dll
#5  0x000000006fcfb1d7 in ?? ()
   from [...] libstdc++-6.dll
#6  0x0000000000405694 in void std::__cxx11::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >::_M_construct<wchar_t const*>(wchar_t const*, wchar_t const*, std::forward_iterator_tag) [clone .constprop.292] ()
#7  0x000000000040a512 in nana::to_utf8[abi:cxx11](std::basic_string_view<wchar_t, std::char_traits<wchar_t> >) ()
#8  0x000000000040a981 in nana::font_factory(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, double, nana::detail::font_style const&, std::filesystem::__cxx11::path) ()
#9  0x0000000000110bfa in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
qPCR4vir commented 5 years ago

exactly what nana version or commit are you using? How are you compiling?

Xeverous commented 5 years ago

Just checked and pulled newest code, it works on 742cde9779d74e4bdd9bff293b0004129d50f700 but was crashing on ffa170b5f560dbf899030bdeb0e539aa95b03649. Now, after full rebuild of nana I can't reproduce it.

Xeverous commented 5 years ago

Apparently it was an invalid build bug or mixed code from multiple builds. I checked dlls that are loaded by the binary and everything is ok (previously I had a problem of executables loading same-name-but-incompatible dlls from other programs).

Xeverous commented 5 years ago

Ok, I have found that the crashing is related to build flags, notably link-time optimization.

When I add set(CMAKE_INTERPROCEDURAL_OPTIMIZATION TRUE) (which means -flto -fno-fat-lto-objects for GCC), the program crashes upon startup.

This might be the cause, although I'm not sure if these warnings should not be printed - I checked your code and you seem to use Windows COM correctly. All warnings are related to drag-and-drop feature, which is irrelevant for the example.

G:\Files\C++\repos\nana\source\gui\dragdrop.cpp: In member function 'bool nana::dragdrop_service::dragdrop(nana::window, nana::dragdrop_service::dropdata_type*, nana::dnd_action*)':
G:\Files\C++\repos\nana\source\gui\dragdrop.cpp:717:11: warning: deleting object of polymorphic class type 'nana::drop_source' which has non-virtual destructor might cause undefined behavior [-Wdelete-non-virtual-dtor]
    delete drop_src;
           ^~~~~~~~
G:\Files\C++\repos\nana\source\gui\dragdrop.cpp:713:9: warning: unused variable 'status' [-Wunused-variable]
    auto status = ::DoDragDrop(dropdata, drop_src, DROPEFFECT_COPY, &result_effect);
         ^~~~~~
G:\Files\C++\repos\nana\source\gui\dragdrop.cpp: In instantiation of 'ULONG nana::win32com_iunknown<Interface, iid>::Release() [with Interface = IDataObject; const IID& iid = IID_IDataObject; ULONG = long unsigned int]':
G:\Files\C++\repos\nana\source\gui\dragdrop.cpp:181:24:   required from here
G:\Files\C++\repos\nana\source\gui\dragdrop.cpp:184:19: warning: deleting object of abstract class type 'nana::win32com_iunknown<IDataObject, IID_IDataObject>' which has non-virtual destructor will cause undefined behavior [-Wdelete-non-virtual-dtor]
    if (cRef == 0) delete this;
                   ^~~~~~
G:\Files\C++\repos\nana\source\gui\dragdrop.cpp: In instantiation of 'ULONG nana::win32com_iunknown<Interface, iid>::Release() [with Interface = IDropSource; const IID& iid = IID_IDropSource; ULONG = long unsigned int]':
G:\Files\C++\repos\nana\source\gui\dragdrop.cpp:181:24:   required from here
G:\Files\C++\repos\nana\source\gui\dragdrop.cpp:184:19: warning: deleting object of abstract class type 'nana::win32com_iunknown<IDropSource, IID_IDropSource>' which has non-virtual destructor will cause undefined behavior [-Wdelete-non-virtual-dtor]

This time the stack trace is the same - something related to font factory or UTF-8 convertion on Windows.

Xeverous commented 5 years ago

I suspect that there is some bug in font or UTF convertion code that is only visible when applied some optimizations. Will debug and report if I find anything.

cnjinhao commented 5 years ago

Thanks for your investigation. I will check this issue.

Xeverous commented 5 years ago

https://github.com/cnjinhao/nana/blob/ffa170b5f560dbf899030bdeb0e539aa95b03649/source/detail/platform_abstraction.cpp#L322-L346

https://github.com/cnjinhao/nana/blob/ffa170b5f560dbf899030bdeb0e539aa95b03649/source/detail/platform_abstraction.cpp#L341

lfFaceName is of type CHAR[32] and this is assigned to wfont_family. Is this intended? You assign a char-string-pointer to a wchar_t-string object. Turns out that LOGFONT is either LOGFONTA or LOGFONTW depending on the unicode settings.

In my case it's LOGFONTW which means that lfFaceName is of type WCHAR[32].

Xeverous commented 5 years ago

metrics.lfMessageFont.lfFaceName (type WHAR[32]) when interepreted as a sequence of 64 bytes (sizeof(wchar_t) == 2), contains:

83 0 101 0 103 0 111 0 101 0 32 0 85 0 73 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Despite this being wchar_t, it is not an UTF-16 string.

If we read each wchar_t character as a 2-byte integer and display these 32 ints we get:

83 101 103 111 101 32 85 73 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Which is an ASCII string "Segoe UI".

I wonder if this is expected behaviour. What is sure is that this is not an UTF-16 string - it's an ASCII text where each pair of 2 bytes (wchar_t) is (letter_value (char), 0 (char)). Unless I misunderstand how Windows stores UTF-16 text.

I'm not surprised that the next line (font_family = to_utf8(wfont_family);) crashes the program because it wants to perform UTF-16 to UTF-8 convertion.

Xeverous commented 5 years ago

to_utf8 is:

https://github.com/cnjinhao/nana/blob/ffa170b5f560dbf899030bdeb0e539aa95b03649/source/deploy.cpp#L192-L195

which calls:

https://github.com/cnjinhao/nana/blob/ffa170b5f560dbf899030bdeb0e539aa95b03649/source/charset.cpp#L1395-L1398

which calls:

https://github.com/cnjinhao/nana/blob/ffa170b5f560dbf899030bdeb0e539aa95b03649/source/charset.cpp#L1273-L1279

The text is cast to const char* but the sequence as bytes (83 0 101 0 103 0 111 0 101 0 32 0 85 0 73 0) is not a UTF-16 string. detail::utf16_to_utf8 is probably crashing but even if not, the bug lies in the wrong treating of the string.

qPCR4vir commented 5 years ago

What is sure is that this is not an UTF-16 string - it's an ASCII text where each pair of 2 bytes (wchar_t) is (letter_value (char), 0 (char)). Unless I misunderstand how Windows stores UTF-16 text.

Windows ? why? how encoded UTF-16 ASCII text?

Xeverous commented 5 years ago

Segoe UI string:


I haven't found any direct bug (yet) but:

https://github.com/cnjinhao/nana/blob/ffa170b5f560dbf899030bdeb0e539aa95b03649/source/charset.cpp#L920-L924

The function takes std::string by const reference but treats it as a sequence of bytes - it would be better to take std::string_view or const char* first, const char* last because all that it needs is to iterate over characters - by requiring std::string you create unnecessary allocations (and other code that calls this function has to create temporary strings).

Also bool le_or_be is not really a good name - it is not clear which true/false is which endianess. Something like is_little_endian or is_big_endian would be more clear.


Regarding strings - apparently I was wrong. The string from lfFaceName is a correct wchar_t UTF-16 string.

I will continue the debug, it still crashes for me.

Xeverous commented 5 years ago

Another repro, more deterministic crash this way:

#include <nana/paint/graphics.hpp>
int main()
{
    using namespace nana;
    paint::font f;
}

output:

terminate called without an active exception

Looks like an exception bug, eg throw (empty throw) without an active exception.

cnjinhao commented 5 years ago

Hi, @Xeverous I can't reproduce the crash. I use MSYS32(GCC 8.3), mybe the linker doesn't support LTO.

Xeverous commented 5 years ago

What are your build flags? (-flto -fno-fat-lto-objects must be given both to the compiler and the linger)

Check that the linker is actually performing LTO - in my case I had to -DCMAKE_AR=gcc-ar because the default ar did not support LTO and promted "plugin needed to handle LTO object".

cnjinhao commented 5 years ago

I still can't reproduce the crash.

g++.exe -Wall -g -std=c++17 -flto -fno-fat-lto-objects -static -Wall -fmax-errors=3 -Wextra -Wpedantic -ffast-math -I..\..\include -I..\..\extrlib\mingw -c D:\nana-dev\source\gui\widgets\skeletons\text_editor.cpp -o obj\Debug\source\gui\widgets\skeletons\text_editor.o
//...other cpps compiled in same way.

//gcc-ar create libnana.a
gcc-ar.exe -r -s libnana.a obj\Debug\source\any.o obj\Debug\source\audio\detail\audio_device.o obj\Debug\source\audio\detail\audio_stream.o obj\Debug\source\audio\detail\buffer_preparation.o obj\Debug\source\audio\player.o obj\Debug\source\basic_types.o obj\Debug\source\charset.o obj\Debug\source\datetime.o obj\Debug\source\deploy.o obj\Debug\source\detail\platform_abstraction.o obj\Debug\source\detail\platform_spec_posix.o obj\Debug\source\detail\platform_spec_windows.o obj\Debug\source\filesystem\filesystem.o obj\Debug\source\gui\animation.o obj\Debug\source\gui\basis.o obj\Debug\source\gui\detail\basic_window.o obj\Debug\source\gui\detail\bedrock_pi.o obj\Debug\source\gui\detail\bedrock_posix.o obj\Debug\source\gui\detail\bedrock_windows.o obj\Debug\source\gui\detail\color_schemes.o obj\Debug\source\gui\detail\drawer.o obj\Debug\source\gui\detail\element_store.o obj\Debug\source\gui\detail\events_operation.o obj\Debug\source\gui\detail\native_window_interface.o obj\Debug\source\gui\detail\window_layout.o obj\Debug\source\gui\detail\window_manager.o obj\Debug\source\gui\dragdrop.o obj\Debug\source\gui\dragger.o obj\Debug\source\gui\drawing.o obj\Debug\source\gui\effects.o obj\Debug\source\gui\element.o obj\Debug\source\gui\filebox.o obj\Debug\source\gui\layout_utility.o obj\Debug\source\gui\msgbox.o obj\Debug\source\gui\notifier.o obj\Debug\source\gui\place.o obj\Debug\source\gui\programming_interface.o obj\Debug\source\gui\screen.o obj\Debug\source\gui\state_cursor.o obj\Debug\source\gui\timer.o obj\Debug\source\gui\tooltip.o obj\Debug\source\gui\widgets\button.o obj\Debug\source\gui\widgets\categorize.o obj\Debug\source\gui\widgets\checkbox.o obj\Debug\source\gui\widgets\combox.o obj\Debug\source\gui\widgets\date_chooser.o obj\Debug\source\gui\widgets\float_listbox.o obj\Debug\source\gui\widgets\form.o obj\Debug\source\gui\widgets\group.o obj\Debug\source\gui\widgets\label.o obj\Debug\source\gui\widgets\listbox.o obj\Debug\source\gui\widgets\menu.o obj\Debug\source\gui\widgets\menubar.o obj\Debug\source\gui\widgets\panel.o obj\Debug\source\gui\widgets\picture.o obj\Debug\source\gui\widgets\progress.o obj\Debug\source\gui\widgets\scroll.o obj\Debug\source\gui\widgets\skeletons\content_view.o obj\Debug\source\gui\widgets\skeletons\text_editor.o obj\Debug\source\gui\widgets\slider.o obj\Debug\source\gui\widgets\spinbox.o obj\Debug\source\gui\widgets\tabbar.o obj\Debug\source\gui\widgets\textbox.o obj\Debug\source\gui\widgets\toolbar.o obj\Debug\source\gui\widgets\treebox.o obj\Debug\source\gui\widgets\widget.o obj\Debug\source\gui\wvl.o obj\Debug\source\internationalization.o obj\Debug\source\paint\detail\image_process_provider.o obj\Debug\source\paint\detail\native_paint_interface.o obj\Debug\source\paint\graphics.o obj\Debug\source\paint\image.o obj\Debug\source\paint\image_process_selector.o obj\Debug\source\paint\pixel_buffer.o obj\Debug\source\paint\text_renderer.o obj\Debug\source\stdc++.o obj\Debug\source\system\dataexch.o obj\Debug\source\system\platform.o obj\Debug\source\system\shared_wrapper.o obj\Debug\source\system\timepiece.o obj\Debug\source\threads\pool.o obj\Debug\source\unicode_bidi.o

Compile the test program

g++.exe -Wall -fexceptions -g -std=c++17 -O3 -DNDEBUG -flto -fno-fat-lto-objects -Wl,--out-implib, -Wl,--major-image-version,0,--minor-image-version,0 -ID:\nana-dev\include -c D:\codeblocks-projects\console\main.cpp -o obj\Debug\main.o
g++.exe  -o bin\Debug\console.exe obj\Debug\main.o  -static-libstdc++ -static-libgcc -static -flto -fno-fat-lto-objects  D:\codeblocks-projects\console\libnana.a -lgdi32 -lcomdlg32 -lstdc++fs
Xeverous commented 5 years ago

I don't use static standard libraries.

cnjinhao commented 5 years ago

With static linkage, I reproduced the crash. After debugging, I found an internal object is not constructed.

https://github.com/cnjinhao/nana/blob/ffa170b5f560dbf899030bdeb0e539aa95b03649/source/gui/detail/bedrock_windows.cpp#L165-L169

Xeverous commented 5 years ago

Great, at least you have a repro. I had very unstable undefined behaviour which was causing different stack traces and different memory corruptions on different machines.

Any guess why this object was not initialized? Is this static initialization order fiasco?

cnjinhao commented 5 years ago

Is this static initialization order fiasco?

No, it isn't. The really problem is that bedrock's constructor never gets called.

#include <nana/paint/graphics.hpp>
int main()
{
    using namespace nana;
    paint::font f;
}

When constructing the local variable f, the bedrock's static member hasn't been initialized.

qPCR4vir commented 5 years ago

Nana need to "register" some "objects" both internally and in the OS facilities - there is no way to avoid this: we need to live with some (minimal ?) number of static or global object to do that. This is always a problem. The normal program flow have been extensively tested, and works OK, But @Xeverous tests now unexpected programs flows, and "problems" may appear.
Just thinking: what if you search for all the global objects created in the implementation and create one more global object to rule it all? I mean: create a static nana::gui_state of a class with a member of each of the classes with global objects. The idea is that you could exactly control the order of construction and destruction... well and also is an opportunity to just think about the problem of dependencies. Anyway, all this is an implementation "detail"... you may decide how is best.

Xeverous commented 5 years ago

Just thinking: what if you search for all the global objects created in the implementation and create one more global object to rule it all?

This is a good idea because after such change, there is only 1 global static object in the entire program. All of it's members are non-static and initialized as a part of it.

The additional benefit of this approach over code like this:

type& type::instance()
{
    static type obj;
    return obj;
}

is that the approach "one static object to rule them all" will detect circular dependencies at compile time rather than create risks of static initialization order fiasco.

and also is an opportunity to just think about the problem of dependencies.

I would browse the source and "draw" a graph of dependencies between all static objects. If there is a circular dependency, it will need to be resolved first. The dependency graph will also help forming correct order of members in "the 1 static object to rule them all".

cnjinhao commented 5 years ago

The issue is caused by the side-effect of LTO. C++ defines clearly that static class members are initialized before main(), but it seems that LTO incorrectly eliminates the initialization of the static class member.

Xeverous commented 5 years ago

I doubt that LTO would eliminate initialization of static class members. If so happens, this is a compiler bug. I will check dependencies between all static objects and:

qPCR4vir commented 5 years ago

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89183 https://bugs.launchpad.net/gcc-arm-embedded/+bug/1814397 ?

Xeverous commented 5 years ago

This is not this issue. The bugs you linked end in linker errors (build fails) whereas in the case with nana the build succeeds and the executable crashes.

Xeverous commented 5 years ago

The issue is caused by the side-effect of LTO. C++ defines clearly that static class members are initialized before main(), but it seems that LTO incorrectly eliminates the initialization of the static class member.

Static class members are indeed guuaranteed to be initialized before main() but LTO is allowed to remove unreferenced code. If a static object is not used directly, LTO will remove it.

Example:

// init.cpp
extern int x;

struct initializer
{
    initializer() { x = 5; }
};

static initializer init;
// main.cpp
#include <iostream>
int x = 1;

int main()
{
    std::cout << "x = " << x << "\n";
}

This example may or may not print 5. init is not used anywhere so LTO is allowed to remove the object. Any code that relies on init side effects is not guuaranteed to work.


The following code does just that mistake:

#include <nana/paint/graphics.hpp>
int main()
{
    using namespace nana;
    paint::font f;
}

https://github.com/cnjinhao/nana/blob/ffa170b5f560dbf899030bdeb0e539aa95b03649/source/paint/graphics.cpp#L73-L77

https://github.com/cnjinhao/nana/blob/ffa170b5f560dbf899030bdeb0e539aa95b03649/source/detail/platform_abstraction.cpp#L287-L289

https://github.com/cnjinhao/nana/blob/ffa170b5f560dbf899030bdeb0e539aa95b03649/source/detail/platform_abstraction.cpp#L231-L237

data::storage is null here. No code has called this function:

https://github.com/cnjinhao/nana/blob/ffa170b5f560dbf899030bdeb0e539aa95b03649/source/detail/platform_abstraction.cpp#L239-L243

This function is only called in respective platform_spec_ static object constructors. Here is the one for Windows:

https://github.com/cnjinhao/nana/blob/ffa170b5f560dbf899030bdeb0e539aa95b03649/source/detail/platform_spec_windows.cpp#L120-L124

platform_spec_windows was removed by LTO in this example, because it was never needed. You could aswell build this example without compiling platform_spec_windows.cpp at all - there would not be any compiler or linker errors.

No code directly used the object so it's constructor was not run. In effect platform_abstraction::initialize(); was not called too, which led to data::storage being null. This is a library code bug - reliance on side effects of constructors of static objects that are never used.

How to fix:

My proposition is to remove pointer indirection and use Meyer's singleton to force code dependency. Missing file will cause a compilation error and LTO will not be able to remove the object because it will be referenced in the function.

Essentially I propose to remove platform_abstraction::initialize() and change this:

https://github.com/cnjinhao/nana/blob/ffa170b5f560dbf899030bdeb0e539aa95b03649/source/detail/platform_abstraction.cpp#L231-L237

to this:

    static platform_runtime& platform_storage()
    {
        static platform_runtime runtime;
        return runtime;
    }

@cnjinhao thanks for the info about missing static object initialization. This helped me track down the exact source of the problem.

Now I will check if the code example with font and label crashes for the same reason or due to something else.

cnjinhao commented 5 years ago

Simply, LTO is not friendly for C++

Xeverous commented 5 years ago

It is. It is very friendly, it can remove binary bloat and make executables faster and tremedously smaller. I would rather say that static is the offender and has a lot of unintuitive rules that can lead to bugs.

You don't even need LTO to reproduce this bug. Just compile nana without platform_spec_windows.cpp and without platform_spec_posix.cpp files. No compiler/linker errors will be given and the same crash will happen.

qPCR4vir commented 5 years ago

This example may or may not print 5. init is not used anywhere so LTO is allowed to remove the object.

I think this will be an error in the LTO, because what the standard said is ""

4.1.1 Abstract machine [intro.abstract] 1 The semantic descriptions in this document define a parameterized nondeterministic abstract machine. This document places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.4

2) “Correct execution” can include undefined behavior, depending on the data being processed; see Clause 3 and 6.8.1. 3) This documentation also defines implementation-defined behavior; see 6.8.1. 4) This provision is sometimes called the “as-if” rule, because an implementation is free to disregard any requirement of this document as long as the result is as if the requirement had been obeyed, as far as can be determined from the observable behavior of the program. For instance, an actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no side effects affecting the observable behavior of the program are produced.

Xeverous commented 5 years ago

Ok, this will likely be a language-lawyer-type question on SO

6.6.5.1.2 If a variable with static storage duration has initialization or a destructor with side effects, it shall not be eliminated even if it appears to be unused, except that a class object or its copy/move may be eliminated as specified in [class.copy.elision].

Edit: the removal of unused code is also dependent on translation unit so it might really require an in-depth knowledge of the standard and all exceptions to the as-if rule.

Xeverous commented 5 years ago

Summary of my research and https://stackoverflow.com/questions/56628469/

Taking all of this, I'm still for changing the initialization to use Meyer's singleton which will:

I can make the pull request.

cnjinhao commented 5 years ago

This is an compiler issue. Without LTO, I never encounter the issue on various compilers(VC, GCC even on raspberry pi and clang). If we are trying to make workaround for the issue, it is hard to guarantee the workaround work correctly. Btw, using a static object to initializing internal objects is much easy and cheap way. It can avoid resource race issue.

Xeverous commented 5 years ago

This is an compiler issue.

It's not a compiler issue. The compiler is not allowed to remove the global object. But what the linker does is outside the standard. Linker removes the object because it's never used.

If we are trying to make workaround for the issue, it is hard to guarantee the workaround work correctly.

False. There is no workaround - we don't go around any compiler/linker bug. The linker removes the object because it is not used - nothing in the example uses platform spec object (only side effects of it's construction). The workaround is using specific flags like --whole-archive to prevent linker from performing all allowed optimizations.

The solution is to make that symbol used. It's very simple. Changing return *data::storage; to static platform_runtime runtime; return runtime; and changing static platform_spec object to Meyer's singleton will guuarantee that both objects are constructed and that they are constructed in correct order. This change has no impact on implementation behaviour, it only makes the platform_runtime used which prevents linker from removing it. Both are still static objects, cheaply initialized and avoid race issues.

Xeverous commented 5 years ago

Summing up the newest answer on SO:

Compiler is not allowed to remove global object. But it's allowed to defer it's initialization up to the time of first use of a function from the same translation unit. Since no function from that file is ever used, the object is never initialized so it's removed. Therefore, it's a bug in nana to assume that the object exists.

qPCR4vir commented 5 years ago

allowed to defer it's initialization up to the time of first use of a function from the same translation unit

And before main. Or could you add the part of the c++17 std that make it possible to delete globals wirh an initialization with collateral effects You may also add a link to the SO q.

Xeverous commented 5 years ago

You may also add a link to the SO q

The link was posted earlier, here is a direct answer link: https://stackoverflow.com/a/56633997/4818802

Part of the language draft that allows such behaviour:

http://eel.is/c++draft/basic.start.dynamic

[6.8.3.3.4] It is implementation-defined whether the dynamic initialization of a non-local non-inline variable with static storage duration is sequenced before the first statement of main or is deferred. If it is deferred, it strongly happens before any non-initialization odr-use of any non-inline function or non-inline variable defined in the same translation unit as the variable to be initialized.

So the initialization can be delayed, even past main function. What would prevent delaying is non-init odr-use or use of non-inline function/variable from the same translation unit but it does not happen in the case of nana. That's why I propose Mayer's singleton - function returning static type obj will always guuarantee odr-use and symbol dependency for the linker.

Xeverous commented 5 years ago

I have updated and edited the library to correctly represent relationship between global static objects, now getting this:

(gdb) bt
#0  0x000000000049860f in std::type_info::operator==(std::type_info const&) const ()
#1  0x00000000004551c0 in __cxxabiv1::__vmi_class_type_info::__do_dyncast(long long, __cxxabiv1::__class_type_info::__sub_kind, __cxxabiv1::__class_type_info const*, void const*, __cxxabiv1::__class_type_info const*, void const*, __cxxabiv1::__class_type_info::__dyncast_result&) const ()
#2  0x00000000004eae47 in __dynamic_cast ()
#3  0x00000000004e6665 in bool std::has_facet<std::ctype<char> >(std::locale const&) ()
#4  0x00000000004e1fb4 in std::basic_ios<char, std::char_traits<char> >::_M_cache_locale(std::locale const&) ()
#5  0x00000000004e2100 in std::basic_ios<char, std::char_traits<char> >::init(std::basic_streambuf<char, std::char_traits<char> >*) ()
#6  0x00000000004dfe6b in std::ios_base::Init::Init() ()
#7  0x0000000000431f73 in __static_initialization_and_destruction_0(int, int) [clone .lto_priv.4] ()
#8  0x0000000000431f9d in _GLOBAL__sub_I__ZN4nana6detail18drawable_impl_typeC2Ev ()
#9  0x000000000043990d in global constructors keyed to 65535_0_objects.a_0x8e.19325 ()
#10 0x00000000004432f3 in __do_global_ctors ()
    at E:/GitHub/MINGW-packages/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/gccmain.c:67
#11 0x000000000044333b in __main ()
    at E:/GitHub/MINGW-packages/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/gccmain.c:83
#12 0x000000000040131c in __tmainCRTStartup ()
    at E:/GitHub/MINGW-packages/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:329
#13 0x00000000004014c9 in mainCRTStartup ()
    at E:/GitHub/MINGW-packages/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:223
Xeverous commented 5 years ago

Is nana doing anything related to Windows input/output API?

This code:

#include <iostream>

int main()
{
    std::cerr << "entered main" << std::endl;
}

Just can not crash, but when linked to nana library I get a crash, inside std::endl:

#0  0x00000000004a51ac in std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&) ()
#1  0x00000000004aa42e in main ()

The program's output is exactly entered mainSegmentation fault (no endline). When the program is not linked to nana, it does not crash.

Xeverous commented 5 years ago

When the program is not linked to nana, it does not crash.

This was a very important observation.


So I have tried as hard as recompiling everything (all my code, entire nana) with: -O3 -g -fno-omit-frame-pointer -fno-inline -D_FORTIFY_SOURCE=2 -D_GLIBCXX_ASSERTIONS -fasynchronous-unwind-tables -fstack-clash-protection -fstack-protector-all to force the bug (potentially as an optimization effect) but to reduce any stack corruptions as much as possible.

This is what happened:

Thread 1 received signal SIGSEGV, Segmentation fault.
0x00000000004a3b7f in std::type_info::operator==(std::type_info const&) const
(gdb) bt full
#0  0x00000000004a3b7f in std::type_info::operator==(std::type_info const&) const ()
No symbol table info available.
#1  0x00000000004559f8 in __cxxabiv1::__vmi_class_type_info::__do_dyncast(long long, __cxxabiv1::__class_type_info::__sub_kind, __cxxabiv1::__class_type_info const*, void const*, __cxxabiv1::__class_type_info const*, void const*, __cxxabiv1::__class_type_info::__dyncast_result&) const ()
No symbol table info available.
#2  0x00000000004fb599 in __dynamic_cast ()
No symbol table info available.
#3  0x00000000004f6be3 in bool std::has_facet<std::ctype<char> >(std::locale const&) ()
No symbol table info available.
#4  0x00000000004f2975 in std::basic_ios<char, std::char_traits<char> >::_M_cache_locale(std::locale const&) ()
No symbol table info available.
#5  0x00000000004f2af0 in std::basic_ios<char, std::char_traits<char> >::init(std::basic_streambuf<char, std::char_traits<char> >*) ()
No symbol table info available.
#6  0x00000000004f084e in std::ios_base::Init::Init() ()
No symbol table info available.
#7  0x000000000043102b in __static_initialization_and_destruction_0 (
    __initialize_p=1, __priority=65535)
    at C:/mingw64/mingw64/include/c++/9.1.1/iostream:74
No locals.
#8  0x0000000000431055 in _GLOBAL__sub_I__ZN4nana6detail18drawable_impl_typeC2Ev ()
    at G:/Files/C++/workspace_windows/nana/source/detail/platform_spec_windows.cpp:148
No locals.
#9  0x0000000000438a2b in global constructors keyed to 65535_0_objects.a_0x8e.13887 () at C:/mingw64/mingw64/include/c++/9.1.1/ext/new_allocator.h:153
        unseq = {<No data fields>}
        unseq = {<No data fields>}
        unseq = {<No data fields>}
        unseq = {<No data fields>}
        unseq = {<No data fields>}
        unseq = {<No data fields>}
        unseq = {<No data fields>}
        unseq = {<No data fields>}
        unseq = {<No data fields>}
        unseq = {<No data fields>}
        __default_lock_policy = (unknown: 2)
        __default_lock_policy = (unknown: 2)
        __default_lock_policy = (unknown: 2)
        __default_lock_policy = (unknown: 2)
        __default_lock_policy = (unknown: 2)
        __default_lock_policy = (unknown: 2)
        __default_lock_policy = (unknown: 2)
        __default_lock_policy = (unknown: 2)
        __default_lock_policy = (unknown: 2)
        def_root = 0x501c20 <_ZN6__pstl9execution2v1L5unseqE.lto_priv.9+1> "C:"
        def_rootname = 0x501c40 <nana::filesystem_ext::def_rootstr+8> "Local Drive(C:)"
        def_encoding_error_police = {_M_t = {_M_t = {<No data fields>}}}
        par_unseq = {<No data fields>}
        par_unseq = {<No data fields>}
        par_unseq = {<No data fields>}
        par_unseq = {<No data fields>}
        par_unseq = {<No data fields>}
        par_unseq = {<No data fields>}
        par_unseq = {<No data fields>}
        par_unseq = {<No data fields>}
        par_unseq = {<No data fields>}
        par_unseq = {<No data fields>}
        par = {<No data fields>}
        par = {<No data fields>}
        par = {<No data fields>}
        par = {<No data fields>}
        par = {<No data fields>}
        par = {<No data fields>}
        par = {<No data fields>}
        par = {<No data fields>}
        par = {<No data fields>}
        par = {<No data fields>}
        piecewise_construct = {<No data fields>}
        seq = {<No data fields>}
        seq = {<No data fields>}
        seq = {<No data fields>}
        seq = {<No data fields>}
        seq = {<No data fields>}
        seq = {<No data fields>}
        seq = {<No data fields>}
        seq = {<No data fields>}
        seq = {<No data fields>}
        seq = {<No data fields>}
        def_rootstr = 0x501c30 <nana::filesystem_ext::def_root+8> "C:\\"
        __ioinit = {<No data fields>}
        __ioinit = {<No data fields>}
        __ioinit = {<No data fields>}
        __ioinit = {<No data fields>}
        __ioinit = {<No data fields>}
        def_error_mark = 42
        charmap_0x0000_0x00C0 = "\016\016\016\016\016\016\016\016\016\020\017\020\021\017", '\016' <repeats 14 times>, "\017\017\017\020\021\022\022\n\n\n\022\022\022\022\022\t\f\t\f\f\b\b\b\b\b\b\b\b\b\b\f\022\022\022\022\022\022", '\000' <repeats 26 times>, "\022\022\022\022\022\022", '\000' <repeats 26 times>, "\022\022\022\022\016\016\016\016\016\016\017", '\016' <repeats 26 times>, "\f\022\n\n\n\n\022\022\022\022\000\022\022\016\022\022\n\n\b\b\022\000\022\022\022\b\000\022\022\022\022\022"
        use_throw = false
        npos = 18446744073709551615
        npos = 18446744073709551615
        npos = 18446744073709551615
        npos = 18446744073709551615
        npos = 18446744073709551615
        npos = 18446744073709551615
        npos = 18446744073709551615
        npos = 18446744073709551615
        npos = 18446744073709551615
#10 0x0000000000443433 in __do_global_ctors ()
    at E:/GitHub/MINGW-packages/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/gccmain.c:67
        nptrs = <optimized out>
        i = <optimized out>
#11 0x0000000000443479 in __main ()
    at E:/GitHub/MINGW-packages/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/gccmain.c:83
No locals.
#12 0x0000000000401328 in __tmainCRTStartup ()
    at E:/GitHub/MINGW-packages/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:329
        lock_free = <optimized out>
        fiberid = <optimized out>
        nested = <optimized out>
        lpszCommandLine = <optimized out>
        StartupInfo = {cb = 0, lpReserved = 0x0, lpDesktop = 0x0,
          lpTitle = 0x0, dwX = 0, dwY = 0, dwXSize = 0, dwYSize = 0,
          dwXCountChars = 0, dwYCountChars = 0, dwFillAttribute = 0,
          dwFlags = 0, wShowWindow = 0, cbReserved2 = 0, lpReserved2 = 0x0,
          hStdInput = 0x0, hStdOutput = 0x0, hStdError = 0x0}
        inDoubleQuote = <optimized out>
#13 0x00000000004014be in mainCRTStartup ()
    at E:/GitHub/MINGW-packages/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:223
        ret = 255

You can clearly see that std::ios_base::Init::Init() (C:/mingw64/mingw64/include/c++/9.1.1/iostream:74) is before _GLOBAL__sub_I__ZN4nana6detail18drawable_impl_typeC2Ev (which is nana::detail::drawable_impl_type::drawable_impl_type()). So the compiler correctly initializes standard library objects before nana's objects (and the order of nana objects is right) but somehow std::ios_base::Init::Init() crashes inside it's implementation.

Let's see what it actually is:

(gdb) print __ioinit
$1 = {<No data fields>}
(gdb) whatis __ioinit
type = struct std::Init
(gdb) print sizeof(__ioinit)
$2 = 1

It is empty (???) and sizeof(__ioinit) is reported to be 1. But this is just impossible when you look at the source of the standard library:

    // 27.4.2.1.6  Class ios_base::Init
    // Used to initialize standard streams. In theory, g++ could use
    // -finit-priority to order this stuff correctly without going
    // through these machinations.
    class Init
    {
      friend class ios_base;
    public:
      Init();
      ~Init();

#if __cplusplus >= 201103L
      Init(const Init&) = default;
      Init& operator=(const Init&) = default;
#endif

    private:
      static _Atomic_word   _S_refcount;
      static bool       _S_synced_with_stdio;
    };

The executable has broken standard library, but it does not crash when not linked to nana. So I checked what nana links and this is the cause of the problem:

https://github.com/cnjinhao/nana/blob/38cdf4779456ba697d7da863f7c623e25d30f650/build/cmake/shared_libs.cmake#L43

Removed this line, everything works.

I must say this is a big suprise, because pretty much every C++ library links standard library by dynamically by default. nana is only one of many libraries I use, so I can not link statically to stdlibc++ when at least 1 of other libraries uses libstdc++.

What is weird for me though, that there were no linker problems. Build succeeds.


Note: -static-libgcc -static-libstdc++ is not the only problem, I have applied some edits to global objects too (these were obvious crashes).

cnjinhao commented 5 years ago

There is nothing to do with the stdio and streams. All the problems are caused by linker optimizations

qPCR4vir commented 5 years ago

Hi, @Xeverous, very interesting your investigations, amusing. But I suspect that if you discus your findings with the teams developing the compiler - linker optimizators you will get a better feedback and understanding. Don't forget to hint how we can make nana code better after you found what are the problems you see. You may try to add (and PR back) a cmake option to link dynamically the sdt library. I had too many problems porting programs from one PC to another, and the real, practical solution was to have anything in one portable executable, statically linked: no need to have admin account in every PC to use the soft or complex installation. It was key to avoid the "works in my machine" problem. Advanced and mature users like you can set a few options in cmake with no effort, really, no sarcasm here. The defaults are for the rest of us.

qPCR4vir commented 5 years ago

I would like your comments about: you are making an heroic effort to get out a few kb from the executable, but then link dynamically the libraries. I have seems c++ soft projects distributing >200 MB of 3-part .dll with complex installation, and when statically linked it all come to <10 MB of a portable executable (just copy and use).

Xeverous commented 5 years ago

Hi, @Xeverous, very interesting your investigations, amusing. But I suspect that if you discus your findings with the teams developing the compiler - linker optimizators you will get a better feedback and understanding.

Compiler/linker did nothing wrong here and satisfied all of language requirements. C++ doesn't standarize linking itself, any linkage problems are undefined behaviour in the standard, but linker errors in practice.

Yes, I might ask compiler developers why this problem was not discovered (and build succeeded). Executable had 1 static and 1 dynamic standard library which caused the crash, I think such thing can and should be detected by the linker.

You may try to add (and PR back) a cmake option to link dynamically the sdt library.

This is the default for most projects. I don't see any problem here, can easily make a PR. nana should be able to be build as shared/static with any of shared/static standard libraries.

I had too many problems porting programs from one PC to another, and the real, practical solution was to have anything in one portable executable, statically linked: no need to have admin account in every PC to use the soft or complex installation.

I agree with this. Too many programs (especially on Windows) use incompatible versions of the same libraries. This does not mean you have to use static builds though. You can just put all the libraries in the same directory where the executable and it will load these libraries instead. No need to install anything into the system.

I have seems c++ soft projects distributing >200 MB of 3-part .dll with complex installation, and when statically linked it all come to <10 MB of a portable executable (just copy and use).

Right. That's why I use nana statically - I only use some part of nana. But other libraries (OpenSSL, stdlibc++, my own code as library) are used by multiple executables (command-line executable, GUI executable, unit test executable) so I link them dynamically.

So the ideal solution for me is to link nana statically (only 1 executable uses it) but everything else dynamically.

Xeverous commented 5 years ago

Regarding PR for CMake files:

Some other info for you:

https://github.com/cnjinhao/nana/blob/38cdf4779456ba697d7da863f7c623e25d30f650/build/cmake/shared_libs.cmake#L38

CMAKE_COMPILER_IS_GNUCXX is true for MinGW, you can remove "# AND NOT MINGW??"

https://github.com/cnjinhao/nana/blob/38cdf4779456ba697d7da863f7c623e25d30f650/build/cmake/shared_libs.cmake#L40-L41

This is unneded, GCC and Clang automatically add standard library to every executable. It works the other way: you need to add -nostdlib to not link standard library.

I can make a PR for all of this.

qPCR4vir commented 5 years ago

Please, make NANA_SHARED_STDLIB OFF by def :-)