Closed: manongjohn closed this issue 3 years ago
Hi @manongjohn, we need some time for an initial investigation; we will get back with the results once we have something.
@manongjohn are you sure ccache is not the problem? I see you keep intermediate objects between builds, and they are definitely not compatible across compiler tools. Can you please remove the `- uses: actions/cache@v2` step and check the resulting binaries?
I don't build that particular library (libmypaint) using ccache. The scripts that configure and make libmypaint don't support ccache very well, so I don't even force its use the way I have to with the other build scripts in the same run.
Oh, I should also mention: even before I introduced ccache, I was having the same issue.
@manongjohn can you please clarify which binary segfaults?
Tahoma2D.AppImage
@manongjohn I reviewed the compiler options for both the "gcc only" and "gcc+clang" builds and they are exactly the same. Moreover, I was not able to reproduce the segfault by running Tahoma2D.AppImage in the pipeline https://github.com/dsame/tahoma2d/runs/2246340007?check_suite_focus=true
In order to identify the reason for the segfault, can you please send us a postmortem core dump of the failed binary?
The steps to get it:

```shell
sudo /bin/sh -c 'echo "core" > /proc/sys/kernel/core_pattern'
ulimit -c unlimited
```

Running the app and having it segfault produces a file named `core` in the current directory.
To get a backtrace of the segfault you might need to install gdb:

```shell
sudo apt-get install gdb
```

And the following command will print the backtrace we need to find out the reason for the crash:

```shell
echo 'bt full' | gdb path_to_the_binary -c core
```
Here is a zip containing the core dump and a log of the gdb backtrace output.
Since your command didn't provide much output, I ran the app through gdb to get the backtrace.
I confirm the builds use the very same make files and options, but I noticed a different set of dependencies is installed in the very first step. This could cause the different build log outputs. I am going to dig into the differences I noticed, but meanwhile, is it possible to reproduce the segfault within the workflow? I tried to launch the app under an xvfb X11 server, but it just waits for user input until the timeout. Does that mean the produced binary is not expected to segfault at all?
Thank you for continuing to look further into this.
I've never tried to have it start in the workflow.
Normally the bad version will segfault within a few seconds after I try to start it. The splash screen doesn't even come up. The good version will show a splash screen then load the application and provide a dialog box prompting the user for input.
Could you share the list of dependencies that are different between the 2 builds? I can see if I have or don't have it in my local build environment and maybe find what may be causing the difference.
@manongjohn these builds of the forked repo https://github.com/dsame/tahoma2d/runs/2234450603?check_suite_focus=true https://github.com/dsame/tahoma2d/runs/2235408221?check_suite_focus=true
have different apt logs.
gcc + clang:

```
0 upgraded, 173 newly installed, 0 to remove and 38 not upgraded.
```

gcc:

```
0 upgraded, 174 newly installed, 0 to remove and 21 not upgraded.
```

But I doubt this relates to the issue.
We definitely see different SSE2 instructions used during the build, and this might be caused by some default gcc settings, by 3rd-party dependencies, or by different environments the builds run in.
I double-checked the libmypaint source and confirmed it has not changed since 2019.
My next step is to remove caching, just to avoid possible conflicts, and to trim the build step by step to figure out which component causes the difference in the libmypaint build output.
It is confirmed that the different build output log does not relate to the clang/gcc combination.
The GCC-only build https://github.com/dsame/tahoma2d/runs/2318343127?check_suite_focus=true has no `rng-double.c:64:3: note: loop vectorized`,
and the very same build (triggered with `git commit --allow-empty ...`) https://github.com/dsame/tahoma2d/runs/2318341236?check_suite_focus=true does have `rng-double.c:64:3: note: loop vectorized`.
@madhurig
Running the build with ccache removed (not the cache step, but the ccache call for cc/cxx) 4 times, I did not get a single `rng-double.c:64:3: note: loop vectorized`:
https://github.com/dsame/tahoma2d/runs/2320577994?check_suite_focus=true https://github.com/dsame/tahoma2d/runs/2320577535?check_suite_focus=true https://github.com/dsame/tahoma2d/runs/2320576850?check_suite_focus=true https://github.com/dsame/tahoma2d/runs/2320574952?check_suite_focus=true
My guess is that the problem is that different steps (inside a single job) use the same cache, and there are 3rd-party dependencies that bring different headers from different mirrors. It sounds crazy, but I see nothing else that could change in the build.
Can you please either use step-dedicated caches or remove the cache from some of the steps? I am not an expert in ccache and am not able to do this effectively / with the least performance impact.
For now, my only solution is to remove ccache and hope the problem does not arise again.
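To make the caches step-dedicated, one option is to key each cache on the compiler combination so the gcc-only and gcc+clang flavors never restore each other's objects. A sketch only, assuming the workflow exposes the flavor as a `matrix.compiler` value and uses `~/.ccache` as the cache directory (both names are hypothetical here):

```yaml
- uses: actions/cache@v2
  with:
    path: ~/.ccache
    # Include the compiler flavor in the key so the gcc-only and
    # gcc+clang builds never share ccache objects.
    key: ccache-${{ runner.os }}-${{ matrix.compiler }}-${{ github.sha }}
    restore-keys: |
      ccache-${{ runner.os }}-${{ matrix.compiler }}-
```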
Not sure what might have changed, but for whatever reason I am suddenly unable to duplicate this problem when I try to recreate the issue. Because of this, there is no point in continuing the investigation, so I will close the issue.
Thanks to everyone who took the time to investigate.
Description
I'm getting different results with "gcc"-compiled Linux artifacts generated by Actions running the same steps outlined in this script, either running just the "gcc" configuration alone or running the "gcc + clang" configurations:
https://github.com/tahoma2d/tahoma2d/blob/master/.github/workflows/linux_build.yml
Up front, my question is: is there something different about the environment, not written to the logs, when I run the "gcc+clang" configurations at the same time vs. when I run "gcc" by itself?
Area for Triage:
Artifacts
Question, Bug, or Feature?:
Question/Bug
Virtual environments affected
Image version 20210317.1
Expected behavior
Artifacts from a "gcc" only run to work like the "gcc" artifact from a "gcc+clang" run
Actual behavior
When I run the above script with both "gcc + clang" configurations enabled, the gcc artifact works fine. If I run the same script with only the "gcc" configuration, the gcc artifact segfaults when I start it.
I would expect the gcc artifacts to work in either case, since the only difference in the script is whether clang is also building at the same time or not.
So I compared the "gcc" output from both runs to see what might be different:
"gcc + clang" generating good gcc artifiact: https://github.com/tahoma2d/tahoma2d/runs/2172015838?check_suite_focus=true "gcc only" generating bad gcc artifact: https://github.com/tahoma2d/tahoma2d/runs/2171845696?check_suite_focus=true
I compared them, and for the most part they are extremely similar, with the exception of this from log #2, which I think is causing the segfault
I build this locally in my own Linux environment, using the same scripts/steps and gcc. I don't see this message when compiling, and my build doesn't segfault.
Side note: the "clang" artifact gives me the same issue. It fails in the same place as the fully gcc-compiled artifact. The clang log, again, looks very similar to the gcc log, aside from the warning noted above.
Any insight into environmental differences you can provide would be appreciated.