Closed dingmeng-xue closed 3 years ago
Hey, @dingmeng-xue! We need some time to investigate and will be back with details soon :) Thank you!
@dingmeng-xue have you tried the Ubuntu 20 image? Does the issue reproduce there?
Not yet. Currently, we only use Ubuntu 18 for Linux builds. We still hope to stick to that version.
@dingmeng-xue
I was able to get a core dump and investigate it.
It looks like the cause of the segfault is a memory block being freed more than once. It is very unlikely that this originates in the image.
The backtrace indicates that the exception happens in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1,
which in turn is called from /usr/share/dotnet/shared/Microsoft.NETCore.App/5.0.4/libcoreclr.so.
@miketimofeev unless there are some changes in libcrypto.so.1.1,
I believe the issue should be investigated by the .NET team.
2021-03-30T05:26:57.9386665Z GNU gdb (Ubuntu 8.2-0ubuntu1~18.04) 8.2
2021-03-30T05:26:57.9388245Z Copyright (C) 2018 Free Software Foundation, Inc.
2021-03-30T05:26:57.9391342Z License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
2021-03-30T05:26:57.9392599Z This is free software: you are free to change and redistribute it.
2021-03-30T05:26:57.9393671Z There is NO WARRANTY, to the extent permitted by law.
2021-03-30T05:26:57.9394705Z Type "show copying" and "show warranty" for details.
2021-03-30T05:26:57.9396253Z This GDB was configured as "x86_64-linux-gnu".
2021-03-30T05:26:57.9397289Z Type "show configuration" for configuration details.
2021-03-30T05:26:57.9398308Z For bug reporting instructions, please see:
2021-03-30T05:26:57.9399273Z <http://www.gnu.org/software/gdb/bugs/>.
2021-03-30T05:26:57.9400398Z Find the GDB manual and other documentation resources online at:
2021-03-30T05:26:57.9401492Z <http://www.gnu.org/software/gdb/documentation/>.
2021-03-30T05:26:57.9402116Z
2021-03-30T05:26:57.9402888Z For help, type "help".
2021-03-30T05:26:57.9403835Z Type "apropos word" to search for commands related to "word"...
2021-03-30T05:26:57.9412060Z Reading symbols from /usr/bin/dotnet...(no debugging symbols found)...done.
2021-03-30T05:26:57.9691950Z [New LWP 3805]
2021-03-30T05:26:57.9693670Z [New LWP 3795]
2021-03-30T05:26:57.9694545Z [New LWP 3794]
2021-03-30T05:26:57.9696327Z [New LWP 3796]
2021-03-30T05:26:57.9699468Z [New LWP 3800]
2021-03-30T05:26:57.9699989Z [New LWP 3797]
2021-03-30T05:26:57.9700331Z [New LWP 3801]
2021-03-30T05:26:57.9706351Z [New LWP 3804]
2021-03-30T05:26:57.9706844Z [New LWP 3809]
2021-03-30T05:26:57.9707274Z [New LWP 3793]
2021-03-30T05:26:57.9707682Z [New LWP 3798]
2021-03-30T05:26:57.9708105Z [New LWP 3799]
2021-03-30T05:26:57.9708524Z [New LWP 3803]
2021-03-30T05:26:57.9708942Z [New LWP 3806]
2021-03-30T05:26:57.9709342Z [New LWP 3807]
2021-03-30T05:26:57.9709764Z [New LWP 3808]
2021-03-30T05:26:57.9710180Z [New LWP 3810]
2021-03-30T05:26:57.9710596Z [New LWP 3811]
2021-03-30T05:26:57.9745642Z [Thread debugging using libthread_db enabled]
2021-03-30T05:26:57.9747186Z Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
2021-03-30T05:26:58.6823436Z Core was generated by `dotnet sln /home/vsts/work/1/s/artifacts/Azure.PowerShell.sln add /home/vsts/wo'.
2021-03-30T05:26:58.6824685Z Program terminated with signal SIGSEGV, Segmentation fault.
2021-03-30T05:26:58.6825335Z #0 __pthread_rwlock_wrlock_full (abstime=0x0, rwlock=0x0)
2021-03-30T05:26:58.6825902Z at pthread_rwlock_common.c:576
2021-03-30T05:26:58.6826628Z 576 pthread_rwlock_common.c: No such file or directory.
2021-03-30T05:26:58.6827216Z [Current thread is 1 (Thread 0x7fe479b88700 (LWP 3805))]
2021-03-30T05:26:58.6827821Z (gdb) #0 __pthread_rwlock_wrlock_full (abstime=0x0, rwlock=0x0)
2021-03-30T05:26:58.6828396Z at pthread_rwlock_common.c:576
2021-03-30T05:26:58.7191887Z may_share_futex_used_flag = <optimized out>
2021-03-30T05:26:58.7192858Z wpf = <optimized out>
2021-03-30T05:26:58.7193483Z ready = <optimized out>
2021-03-30T05:26:58.7194056Z r = <optimized out>
2021-03-30T05:26:58.7194700Z may_share_futex_used_flag = <optimized out>
2021-03-30T05:26:58.7195304Z r = <optimized out>
2021-03-30T05:26:58.7195867Z wpf = <optimized out>
2021-03-30T05:26:58.7196427Z ready = <optimized out>
2021-03-30T05:26:58.7197912Z __value = <optimized out>
2021-03-30T05:26:58.7198542Z prefer_writer = <optimized out>
2021-03-30T05:26:58.7199695Z private = <optimized out>
2021-03-30T05:26:58.7200293Z wf = <optimized out>
2021-03-30T05:26:58.7200858Z err = <optimized out>
2021-03-30T05:26:58.7201985Z w = <optimized out>
2021-03-30T05:26:58.7202579Z w = <optimized out>
2021-03-30T05:26:58.7203129Z private = <optimized out>
2021-03-30T05:26:58.7203701Z err = <optimized out>
2021-03-30T05:26:58.7204776Z w = <optimized out>
2021-03-30T05:26:58.7205368Z wf = <optimized out>
2021-03-30T05:26:58.7205908Z wf = <optimized out>
2021-03-30T05:26:58.7206475Z __value = <optimized out>
2021-03-30T05:26:58.7209120Z #1 __GI___pthread_rwlock_wrlock (rwlock=0x0) at pthread_rwlock_wrlock.c:27
2021-03-30T05:26:58.7209805Z result = <optimized out>
2021-03-30T05:26:58.7210357Z #2 0x00007fe47860e989 in CRYPTO_THREAD_write_lock ()
2021-03-30T05:26:58.7211539Z from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
2021-03-30T05:26:58.7212164Z No symbol table info available.
2021-03-30T05:26:58.7212692Z #3 0x00007fe4785d0013 in RAND_get_rand_method ()
2021-03-30T05:26:58.7213797Z from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
2021-03-30T05:26:58.7214361Z No symbol table info available.
2021-03-30T05:26:58.7214875Z #4 0x00007fe4785d02f0 in RAND_bytes ()
2021-03-30T05:26:58.7215607Z from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
2021-03-30T05:26:58.7216151Z No symbol table info available.
2021-03-30T05:26:58.7216652Z #5 0x00007fe47858d49f in ?? ()
2021-03-30T05:26:58.7217360Z from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
2021-03-30T05:26:58.7217899Z No symbol table info available.
2021-03-30T05:26:58.7218447Z #6 0x00007fe47859ba97 in EVP_CIPHER_CTX_ctrl ()
2021-03-30T05:26:58.7219186Z from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
2021-03-30T05:26:58.7219707Z No symbol table info available.
2021-03-30T05:26:58.7220594Z #7 0x00007fe4789596b4 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
2021-03-30T05:26:58.7221149Z No symbol table info available.
2021-03-30T05:26:58.7221872Z #8 0x00007fe47894b4fa in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
2021-03-30T05:26:58.7223769Z No symbol table info available.
2021-03-30T05:26:58.7224655Z #9 0x00007fe478945f16 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
2021-03-30T05:26:58.7225237Z No symbol table info available.
2021-03-30T05:26:58.7225758Z #10 0x00007fe4789324c4 in SSL_do_handshake ()
2021-03-30T05:26:58.7226493Z from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
2021-03-30T05:26:58.7227029Z No symbol table info available.
2021-03-30T05:26:58.7227525Z #11 0x00007fe49ec91343 in ?? ()
2021-03-30T05:26:58.7228193Z No symbol table info available.
2021-03-30T05:26:58.7228686Z #12 0x00007fe479b86d70 in ?? ()
2021-03-30T05:26:58.7229942Z No symbol table info available.
2021-03-30T05:26:58.7230457Z #13 0x00000000000f98f4 in ?? ()
2021-03-30T05:26:58.7230987Z No symbol table info available.
2021-03-30T05:26:58.7232043Z #14 0x00007fe513e11848 in ?? ()
2021-03-30T05:26:58.7232637Z from /usr/share/dotnet/shared/Microsoft.NETCore.App/5.0.4/libcoreclr.so
2021-03-30T05:26:58.7233216Z No symbol table info available.
2021-03-30T05:26:58.7234204Z #15 0x00007fe479b87c80 in ?? ()
2021-03-30T05:26:58.7234714Z No symbol table info available.
2021-03-30T05:26:58.7235193Z #16 0x00007fe49ed30900 in ?? ()
2021-03-30T05:26:58.7236234Z No symbol table info available.
2021-03-30T05:26:58.7236740Z #17 0x00007fe49ed30900 in ?? ()
2021-03-30T05:26:58.7237246Z No symbol table info available.
2021-03-30T05:26:58.7237724Z #18 0x00007fe479b86d70 in ?? ()
2021-03-30T05:26:58.7242339Z No symbol table info available.
2021-03-30T05:26:58.7245851Z #19 0x00007fe49ec91343 in ?? ()
2021-03-30T05:26:58.7248225Z No symbol table info available.
2021-03-30T05:26:58.7249241Z #20 0x00007fe479b86e00 in ?? ()
2021-03-30T05:26:58.7251192Z No symbol table info available.
2021-03-30T05:26:58.7252669Z #21 0x00007fe49ed309d8 in ?? ()
2021-03-30T05:26:58.7258734Z No symbol table info available.
2021-03-30T05:26:58.7259932Z #22 0x00007fe49ed30900 in ?? ()
2021-03-30T05:26:58.7260595Z No symbol table info available.
2021-03-30T05:26:58.7261218Z #23 0x00007fe47a672ed8 in ?? ()
2021-03-30T05:26:58.7261861Z No symbol table info available.
2021-03-30T05:26:58.7262493Z #24 0x14061d5200000001 in ?? ()
2021-03-30T05:26:58.7263128Z No symbol table info available.
2021-03-30T05:26:58.7263753Z #25 0x0000000000001333 in ?? ()
2021-03-30T05:26:58.7264389Z No symbol table info available.
2021-03-30T05:26:58.7265024Z #26 0x0000000000000000 in ?? ()
2021-03-30T05:26:58.7266045Z No symbol table info available.
2021-03-30T05:26:58.7266640Z (gdb) quit
2021-03-30T05:26:58.7476760Z ##[section]Finishing: Bash
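For anyone who wants to check the reading of the backtrace above (crash inside libcrypto, reached from libssl and libcoreclr) without scanning the log by hand, here is a minimal sketch in Python. It assumes the log keeps the gdb layout shown above, where a frame header starts with `#N` and the `from <library>` part may land on the following line. The names `parse_frames` and `first_library_frame` are made up for this example and are not part of gdb or any tool.

```python
import re

# Frame headers look like
#   "#2 0x00007fe47860e989 in CRYPTO_THREAD_write_lock ()"
#   "#0 __pthread_rwlock_wrlock_full (abstime=0x0, rwlock=0x0)"
# and may be preceded by a pipeline timestamp.
FRAME_RE = re.compile(r"#(\d+)\s+(?:0x[0-9a-f]+ in )?(\S+) \(")
# The owning library is printed as "from /path/to/lib.so", either on the
# same line as the frame header or on the line after it.
FROM_RE = re.compile(r"\bfrom (\S+)")


def parse_frames(log_lines):
    """Return a list of (frame_number, function, library_or_None) tuples."""
    frames = []
    for line in log_lines:
        header = FRAME_RE.search(line)
        if header:
            frames.append([int(header.group(1)), header.group(2), None])
        lib = FROM_RE.search(line)
        # Attach the library to the most recent frame that has none yet.
        if lib and frames and frames[-1][2] is None:
            frames[-1][2] = lib.group(1)
    return [tuple(frame) for frame in frames]


def first_library_frame(frames):
    """Return the innermost frame that gdb attributed to a shared library."""
    for frame in frames:
        if frame[2] is not None:
            return frame
    return None
```

Feeding it the frame lines from the log and calling `first_library_frame` gives the innermost frame that gdb could attribute to a shared object. That is only a starting point for triage, since frames without symbols (`??`) carry no library information.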
@dingmeng-xue could you please try another .NET Core version? Does that help?
Sure. We plan to try another version of .NET Core. Since this failure is random for us, it may take a couple of days to understand the result.
After testing with dotnet 2.1 for one week, we have not seen the issue again.
@dingmeng-xue could you raise the issue with the .NET team then?
Closing the issue for now. Please let us know if you have any concerns; it can be reopened after discussion with the .NET team.
We're experiencing this.
Run yarn eslint . --ext .js,.ts
yarn run v1.22.17
$ eslint . --ext .js,.jsx,.ts,.tsx --fix . --ext .js,.ts
Segmentation fault (core dumped)
error Command failed with exit code 139.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
Error: Process completed with exit code 139.
name: PR linter check
# Controls when the action will run. Triggers the workflow on push or pull request
# events but only for the dev branch
on:
  pull_request:
    branches:
      - main
      - master
      - dev
      - stg
      - uat
# A workflow run is made up of one or more jobs that can run sequentially or in parallel
jobs:
  # This workflow contains a single job called "lint"
  lint-check:
    # The type of runner that the job will run on
    runs-on: ubuntu-20.04
    continue-on-error: false
    # Steps represent a sequence of tasks that will be executed as part of the job
    steps:
      # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
      - uses: actions/checkout@master
      - uses: actions/cache@v2
        with:
          path: '**/node_modules'
          key: ${{ runner.os }}-node-modules-${{ hashFiles('**/yarn.lock') }}
      - name: Install node modules
        run: yarn --prefer-offline
      - name: Lint
        run: yarn eslint . --ext .js,.ts
      # skipLibCheck is temporary because it also excludes our own declaration files
      # https://github.com/microsoft/TypeScript/issues/40426
      - name: TypeScript
        run: tsc --noEmit --skipLibCheck
We use this for PR checks and this happens very often, in around 8 out of 10 runs, on either the Lint step or the TypeScript step.
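A note on the `exit code 139` in the log above: shells report a process killed by signal N as exit status 128 + N, and SIGSEGV is signal 11 on Linux, so 139 is just the shell's way of saying "segmentation fault". A small Python sketch of that decoding follows; the function name `describe_exit_code` is made up for this example.

```python
import signal


def describe_exit_code(code):
    """Describe a shell-style exit status.

    By shell convention, a status above 128 means the process was
    terminated by signal (status - 128).
    """
    if code > 128:
        number = code - 128
        try:
            return signal.Signals(number).name
        except ValueError:
            # Not a signal number known on this platform.
            return f"unknown signal {number}"
    return f"exit status {code}"
```

So `describe_exit_code(139)` names the signal, while an ordinary failure such as status 1 is reported as a plain exit status.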
+1 Also seeing this on our yarn eslint step. Occurs on ubuntu-20.04 and ubuntu-22.04.
I used mxschmitt/action-tmate@v3 to ssh into the box (after a segmentation fault had occurred) and manually ran yarn eslint. I ran the command multiple times, one after the other, without changing any code. All runs within the first minute or so failed with a segmentation fault. After that first minute, running yarn eslint succeeded and did not segfault.
I'm assuming some race condition resolves???
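To narrow down the "fails for the first minute, then works" behaviour described above, one option is to wrap the command in a retry loop that only retries on a segfault and reports how many attempts it took. This is a diagnostic sketch, not a fix, and it assumes Python is available on the runner. The function name `run_with_segv_retry` and its parameters are made up for this example.

```python
import signal
import subprocess
import sys
import time

# subprocess reports a child killed by SIGSEGV as -11; a command run through
# a shell wrapper (for example yarn) typically surfaces it as 128 + 11 = 139.
SEGV_CODES = {-signal.SIGSEGV, 128 + signal.SIGSEGV}


def run_with_segv_retry(cmd, attempts=5, delay_seconds=15.0, sleep=time.sleep):
    """Run cmd, retrying only when it dies with a segmentation fault.

    Returns (last_return_code, attempts_used). Any other exit status,
    including ordinary lint failures, is returned immediately.
    """
    code = None
    for attempt in range(1, attempts + 1):
        code = subprocess.run(cmd).returncode
        if code not in SEGV_CODES:
            return code, attempt
        if attempt < attempts:
            sleep(delay_seconds)
    return code, attempts


# Example call (not executed here):
#   code, used = run_with_segv_retry(["yarn", "eslint", ".", "--ext", ".js,.ts"])
#   print(f"exit {code} after {used} attempt(s)", file=sys.stderr)
```

If the attempt count is consistently above one only at the start of a job, that supports the timing observation, but it still says nothing about the underlying cause.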
For anyone else who comes across this error: we had something similar. In our case the action runner had been upgraded to Node 20, but the packages it was using were on a prior Node version. This caused a segmentation fault because the packages were incompatible. Once everything is aligned on the same version, it should be fixed.
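If a Node version mismatch like the one described above is suspected, a quick check is to compare the major version that installed `node_modules` with the one currently running. The sketch below assumes you have both version strings available, for example from `node --version` output saved at install time. How that value is recorded is up to you, and the function names are made up for this example.

```python
import re


def node_major(version_text):
    """Extract the major version from text such as 'v20.11.0' or '18.19.0'."""
    match = re.match(r"v?(\d+)\.", version_text.strip())
    if match is None:
        raise ValueError(f"not a Node version string: {version_text!r}")
    return int(match.group(1))


def same_major(installed_with, running):
    """True when both version strings share the same major version."""
    return node_major(installed_with) == node_major(running)
```

A `False` result from `same_major` would be a reason to reinstall dependencies rather than restore them from cache, although it does not prove the mismatch is what causes the crash.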
Description
We have recently been hitting a segmentation fault at random, regardless of whether we use the latest source code or older source code. We googled it, and most comments point to a system-level issue. We need your engagement.
https://dev.azure.com/azure-sdk/public/_build/results?buildId=805984&view=logs&j=2f953adc-c56d-55c4-a64a-eab7c4b02235&t=fc7ea605-a507-5208-bc88-3e6a658c906b
Area for Triage:
No idea
Question, Bug, or Feature?:
Question
Virtual environments affected
Image version
Image: ubuntu-18.04 Current agent version: '2.183.1'
Expected behavior
Build csharp project successfully.
Actual behavior
Failed due to segmentation fault
Repro steps
https://dev.azure.com/azure-sdk/public/_build/results?buildId=805984&view=logs&j=2f953adc-c56d-55c4-a64a-eab7c4b02235&t=fc7ea605-a507-5208-bc88-3e6a658c906b