Closed akusmin closed 4 years ago
You are targeting runtime 1.0.1 with this:
"Microsoft.NETCore.App": {
"version": "1.0.1",
"type": "platform"
}
But you have preview2-1 installed, which brings the 1.1.0 shared framework. So, either you need to update your Microsoft.NetCore.App to 1.1.0 or you need to install the 1.0.1 shared framework in the machines where you want to run your site.
@livarcocc yes. is was wrong configuration, but with Microsoft.NetCore.App 1.1.0 same story...
"dependencies": {
"Microsoft.AspNetCore.Mvc": "1.1.0",
"Microsoft.AspNetCore.Routing": "1.1.0",
"Microsoft.AspNetCore.Server.IISIntegration": "1.1.0",
"Microsoft.AspNetCore.Server.IISIntegration.Tools": "1.1.0-preview4-final",
"Microsoft.AspNetCore.Server.Kestrel": "1.1.0",
"Microsoft.Extensions.Configuration.EnvironmentVariables": "1.1.0",
"Microsoft.Extensions.Configuration.FileExtensions": "1.1.0",
"Microsoft.Extensions.Configuration.Json": "1.1.0",
"Microsoft.Extensions.Logging": "1.1.0",
"Microsoft.Extensions.Logging.Console": "1.1.0",
"Microsoft.Extensions.Logging.Debug": "1.1.0",
"Microsoft.Extensions.Options.ConfigurationExtensions": "1.1.0",
"Microsoft.NETCore.App": {
"version": "1.1.0",
"type" : "platform"
}
},
"tools": {
},
"frameworks": {
"netcoreapp1.0": {
}
},
"runtimes": {
"ubuntu.14.04-x64": {}
},
.NET Command Line Tools (1.0.0-preview2-1-003177)
Product Information:
Version: 1.0.0-preview2-1-003177
Commit SHA-1 hash: a2df9c2576
Runtime Environment:
OS Name: ubuntu
OS Version: 14.04
OS Platform: Linux
RID: ubuntu.14.04-x64
is there any way to trace what is going on?
I looked at the 1.1 templates here and found that we're using the netcoreapp1.1
framework. That's the only difference I can discern but you may find others if you compare the PJ files. I'm going to reopen this issue as a discussion until you get into a good state.
@piotrpMSFT , thanks for your feedback. I've tried netcoreapp1.1 also but no result.
"dependencies": {
"Microsoft.AspNetCore.Mvc": "1.1.0",
"Microsoft.AspNetCore.Routing": "1.1.0",
"Microsoft.AspNetCore.Server.IISIntegration": "1.1.0",
"Microsoft.AspNetCore.Server.IISIntegration.Tools": "1.1.0-preview4-final",
"Microsoft.AspNetCore.Server.Kestrel": "1.1.0",
"Microsoft.Extensions.Configuration.EnvironmentVariables": "1.1.0",
"Microsoft.Extensions.Configuration.FileExtensions": "1.1.0",
"Microsoft.Extensions.Configuration.Json": "1.1.0",
"Microsoft.Extensions.Logging": "1.1.0",
"Microsoft.Extensions.Logging.Console": "1.1.0",
"Microsoft.Extensions.Logging.Debug": "1.1.0",
"Microsoft.Extensions.Options.ConfigurationExtensions": "1.1.0",
"Microsoft.NETCore.App": {
"version": "1.1.0",
"type" : "platform"
}
},
"tools": {
},
"frameworks": {
"netcoreapp1.1": {
}
},
"runtimes": {
"ubuntu.14.04-x64": {}
},
Same error: Failed to initialize CoreCLR, HRESULT: 0x80004005
also let me note that i'm using dotnet-install-script command: dotnet-install.sh --version 1.0.0-preview2-1-003177 then export to PATH variable dotnet folder.
previous dotnet installation was removed using script from official manuals: https://github.com/dotnet/cli/blob/rel/1.0.0/scripts/obtain/uninstall/dotnet-uninstall-debian-packages.sh
Regards
@akusmin I must be missing something. You said that your environment in windows, however you are using the debian script to uninstall it?
Can you tell us what OS and architecture you are running?
@livarcocc partially.. I mentioned two environments: Ubuntu and Windows. I'm building sources on Windows and publishing to remote Ubuntu server. Also I checked on a local Ubuntu and everything is great.
I used debian script to uninstall dotnet from remote Ubuntu server.
Remote Ubuntu desc:
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.5 LTS
Release: 14.04
Codename: trusty
uname -p
x86_64
Regards
@schellap @gkhanna79 can you guys take a look? I think we've exhausted available options for debugging CoreCLR initialization issues...
@piotrpMSFT I found solution described in the issue 1488
Setting System.GC.Server to false allowed to start application. I suppose that my issue with initializing CoreCLR has the same root with described one in the issue above.
Regards
@livarcocc we should really provide a better UX here. Do we have enough state to recommend action to the user? The investigation here was certainly frustrating for @akusmin and others.
This issue also repro'ed in Windows 10 RS2 (WSL) scenarios. 1) "dotnet run" for console application - Works correctly. 2) "dotnet run" for web application - Run into the same issue.
When I try to set "System.GC.Server": false in project.json file, then this issue doesn't repro anymore.
"runtimeOptions": { "configProperties": { "System.GC.Server": false } },
I finally figured out that this is due to the different minimal memory required when run dotnet console application and web application.
For my scenario (WSL), I set 8 GB startup memory and just enabled "Use Dynamic Memory for this virtual machine" when created the VM and the "Minimum RAM"is set 512 MB by default, thus the actual total memory on VM will be only set around 2GB. When I try to set "Minimum RAM" to 3072 MB by default Or do NOT enable the Dynamic Memory (keep the startup memory at least 3 GB), then this issue doesn't repro anymore for web application.
This issue only repro'ed when run dotnet web application, is this expected behavior for web application scenario ? I suppose we should at least give some more better error messages indicate such situation so that might help customers narrow down the root cause ?
I encountered the same issue on a fresh debian 8 (x64) server that I just installed, with 8 GB memory. I used dotnet 1.0.0-rc4-004771.
dotnet new mvc
dotnet restore
dotnet run
=> Failed to initialize CoreCLR, HRESULT: 0x80004005
Setting ServerGarbageCollection
property to false
in the csproj file resolves the issue:
<Project Sdk="Microsoft.NET.Sdk.Web">
<PropertyGroup>
<TargetFramework>netcoreapp1.0</TargetFramework>
<ServerGarbageCollection>false</ServerGarbageCollection>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="Microsoft.AspNetCore" Version="1.0.3" />
<PackageReference Include="Microsoft.AspNetCore.Mvc" Version="1.0.2" />
<PackageReference Include="Microsoft.AspNetCore.StaticFiles" Version="1.0.1" />
<PackageReference Include="Microsoft.Extensions.Logging.Debug" Version="1.0.1" />
<PackageReference Include="Microsoft.VisualStudio.Web.BrowserLink" Version="1.0.1" />
</ItemGroup>
</Project>
dotnet run
=> Hosting environment: Production
Content root path: [...]
Now listening on: http://localhost:5000
Application started. Press Ctrl+C to shut down.
Same results when I use netcoreapp1.1 (with dependencies in version 1.1.0) instead of netcoreapp1.0.
Note that when I use the dotnet new console
template instead of dotnet new mvc
, the application works correctly in all situations (with ServerGarbageCollection set to either true or false).
CC @brianrob @Maoni0 to help determine how Server GC initialization would affect minimum requirements on memory.
CC @Maoni0
CC @swgillespie. Sean is getting a repro for us to debug.
@sebok How many cores does your machine have?
@swgillespie It has 4 cores.
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: GenuineIntel
CPU family: 6
Model: 42
Model name: Intel(R) Core(TM) i3-2130 CPU @ 3.40GHz
Stepping: 7
CPU MHz: 1600.257
BogoMIPS: 6784.45
Virtualization: VT-x
@sebok Thank you! And to be clear, are you also building on Windows and copying to Debian like @akusmin or are you compiling directly on the target machine?
I compile only on Debian, and I don't modify any file after the dotnet new
command.
@sebok Unfortunately I'm having trouble reproing this - do you mind post/sharing the output of strace dotnet run
?
Here it is: strace-dotnet-run.txt
@sebok Ack, sorry, I guess dotnet run
runs the target process as a child process - do you mind doing it again with dotnet <path to the built .dll>
? Sorry!
Ah, yes. Here is the result of dotnet <dll>
:
strace-dotnet-dll.txt
@swgillespie Were you able to repro? Any idea of what could be the issue? I stay available to help.
I just tried with dotnet version released today: the ASP.NET application sometimes starts successfully and sometimes I still get 0x80004005 error.
$ dotnet --info
.NET Command Line Tools (1.0.1)
Product Information:
Version: 1.0.1
Commit SHA-1 hash: 005db40cd1
Runtime Environment:
OS Name: debian
OS Version: 8
OS Platform: Linux
RID: debian.8-x64
Base Path: /opt/dotnet/sdk/1.0.1
$ dotnet new mvc
Content generation time: 332.792 ms
The template "ASP.NET Core Web App" created successfully.
$ dotnet restore
Restoring packages for /home/me/essai-1.0-mvc/essai-1.0-mvc.csproj...
Generating MSBuild file /home/me/essai-1.0-mvc/obj/essai-1.0-mvc.csproj.nuget.g.props.
Generating MSBuild file /home/me/essai-1.0-mvc/obj/essai-1.0-mvc.csproj.nuget.g.targets.
Writing lock file to disk. Path: /home/me/essai-1.0-mvc/obj/project.assets.json
Restore completed in 1.25 sec for /home/me/essai-1.0-mvc/essai-1.0-mvc.csproj.
NuGet Config files used:
/home/me/.nuget/NuGet/NuGet.Config
Feeds used:
https://api.nuget.org/v3/index.json
/home/me/.nuget/localrepo
$ dotnet run
Hosting environment: Production
Content root path: /home/me/essai-1.0-mvc
Now listening on: http://localhost:5000
Application started. Press Ctrl+C to shut down.
^CApplication is shutting down...
$ dotnet run
Failed to initialize CoreCLR, HRESULT: 0x80004005
$ dotnet run
Hosting environment: Production
Content root path: /home/me/essai-1.0-mvc
Now listening on: http://localhost:5000
Application started. Press Ctrl+C to shut down.
^CApplication is shutting down...
$ dotnet run
Failed to initialize CoreCLR, HRESULT: 0x80004005
$ dotnet run
Failed to initialize CoreCLR, HRESULT: 0x80004005
$ dotnet run
Hosting environment: Production
Content root path: /home/me/essai-1.0-mvc
Now listening on: http://localhost:5000
Application started. Press Ctrl+C to shut down.
^CApplication is shutting down...
@sebok, thank you for offering to help! we are having trouble reproing this issue so I was wondering if you would be interested in doing a bit of debugging with us :) it would be most appreciated. would it be possible that we could chat on gitter (the coreclr channel)?
@Maoni0, I would indeed be interested ;) I'll join you on the channel.
@sebok, thank you!
So @sebok, @Maoni0 and I took a look at this on Gitter and discovered that the point of failure is after the GC, during initialization of the SystemDomain
appdomain. In particular, it fails here:
Breakpoint 1, SystemDomain::Init (this=0x7101f421b980 <g_pSystemDomainMemory>) at /home/ed/coreclr/src/vm/appdomain.cpp:2553
2553 m_BaseLibrary.Append(g_pwBaseLibrary);
(gdb) n
2518 if (CLRConfig::GetConfigValue(CLRConfig::EXTERNAL_ZapDisable) != 0)
(gdb) n
2523 m_pSystemAssembly = NULL;
(gdb) n
2525 DWORD size = 0;
(gdb) n
2530 hr = GetInternalSystemDirectory(NULL, &size);
(gdb) n
2531 if (hr != HRESULT_FROM_WIN32(ERROR_INSUFFICIENT_BUFFER))
(gdb) n
2535 WCHAR *buffer = m_SystemDirectory.OpenUnicodeBuffer(size-1);
(gdb) n
2536 IfFailThrow(GetInternalSystemDirectory(buffer, &size));
(gdb) n
2537 m_SystemDirectory.CloseBuffer();
(gdb) n
2538 m_SystemDirectory.Normalize();
(gdb) n
2548 m_BaseLibrary.Append(m_SystemDirectory);
(gdb) n
2549 if (!m_BaseLibrary.EndsWith(DIRECTORY_SEPARATOR_CHAR_W))
(gdb) n
2553 m_BaseLibrary.Append(g_pwBaseLibrary);
(gdb) n
2554 m_BaseLibrary.Normalize();
(gdb) n
2556 LoadBaseSystemClasses();
(gdb) n
Failed to initialize CoreCLR, HRESULT: 0x80004005
[Thread 0x7101e9874700 (LWP 28530) exited]
[Thread 0x70fd8e1fa700 (LWP 28532) exited]
[Thread 0x7101ea0ac700 (LWP 28529) exited]
[Thread 0x7101ea8ec700 (LWP 28528) exited]
[Thread 0x7101f080b700 (LWP 28527) exited]
[Thread 0x7101ebf8f700 (LWP 28526) exited]
[Thread 0x7101f1057700 (LWP 28525) exited]
[Thread 0x7101f18d8700 (LWP 28524) exited]
[Thread 0x7101f2113700 (LWP 28523) exited]
[Thread 0x7101f59b8740 (LWP 28519) exited]
[Inferior 1 (process 28519) exited with code 0211]
Looking at @sebok's strace, one of the last things that occured before initialization failed was the mapping of System.Private.CoreLib.ni.dll
into the process:
stat("/opt/dotnet/shared/Microsoft.NETCore.App/1.0.3/System.Private.CoreLib.ni.dll", {st_mode=S_IFREG|0644, st_size=9310208, ...}) = 0
open("/opt/dotnet/shared/Microsoft.NETCore.App/1.0.3/System.Private.CoreLib.ni.dll", O_RDONLY) = 11
flock(11, LOCK_SH|LOCK_NB) = 0
fcntl(11, F_SETFD, FD_CLOEXEC) = 0
lseek(11, 0, SEEK_SET) = 0
read(11, "MZ\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 64) = 64
lseek(11, 128, SEEK_SET) = 128
read(11, "PE\0\0\35\375\4\0003R#X\0\0\0\0\0\0\0\0\360\0\" \v\2\v\0\0\0\0\0"..., 264) = 264
mmap(0x180220000, 9310208, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x749f2d103000
mmap(0x749f2d103000, 4096, PROT_READ, MAP_PRIVATE|MAP_FIXED, 11, 0) = 0x749f2d103000
mmap(0x749f2d104000, 1527808, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 11, 0x1000) = 0x749f2d104000
mmap(0x749f2d279000, 126976, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 11, 0x176000) = 0x749f2d279000
mmap(0x749f2d298000, 7475200, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 11, 0x195000) = 0x749f2d298000
mmap(0x749f2d9b9000, 176128, PROT_READ, MAP_PRIVATE|MAP_FIXED, 11, 0x8b6000) = 0x749f2d9b9000
I'm not super familiar with the process of binding mscorlib
but it seems like something is going wrong there that derails startup. We tried to not use the native image of S.P.CL, but the CLI did not allow it (sadly):
(gdb) run bin/Debug/netcoreapp1.1/essai-1.0-mvc.dll
Starting program: /usr/local/bin/dotnet bin/Debug/netcoreapp1.1/essai-1.0-mvc.dll
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Error: assembly specified in the dependencies manifest was not found -- package: 'runtime.debian.8-x64.Microsoft.NETCore.Runtime.CoreCLR', version: '1.1.1', path: 'runtimes/debian.8-x64/native/System.Private.CoreLib.ni.dll'
[Inferior 1 (process 28737) exited with code 0214]
Thank you @sebok for your help in troubleshooting this!
Looking at https://dotnet.myget.org/feed/dotnet-core/package/nuget/runtime.debian.8-x64.Microsoft.NETCore.Runtime.CoreCLR/1.1.1, the assembly is present in the intended location.
Yeah, the assembly is in its intended location (the open
syscall above returns successfully) but apparently something goes awry during the binding of types within S.P.CL.
Which types?
@sebok Do you think you could help us one more time and step through LoadBaseSystemClasses
the way were were above, so we can identify the exact point of failure?
I've determined that the exception happens in this call stack:
- SystemDomain::LoadBaseSystemClasses ==>see breackpoint 1
- AppDomain::LoadDomainAssemblyInternal
- AppDomain::LoadDomainFile ==> see breackpoint 2
- AppDomain::TryIncrementalLoad
- DomainFile::DoIncrementalLoad
- DomainAssembly::Allocate
- Assembly::Create ==> see breackpoint 3
- Assembly::Init
- Module::Create ==> see breackpoint 4
- Module::DoInit
- Module::Initialize
- Module::InitializeNativeImage
- Module::LoadHelperTable ==> see breackpoint 5
- rel32UsingJumpStub
- ExecutionManager::jumpStub
- ExecutionManager::getNextJumpStub
- EEJitManager::allocJumpStubBlock
- EEJitManager::allocCodeRaw
- EEJitManager::NewCodeHeap
- LoaderCodeHeap::CreateCodeHeap ==> see breackpoint 6
Breakpoint locations:
(gdb) b SystemDomain::LoadBaseSystemClasses
Breakpoint 1 at 0x7dfba395bc1f: file /home/ed/coreclr/src/vm/appdomain.cpp, line 2820.
(gdb) b AppDomain::LoadDomainFile
Breakpoint 2 at 0x7dfba39634ea: AppDomain::LoadDomainFile. (2 locations)
(gdb) b Assembly::Create
Breakpoint 3 at 0x7dfba3975f0d: file /home/ed/coreclr/src/vm/assembly.cpp, line 575.
(gdb) b Module::Create
Breakpoint 4 at 0x7dfba397fbfa: file /home/ed/coreclr/src/vm/ceeload.cpp, line 1463.
(gdb) b Module::LoadHelperTable
Breakpoint 5 at 0x7dfba37e7fb1: file /home/ed/coreclr/src/vm/jitinterface.cpp, line 12757.
(gdb) b LoaderCodeHeap::CreateCodeHeap
Breakpoint 6 at 0x7dfba37a96b5: file /home/ed/coreclr/src/vm/codeman.cpp, line 2070.
Execution:
(gdb) run bin/Debug/netcoreapp1.1/essai-1.0-mvc.dll
Starting program: /usr/local/bin/dotnet bin/Debug/netcoreapp1.1/essai-1.0-mvc.dll
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x6a811d64d700 (LWP 12562)]
[New Thread 0x6a811cd53700 (LWP 12563)]
[New Thread 0x6a8117f05700 (LWP 12564)]
[New Thread 0x6a8117651700 (LWP 12565)]
[New Thread 0x6a8116d97700 (LWP 12566)]
[New Thread 0x6a8115b54700 (LWP 12567)]
[New Thread 0x6a811532b700 (LWP 12568)]
[New Thread 0x6a8114b11700 (LWP 12569)]
[New Thread 0x6a7cba20b700 (LWP 12570)]
[New Thread 0x6a7cb9961700 (LWP 12571)]
Breakpoint 1, SystemDomain::LoadBaseSystemClasses (this=0x6a811f6f4980 <g_pSystemDomainMemory>) at /home/ed/coreclr/src/vm/appdomain.cpp:2820
2820 m_pSystemFile = PEAssembly::OpenSystem(NULL);
(gdb) c
Continuing.
Breakpoint 2, AppDomain::LoadDomainFile (this=0x131efa0, pLock=0x131a430, targetLevel=FILE_LOAD_POST_LOADLIBRARY) at /home/ed/coreclr/src/vm/appdomain.cpp:6976
6976 if(!CanLoadCode())
(gdb) c
Continuing.
Breakpoint 3, Assembly::Create (pDomain=0x6a811f6f3da0 <g_pSharedDomainMemory>, pFile=0x12e8a40, debuggerFlags=DACF_ALLOW_JIT_OPTS, fIsCollectible=0, pamTracker=0x77e22c689c40,
pLoaderAllocator=0x0) at /home/ed/coreclr/src/vm/assembly.cpp:575
575 NewHolder<Assembly> pAssembly (new Assembly(pDomain, pFile, debuggerFlags, fIsCollectible));
(gdb) c
Continuing.
Breakpoint 4, Module::Create (pAssembly=0x131a530, moduleRef=637534208, file=0x12e8a40, pamTracker=0x77e22c689c40) at /home/ed/coreclr/src/vm/ceeload.cpp:1463
1463 if (file->HasNativeImage())
(gdb) c
Continuing.
Breakpoint 5, Module::LoadHelperTable (this=0x6a7cb8846000) at /home/ed/coreclr/src/vm/jitinterface.cpp:12757
12757 BYTE * table = (BYTE *) GetNativeImage()->GetNativeHelperTable(&tableSize);
(gdb) c
Continuing.
Breakpoint 6, LoaderCodeHeap::CreateCodeHeap (pInfo=0x77e22c689700, pJitMetaHeap=0x6a811f6f5498 <g_pSystemDomainMemory+2840>) at /home/ed/coreclr/src/vm/codeman.cpp:2070
2070 size_t reserveSize = pInfo->getReserveSize();
(gdb) n
2076 if (reserveSize != (DWORD) reserveSize)
(gdb)
2071 size_t initialRequestSize = pInfo->getRequestSize();
(gdb)
2072 const BYTE * loAddr = pInfo->m_loAddr;
(gdb)
2073 const BYTE * hiAddr = pInfo->m_hiAddr;
(gdb)
2091 NewHolder<LoaderCodeHeap> pCodeHeap(new LoaderCodeHeap(pPrivatePCLBytes));
(gdb)
2094 DWORD dwSizeAcquiredFromInitialBlock = 0;
(gdb)
2097 pBaseAddr = (BYTE *)pInfo->m_pAllocator->GetCodeHeapInitialBlock(loAddr, hiAddr, (DWORD)initialRequestSize, &dwSizeAcquiredFromInitialBlock);
(gdb)
2098 if (pBaseAddr != NULL)
(gdb)
2104 if (loAddr != NULL || hiAddr != NULL)
(gdb)
2106 pBaseAddr = ClrVirtualAllocWithinRange(loAddr, hiAddr,
(gdb)
2109 if (!pBaseAddr)
(gdb)
2112 pBaseAddr = ExecutionManager::GetEEJitManager()->AllocateFromEmergencyJumpStubReserve(loAddr, hiAddr, &reserveSize);
(gdb)
2114 ThrowOutOfMemoryWithinRange();
(gdb)
Failed to initialize CoreCLR, HRESULT: 0x80004005
[Thread 0x6a8114b11700 (LWP 12569) exited]
[Thread 0x6a7cb9961700 (LWP 12571) exited]
[Thread 0x6a811532b700 (LWP 12568) exited]
[Thread 0x6a8115b54700 (LWP 12567) exited]
[Thread 0x6a8116d97700 (LWP 12566) exited]
[Thread 0x6a8117651700 (LWP 12565) exited]
[Thread 0x6a8117f05700 (LWP 12564) exited]
[Thread 0x6a811cd53700 (LWP 12563) exited]
[Thread 0x6a811d64d700 (LWP 12562) exited]
[Thread 0x6a8120e91740 (LWP 12561) exited]
[Inferior 1 (process 12561) exited with code 0211]
And here is an updated strace.txt now that I use CoreCLR 1.1.1. We can see that just before the exception, there is a succession of mmap and munmap calls for a total of about 2.4 GB of memory, after open("/opt/dotnet/shared/Microsoft.NETCore.App/1.1.1/System.Private.CoreLib.ni.dll", O_RDONLY)
.
When I set ServerGarbageCollection
to false
, everything works fine, the application starts. We can see in strace-ServerGC-false.txt that in this case, this succession of mmap and munmap calls after open("/opt/dotnet/shared/Microsoft.NETCore.App/1.1.1/System.Private.CoreLib.ni.dll", O_RDONLY)
does not happen.
Finally, here is strace-successful.txt, for the situation when ServerGarbageCollection
is NOT set to false
but the application starts successfully. This succession of mmap and munmap calls does not happen.
This situation happens sometimes (I would say ~20% of the launchs), but most of the time (~80%) there is the Failed to initialize CoreCLR, HRESULT: 0x80004005
error. Note: I do not recompile the application between the launchs.
@sebok Thank you so much, this is extremely helpful!
@gkhanna79
So, perhaps the huge amount of maps are coming from here: https://github.com/dotnet/coreclr/blob/master/src/utilcode/util.cpp#L644 , when we are scanning for a free chunk of VM to allocate these emergency jump stubs? I'm not super familiar with the PAL or JIT to know if VirtualQuery
results in calls to mmap, but if we are failing to find a free space of VM I suppose it could explain this behavior.
I can imagine why Server GC would trigger this behavior - we commit an ephemeral segment and LOH segment for every heap that we create. I can't imagine what has to transpire to run out of VM on a 64-bit process, though...
@janvorli could you please take a look?
Based on the strace I am almost sure it is due to GRSEC/PAX enabled - see dotnet/cli#8480. @sebok do you know if you have GRSEC/PAX enabled on your Linux? As a workaround, disabling for dotnet using paxctl -r dotnet
where dotnet should be specified by its full path should work.
The issue is that the address passed to mmap is ignored with GRSEC/PAX enabled and that prevents us to find a block of free virtual memory within a specific range.
CC @kouvel
@janvorli Indeed, I assume GRSEC/PAX is enabled:
bzImage-3.14.32-xxxx-grs-ipv6-64
(it's actually the default kernel image for all servers hosted by OVH)dmesg
zgrep CONFIG_SECURITY /proc/config.gz
returns CONFIG_SECURITYFS=y
I'll take a look at the workaround mentionned in https://github.com/dotnet/coreclr/issues/8480 and keep you updated.
So, paxctl -r /opt/dotnet/dotnet
returns file /opt/dotnet/dotnet does not have a PT_PAX_FLAGS program header, try conversion
. Same result on libcoreclr.so
.
And I don't have any file named corerun
or coreconsole
.
Am I missing something?
@sebok I've not tried to use the paxctl myself before. Looking at its options, there is an -C
option that should do the conversion the -r option complains about. Can you please give it a try? I am not sure if you can pass both options at once or if you need to convert first and then run the -r.
By the way, I confirm that this kernel is configured with PAX and RANDMMAP enabled:
# zgrep -i pax /proc/config.gz
CONFIG_PAX_USERCOPY_SLABS=y
# PaX
CONFIG_PAX=y
# PaX Control
CONFIG_PAX_SOFTMODE=y
CONFIG_PAX_EI_PAX=y
CONFIG_PAX_PT_PAX_FLAGS=y
CONFIG_PAX_XATTR_PAX_FLAGS=y
# CONFIG_PAX_NO_ACL_FLAGS is not set
CONFIG_PAX_HAVE_ACL_FLAGS=y
# CONFIG_PAX_HOOK_ACL_FLAGS is not set
# CONFIG_PAX_NOEXEC is not set
# CONFIG_PAX_KERNEXEC is not set
CONFIG_PAX_KERNEXEC_PLUGIN_METHOD=""
CONFIG_PAX_ASLR=y
# CONFIG_PAX_RANDKSTACK is not set
CONFIG_PAX_RANDUSTACK=y
CONFIG_PAX_RANDMMAP=y
# CONFIG_PAX_MEMORY_SANITIZE is not set
# CONFIG_PAX_MEMORY_STACKLEAK is not set
CONFIG_PAX_MEMORY_STRUCTLEAK=y
# CONFIG_PAX_MEMORY_UDEREF is not set
# CONFIG_PAX_REFCOUNT is not set
# CONFIG_PAX_USERCOPY is not set
CONFIG_PAX_SIZE_OVERFLOW=y
# CONFIG_PAX_LATENT_ENTROPY is not set
paxctl -c -r /opt/dotnet/dotnet
worked and... SOLVED THE ISSUE! :+1:
It would be great to make it unecessary as you suggest in https://github.com/dotnet/coreclr/issues/8480, or at least / in the meanwhile, provide a more detailed error message to guide users to this workaround.
Alternative: check if PAX/RANDMMAP seems enabled at first launch of dotnet CLI tools (only once, when it populates nuget cache), and warn the user if necessary.
@sebok, thank you for the confirmation!
@janvorli Thank you! 😄
@janvorli, out of curiosity, how does this relate to server garbage collection? The issue never happens when server GC is disabled. And why, when server GC is enabled, does it sometimes happen and other times not?
@sebok The reason why it sometimes works and sometimes it doesn't with the server GC enabled is that it depends on where System.Private.CoreLib.ni.dll and libcoreclr.so are located in the memory. If they end up being placed close enough so that the code in System.Private.CoreLib.ni.dll can make a short call when calling the native runtime, then we don't need to allocate jump stubs within 2GB distance from the jitted code and everything is fine. So it is just a matter of pure luck where the mmap ends up placing the requested memory range.
If you look at the failure case log, the System.Private.CoreLib.ni.dll is mapped at 0x73dffb6e4000, while the libcoreclr.so is mapped at 0x73e46d99b000, which means they are about 17GB from each other. So in the middle of loading the System.Private.CoreLib.ni.dll, you can see that we've started to scan 2GB of memory right below the System.Private.CoreLib.ni.dll from 0x73df7b870000 to 0x73e00a2b0000 in attempt to find a free slot for the jump stubs. But due to the RANDMMAP, mmap behaves as if the memory was occupied (returning a different address than the one requested) and so we don't find a slot and that's our stop station.
If you look at the success log with server gc still on, you can see that System.Private.CoreLib.ni.dll is mapped at 0x72d3a8b93000 and libcoreclr.so is mapped at 0x72d3ae563000, which is just 96MB from each other and so no jump stubs are needed.
And finally, why it "always" works with server GC disabled? Well, I think it could also sometimes fail, but rarely. The reason is that when server GC is disabled, GC reserves much smaller amount of virtual memory and so it is much more likely that the System.Private.CoreLib.ni.dll and libcoreclr.so will be close enough to each other (even if the virtual memory range reserved for GC is in between them).
@janvorli, thank you it's very clear! For others who might ask why are these jump stubs needed, I found this article quite useful:
Steps to reproduce
making default WebAPI project in VS publish to remote Ubuntu server dotnet WebApplication3.dll
Expected behavior
Now listening on: http://*:11901 Application started. Press Ctrl+C to shut down.
Actual behavior
Failed to initialize CoreCLR, HRESULT: 0x80004005
Environment data
dotnet --info
output: Windows: .NET Command Line Tools (1.0.0-preview2-1-003177)Product Information: Version: 1.0.0-preview2-1-003177 Commit SHA-1 hash: a2df9c2576
Runtime Environment: OS Name: Windows OS Version: 10.0.14393 OS Platform: Windows RID: win10-x64
Linux: .NET Command Line Tools (1.0.0-preview2-1-003177)
Product Information: Version: 1.0.0-preview2-1-003177 Commit SHA-1 hash: a2df9c2576
Runtime Environment: OS Name: ubuntu OS Version: 14.04 OS Platform: Linux RID: ubuntu.14.04-x64
Looks there is some environment issue. I succeeded to start web app on local Ubuntu but failed to start on remote. I cannot find any description of an issue. Please help to investigate.