Did you turn "Always on" on?
@vincywindy Sure, "Always on" is enabled, but it shouldn't make a difference after some warm-up time, after which the slow-loading parts of the application should respond more quickly.
Similar issue: the IdentityServer4 project sometimes hangs when calling the authorize/token/userinfo
endpoints, and it occurs at random places during database calls.
There is one file, AbpProfileService: when putting a breakpoint at GetProfileDataAsync()
or IsActiveAsync()
to slow down the execution, IdentityServer4 can respond correctly. These 2 functions are marked as [UnitOfWork]
(roughly sketched below), maybe this is the reason?
https://github.com/abpframework/abp/blob/654eeede81a733382066bfd00ab19b6b65ceec0e/modules/identityserver/src/Volo.Abp.IdentityServer.Domain/Volo/Abp/IdentityServer/AspNetIdentity/AbpProfileService.cs#L31
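The relevant part of that file looks roughly like this (a simplified paraphrase of the linked source; see the link above for the exact code):

using System.Threading.Tasks;
using IdentityServer4.AspNetIdentity;
using IdentityServer4.Models;
using Microsoft.AspNetCore.Identity;
using Volo.Abp.Identity;
using Volo.Abp.Uow;

public class AbpProfileService : ProfileService<IdentityUser>
{
    public AbpProfileService(
        IdentityUserManager userManager,
        IUserClaimsPrincipalFactory<IdentityUser> claimsFactory)
        : base(userManager, claimsFactory)
    {
    }

    // Both overrides are async and decorated with [UnitOfWork],
    // so every call goes through ABP's interception pipeline.
    [UnitOfWork]
    public override async Task GetProfileDataAsync(ProfileDataRequestContext context)
    {
        await base.GetProfileDataAsync(context);
    }

    [UnitOfWork]
    public override async Task IsActiveAsync(IsActiveContext context)
    {
        await base.IsActiveAsync(context);
    }
}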
Another issue reporting a timeout: https://github.com/abpframework/abp/issues/2121. Maybe EF Core (or the UnitOfWork) has some issue on the Linux platform?
I don't suppose this is related to the ABP Framework itself. However, we will investigate the problem deeply. Thank you for taking the time to write up so much useful info.
@itryan At first I also thought it might be the database connection or the setup, but if you try this endpoint, the request never failed during my tests: https://abp-test-app.azurewebsites.net/api/test (for extra load there's a loop in this method calling the database 100 times) -> no hangs, no timeouts.
Maybe it's a special case with the combination of Linux + EF Core + Identity Server in ABP that is causing this behavior. I have also tried following the logs inside the Identity Server code (when you run the app in debug) and it gets stuck at random places, so I haven't found a pattern yet.
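The code behind /api/test isn't shown in this thread; conceptually it's just an unauthenticated action that hits the database in a loop, something like this hypothetical sketch:

using System;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;
using Volo.Abp.AspNetCore.Mvc;
using Volo.Abp.Domain.Repositories;
using Volo.Abp.Identity;

[Route("api/test")]
public class TestController : AbpController
{
    private readonly IRepository<IdentityUser, Guid> _userRepository;

    public TestController(IRepository<IdentityUser, Guid> userRepository)
    {
        _userRepository = userRepository;
    }

    // Unauthenticated endpoint that produces extra load by querying the database 100 times.
    [HttpGet]
    public async Task<long> GetAsync()
    {
        long total = 0;
        for (var i = 0; i < 100; i++)
        {
            total += await _userRepository.GetCountAsync();
        }

        return total;
    }
}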
Thanks @hikalkan for looking into this. Let me know if I can help by providing more info on this case.
Hi, we're running into similar issues regarding the auth-server host. Our solution is based on the microservice-demo sample. We want to run all our services on Docker with Linux containers, and everything works, but the auth-server host hangs at random. The issues correlate with the amount of RAM/CPU assigned to Docker.
But given that our sample app consists of 2 small microservices + gateway + auth-server, hangs with 8 GB RAM and 4 CPU cores are not normal at all.
When we extract the auth-server to run locally on Windows, not on Docker, the hangs are much less frequent.
We tried dockerizing the plain IdentityServer4 sample host and authenticating against it (without ABP's DLLs), and there was no lagginess at all, even with 3 GB RAM / 2 CPU cores assigned to Docker. But we don't want such a solution, as we like the work ABP has done for all the stores etc.
Going through the comments, the pattern arises that when it is hosted on Linux (or a Linux-based container) there is some issue.
Also, we could not narrow the issue down to any specific endpoint. Sometimes it hangs on the calls for token or user-info, but usually on introspection validation at random endpoints in other services.
I am also hosting on Linux in combination with Docker and also get random hangs / never-responding APIs. The machine has 8 cores and 32 GB RAM with no limitations for Docker. However, I do not use IdentityServer, so I don't know if it's the same issue here. Just wanted to share this; it might also be an issue on my end.
I get the exact same thing on a Windows dev environment. Endpoints related to Identity Server are super slow or don't respond at all.
I have the same issue, using Docker + Linux.
I think this problem comes from the Npgsql driver, which is fixed in version 4.1.2: https://github.com/npgsql/npgsql/releases/tag/v4.1.2
@ckansiz I've added a patch to the test repo bumping up the version of Npgsql. It does not make a difference, so I'm not sure if this is related. Still the same problem as described in the first post here.
https://gitlab.com/abp-test/testapp/commit/27b00a10ab07c641998c0e0d5adbb0869253456f
Verified that's indeed the latest version
I think this problem from Npgsql Driver which is fixed version 4.1.2 https://github.com/npgsql/npgsql/releases/tag/v4.1.2
I'm using SqlServer, not PostgreSQL, so it's probably not related at all.
I am also experiencing this locally on Manjaro with ABP 1.0.2 and SqlServer. Anything that hits the UserManager methods fails; the rest of the app is fine. It's also fine on ABP 0.19.0.
SqlServer: mcr.microsoft.com/mssql/server:2017-latest-ubuntu
ABP v0.20 was the first version that supported ASP.NET Core 3.0. I am trying to investigate it, but it is hard to reproduce and understand the case.
@hikalkan just try running your microservice sample on Docker for Windows using Linux containers. In our office, every single person had trouble with the auth server on this setup.
@hikalkan I have reproduced this with the following setups (all with the same blank project generated from https://abp.io/get-started): VirtualBox and Azure.
It does correlate with the CPU count, but I have no idea why. The more CPUs/cores are available, the less often the behavior occurs, but it is not completely solved by more resources, as the random timeouts indicate.
Connecting to and hosting the database either locally or remotely seems to make no difference.
The following setups work great, and the VM is surprisingly fast for the single CPU core it has: VirtualBox, my dev machine, and another dev machine.
Maybe this is a threadpool exhaustion problem? The AsyncHelper class is very dangerous in this context.
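To illustrate the concern, here is a generic sketch of the sync-over-async pattern (not ABP's actual AsyncHelper implementation): every caller that blocks like this pins a thread-pool thread while the awaited work itself also needs thread-pool threads to complete, so with few cores the pool can starve and requests appear to hang.

using System;
using System.Threading.Tasks;

public static class BlockingSample
{
    // Blocks the calling (often thread-pool) thread until the async work finishes.
    public static TResult RunSync<TResult>(Func<Task<TResult>> func)
    {
        return Task.Run(func).GetAwaiter().GetResult();
    }
}

// Example of how such a helper ends up being used from synchronous code,
// e.g. inside an interceptor that cannot await:
//   var roles = BlockingSample.RunSync(() => userManager.GetRolesAsync(user));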
I've got this working on Azure: https://abp-test-2.azurewebsites.net
It currently runs on the App Service for Windows "Free" plan and works fine, using the same database connection as the Azure App Service Linux version.
Logging in with admin/1q2w3E* also works. So it's definitely related to Linux.
Please be sure to set up token caching in the microservice settings. This is how we mitigated the issue: it still occurs, but far less often, since the number of token introspection calls is reduced. Still, I consider this a cover-up, not a solution.
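For reference, assuming the services validate tokens with the IdentityServer4.AccessTokenValidation package (as the microservice template did at the time), the caching knobs look roughly like this; treat it as a sketch rather than the exact template wiring, and note that the configuration keys are placeholders:

// Inside ConfigureServices of the gateway / service host module (sketch).
// EnableCaching also requires an IDistributedCache registration,
// e.g. context.Services.AddDistributedMemoryCache().
context.Services.AddAuthentication("Bearer")
    .AddIdentityServerAuthentication(options =>
    {
        options.Authority = configuration["AuthServer:Authority"];
        options.ApiName = configuration["AuthServer:ApiName"];
        options.ApiSecret = configuration["AuthServer:ApiSecret"];
        options.EnableCaching = true;                      // cache token validation results
        options.CacheDuration = TimeSpan.FromMinutes(10);  // fewer introspection round-trips
    });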
Maybe it is related to the AsyncHelper class, as @Trojaner suggested. AsyncHelper.RunSync is used in some places where Identity plays a role.
And the related issue in the Nito.AsyncEx repo: https://github.com/StephenCleary/AsyncEx/issues/107
The application is hanging for me too. Our setup is:
- an OVH s1-2 instance
- MySQL (package provided by OVH), utf8mb4_unicode_ci - this is on another duplicate s1-2 VM
- the application on http://localhost:5000 behind NGINX, running as www-data
After the application starts up I can make requests to unauthenticated endpoints like /.well-known/openid-configuration
and they respond immediately, but any request to /api/abp/application-configuration
stops the application from responding. Even the openid endpoint then stops responding, but I can still see the requests being acknowledged in the logs.
I have made curl requests behind the proxy and I get the same results so that's not the issue.
Application logs don't have any output after the first acknowledgement of the request.
In our case, we discovered that the requests hang when the database is on another VM/server, but requests are fine when using a database on localhost. We tried using host names instead of IPs in the database connection string, which didn't change anything. We also deployed the basic ABP image (changing SqlServer to MySql) and saw exactly the same behaviour. Finally, we recreated the whole configuration on another set of servers with another provider and got the same results.
MyProjectHttpApiHostModule contains the following (DefaultCorsPolicyName, origins and knownProxies are defined elsewhere in the module):
public override void ConfigureServices(ServiceConfigurationContext context)
{
    context.Services.AddCors(options =>
    {
        options.AddPolicy(DefaultCorsPolicyName, builder =>
        {
            builder
                .WithOrigins(origins)
                .SetIsOriginAllowedToAllowWildcardSubdomains()
                .AllowAnyHeader()
                .AllowAnyMethod()
                .AllowCredentials()
                .WithAbpExposedHeaders()
                .WithExposedHeaders("Content-Disposition");
        });
    });

    context.Services.Configure<ForwardedHeadersOptions>(options =>
    {
        options.ForwardedHeaders =
            ForwardedHeaders.XForwardedFor | ForwardedHeaders.XForwardedProto;
        knownProxies.ForEach(x => { options.KnownProxies.Add(x); });
    });
}

public override void OnApplicationInitialization(ApplicationInitializationContext context)
{
    var app = context.GetApplicationBuilder();
    app.UseForwardedHeaders();
}
NGINX proxy settings are as follows:
proxy_set_header Host $host;
proxy_set_header Referer $http_referer;
proxy_set_header X-Real-IP $remote_addr;
real_ip_recursive on;
real_ip_header X-Forwarded-For;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Host $host;
proxy_set_header X-Forwarded-Proto $http_x_forwarded_proto;
proxy_set_header X-Forwarded-Server $host;
proxy_pass_header Server;
proxy_pass_request_headers on;
I have done more testing. My tests are not controlled, so they only give an indication of what changes with different configurations, but they reveal some interesting results.
My tests were on a Debian 10.2 x64 Hyper-V VM with a 16 GB vhdx and dotnet-runtime-3.1.
I made 100 requests over ~30 seconds to the /.well-known/openid-configuration
endpoint in the Chrome web browser and counted how many pending/cancelled requests were recorded during that time.
I used ABP Framework v1.0.2 (Angular UI version) with MySQL, published as framework-dependent (portable), and pointed it at another Hyper-V VM with MariaDB running on it, using the IP address of the MariaDB box and the root user in the connection string. I also tried running 2 tests with MariaDB on the same VM as the application, which reduced the errors.
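For anyone who wants to repeat the measurement, a hypothetical harness like the following (not part of the original tests) captures the idea: roughly 100 requests over ~30 seconds, counting the ones that time out.

using System;
using System.Net.Http;
using System.Threading.Tasks;

class DiscoveryEndpointRepro
{
    static async Task Main()
    {
        // Adjust to the host under test.
        var url = "http://localhost:5000/.well-known/openid-configuration";
        using var client = new HttpClient { Timeout = TimeSpan.FromSeconds(10) };

        var timedOut = 0;
        for (var i = 0; i < 100; i++)
        {
            try
            {
                var response = await client.GetAsync(url);
                Console.WriteLine($"{i:D3}: {(int)response.StatusCode}");
            }
            catch (TaskCanceledException) // thrown by HttpClient when the request times out
            {
                timedOut++;
                Console.WriteLine($"{i:D3}: timed out");
            }

            await Task.Delay(300); // spreads ~100 requests over ~30 seconds
        }

        Console.WriteLine($"Timed out: {timedOut} of 100");
    }
}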
I've been working on this problem for a few days and it finally seems related to EF Core or SqlClient.
Still trying to understand it.
Status update:
I am able to reproduce a similar problem in local Docker (with Linux containers) when I decrease the RAM and CPU resources.
First of all, the problem is completely unrelated to AsyncHelper.RunSync
calls. To prove it, I added a log inside it to see if it is called. There were a few cases where it was called and I eliminated them too; now it is not being called at all. So, no async-over-sync problem there.
18:31:30 [DBG] [1579] [w:,c:,t:4] [ea3ca8e38dac4d128648e94021ce87eb] Creating DbCommand for 'ExecuteReader'.
18:31:30 [DBG] [1580] [w:,c:,t:4] [ea3ca8e38dac4d128648e94021ce87eb] Created DbCommand for 'ExecuteReader' (1ms).
18:31:30 [DBG] [1581] [w:,c:,t:4] [ea3ca8e38dac4d128648e94021ce87eb] Executing DbCommand [Parameters=[@__ef_filter__p_0='?' (DbType = Boolean), @__normalizedRoleName_0='?' (Size = 256), @__ef_filter__p_2='?' (DbType = Boolean)], CommandType='"Text"', CommandTimeout='30']
SELECT [t].[Id], [t].[ConcurrencyStamp], [t].[ExtraProperties], [t].[IsDefault], [t].[IsPublic], [t].[IsStatic], [t].[Name], [t].[NormalizedName], [t].[TenantId], [t0].[Id], [t0].[ClaimType], [t0].[ClaimValue], [t0].[RoleId], [t0].[TenantId]
FROM (
SELECT TOP(1) [a].[Id], [a].[ConcurrencyStamp], [a].[ExtraProperties], [a].[IsDefault], [a].[IsPublic], [a].[IsStatic], [a].[Name], [a].[NormalizedName], [a].[TenantId]
FROM [AbpRoles] AS [a]
WHERE ((@__ef_filter__p_0 = CAST(1 AS bit)) OR [a].[TenantId] IS NULL) AND ([a].[NormalizedName] = @__normalizedRoleName_0)
) AS [t]
LEFT JOIN (
SELECT [a0].[Id], [a0].[ClaimType], [a0].[ClaimValue], [a0].[RoleId], [a0].[TenantId]
FROM [AbpRoleClaims] AS [a0]
WHERE (@__ef_filter__p_2 = CAST(1 AS bit)) OR [a0].[TenantId] IS NULL
) AS [t0] ON [t].[Id] = [t0].[RoleId]
ORDER BY [t].[Id], [t0].[Id]
18:31:30 [INF] [1582] [w:,c:,t:4] [ea3ca8e38dac4d128648e94021ce87eb] Executed DbCommand (6ms) [Parameters=[@__ef_filter__p_0='?' (DbType = Boolean), @__normalizedRoleName_0='?' (Size = 256), @__ef_filter__p_2='?' (DbType = Boolean)], CommandType='"Text"', CommandTimeout='30']
SELECT [t].[Id], [t].[ConcurrencyStamp], [t].[ExtraProperties], [t].[IsDefault], [t].[IsPublic], [t].[IsStatic], [t].[Name], [t].[NormalizedName], [t].[TenantId], [t0].[Id], [t0].[ClaimType], [t0].[ClaimValue], [t0].[RoleId], [t0].[TenantId]
FROM (
SELECT TOP(1) [a].[Id], [a].[ConcurrencyStamp], [a].[ExtraProperties], [a].[IsDefault], [a].[IsPublic], [a].[IsStatic], [a].[Name], [a].[NormalizedName], [a].[TenantId]
FROM [AbpRoles] AS [a]
WHERE ((@__ef_filter__p_0 = CAST(1 AS bit)) OR [a].[TenantId] IS NULL) AND ([a].[NormalizedName] = @__normalizedRoleName_0)
) AS [t]
LEFT JOIN (
SELECT [a0].[Id], [a0].[ClaimType], [a0].[ClaimValue], [a0].[RoleId], [a0].[TenantId]
FROM [AbpRoleClaims] AS [a0]
WHERE (@__ef_filter__p_2 = CAST(1 AS bit)) OR [a0].[TenantId] IS NULL
) AS [t0] ON [t].[Id] = [t0].[RoleId]
ORDER BY [t].[Id], [t0].[Id]
18:31:30 [DBG] [1583] [w:,c:,t:4] [ea3ca8e38dac4d128648e94021ce87eb] Context 'IdentityDbContext' started tracking 'IdentityRole' entity. Consider using 'DbContextOptionsBuilder.EnableSensitiveDataLogging' to see key values.
18:31:30 [DBG] [1584] [w:,c:,t:4] [ea3ca8e38dac4d128648e94021ce87eb] A data reader was disposed.
When I try to login, it hangs and the last log lines are shown above. I opened Verbose (most detailed) logging for EF and ASP.NET Core.
I cannot reproduce it on Windows or in Visual Studio.
Still working on it...
One thing I noticed with my tests was that the dotnet process spawned more threads with more CPU cores. Reducing memory didn't cause the requests to hang more, but reducing CPU cores did. Could it be related to thread pool starvation or garbage collection issues? It's also worth mentioning that garbage collection works differently with a single CPU core and when running on a server versus a workstation: https://devblogs.microsoft.com/premier-developer/understanding-different-gc-modes-with-concurrency-visualizer/
Maybe this helps you? Thanks for your investigation, it is greatly appreciated!
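If it helps, a small diagnostic like this (hypothetical, not from the thread) can be dropped into the app to log the values that matter for that theory: processor count, GC mode, and thread-pool headroom.

using System;
using System.Runtime;
using System.Threading;

public static class RuntimeDiagnostics
{
    public static void Dump()
    {
        ThreadPool.GetMinThreads(out var minWorkers, out var minIo);
        ThreadPool.GetAvailableThreads(out var freeWorkers, out var freeIo);

        Console.WriteLine($"Processor count : {Environment.ProcessorCount}");
        Console.WriteLine($"Server GC       : {GCSettings.IsServerGC}");
        Console.WriteLine($"Min threads     : worker={minWorkers}, io={minIo}");
        Console.WriteLine($"Available       : worker={freeWorkers}, io={freeIo}");
    }
}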
Probably found the problem. It seems related to the interception logic for async methods. I am working on it; however, the solution may not be easy. I am also considering removing the AsyncHelper usages completely.
Otherwise, it eventually causes an async-over-sync problem when you have a sync method that is intercepted and the interceptor needs to call an async method.
If we go that way, the next version may be 2.0, since it is a huge breaking change :)
After the changes above and using https://github.com/JSkimming/Castle.Core.AsyncInterceptor, all the problems have been solved for the https://abp.io/ platform and the startup template. Tested on Kubernetes and local Docker.
So, the next version will be 2.0 because of semantic versioning.
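For context, JSkimming's library lets an interceptor handle awaited invocations on a genuinely asynchronous path instead of blocking on the returned Task. Here is a rough, generic sketch of an IAsyncInterceptor (this is not ABP's actual UnitOfWork interceptor):

using System;
using System.Threading.Tasks;
using Castle.DynamicProxy; // IAsyncInterceptor lives in this namespace in the package

public class LoggingAsyncInterceptor : IAsyncInterceptor
{
    public void InterceptSynchronous(IInvocation invocation)
    {
        Console.WriteLine($"Sync call: {invocation.Method.Name}");
        invocation.Proceed();
    }

    public void InterceptAsynchronous(IInvocation invocation)
    {
        invocation.ReturnValue = InterceptAsync(invocation);
    }

    public void InterceptAsynchronous<TResult>(IInvocation invocation)
    {
        invocation.ReturnValue = InterceptAsync<TResult>(invocation);
    }

    private async Task InterceptAsync(IInvocation invocation)
    {
        Console.WriteLine($"Async call: {invocation.Method.Name}");
        invocation.Proceed();
        var task = (Task)invocation.ReturnValue; // capture the target's task before it is replaced
        await task.ConfigureAwait(false);        // await instead of blocking a thread-pool thread
    }

    private async Task<TResult> InterceptAsync<TResult>(IInvocation invocation)
    {
        Console.WriteLine($"Async call: {invocation.Method.Name}");
        invocation.Proceed();
        var task = (Task<TResult>)invocation.ReturnValue;
        return await task.ConfigureAwait(false);
    }
}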
@hikalkan does this solution fix #2075?
If I understood it correctly, 1.2 will become 2.0?
OMG, it is a huge breaking change, so is there a roadmap for 2.0?
is there a roadmap for 2.0?
This milestone (https://github.com/abpframework/abp/milestone/26) becomes v2.0 :) So, it will be an early 2.0 because of this important breaking change (while it will not require a change in most applications).
Thanks for publishing v2.0 of the ABP Framework. After testing the release I can confirm that this problem seems to be resolved. No timeouts or hangs in the Linux test VMs or on an Azure App Service Linux instance. Now we can switch back to Linux on Azure. 🚀
@stay2be thank you for your feedback. I am VERY happy to see that. This issue has eaten 1 week of development time to solve 😃
Thank you very much, I can also confirm that all performance issues are gone now. Maybe this took you 1 week but you likely saved months of work for others.
Hi folks,
first let me thank you for this great project. I've learned a lot while working with this framework and the design principles behind it.
I've run into the following problem when deploying a project based on abp to an Azure App Service Linux instance. I'm not sure if this is related to abp or maybe to Identity Server 4.
About the environment:
Findings:
grant_type:password&client_id:TestApp_App&client_secret:1q2w3e&username:admin&password:1q2w3E&scope:TestApp
// First request
2019-10-28 09:52:14.338 +00:00 [INF] Request starting HTTP/1.1 POST http://abp-test-app.azurewebsites.net/connect/token application/x-www-form-urlencoded 113
2019-10-28 09:52:14.663 +00:00 [INF] Invoking IdentityServer endpoint: IdentityServer4.Endpoints.TokenEndpoint for /connect/token
2019-10-28 09:52:17.079 +00:00 [INF] {"ClientId":"TestApp_App","AuthenticationMethod":"SharedSecret","Category":"Authentication","Name":"Client Authentication Success","EventType":"Success","Id":1010,"Message":null,"ActivityId":"0HLQRHH7GJOOJ:00000001","TimeStamp":"2019-10-28T09:52:17.0000000Z","ProcessId":27,"LocalIpAddress":"::ffff:172.16.0.2:8080","RemoteIpAddress":"90.187.63.221","$type":"ClientAuthenticationSuccessEvent"}
2019-10-28 09:52:18.652 +00:00 [INF] Credentials validated for username: admin
2019-10-28 09:52:18.654 +00:00 [INF] {"Username":"admin","Provider":null,"ProviderUserId":null,"SubjectId":"c47b2d11-5bb2-321f-bb51-39f123c3b05e","DisplayName":"admin","Endpoint":"Token","ClientId":null,"Category":"Authentication","Name":"User Login Success","EventType":"Success","Id":1000,"Message":null,"ActivityId":"0HLQRHH7GJOOJ:00000001","TimeStamp":"2019-10-28T09:52:18.0000000Z","ProcessId":27,"LocalIpAddress":"::ffff:172.16.0.2:8080","RemoteIpAddress":"90.187.63.221","$type":"UserLoginSuccessEvent"}
2019-10-28 09:52:18.948 +00:00 [INF] {"Username":"admin","Provider":null,"ProviderUserId":null,"SubjectId":"c47b2d11-5bb2-321f-bb51-39f123c3b05e","DisplayName":null,"Endpoint":"Token","ClientId":"TestApp_App","Category":"Authentication","Name":"User Login Success","EventType":"Success","Id":1000,"Message":null,"ActivityId":"0HLQRHH7GJOOJ:00000001","TimeStamp":"2019-10-28T09:52:18.0000000Z","ProcessId":27,"LocalIpAddress":"::ffff:172.16.0.2:8080","RemoteIpAddress":"90.187.63.221","$type":"UserLoginSuccessEvent"}
2019-10-28 09:52:18.964 +00:00 [INF] Token request validation success, {"ClientId":"TestApp_App","ClientName":"TestApp_App","GrantType":"password","Scopes":"TestApp","AuthorizationCode":null,"RefreshToken":null,"UserName":"admin","AuthenticationContextReferenceClasses":null,"Tenant":null,"IdP":null,"Raw":{"grant_type":"password","client_id":"TestApp_App","client_secret":"REDACTED","username":"admin","password":"REDACTED","scope":"TestApp"},"$type":"TokenRequestValidationLog"}
2019-10-28 09:52:19.547 +00:00 [INF] {"ClientId":"TestApp_App","ClientName":"TestApp_App","RedirectUri":null,"Endpoint":"Token","SubjectId":"c47b2d11-5bb2-321f-bb51-39f123c3b05e","Scopes":"TestApp","GrantType":"password","Tokens":[{"TokenType":"access_token","TokenValue":"****ZHeg","$type":"Token"}],"Category":"Token","Name":"Token Issued Success","EventType":"Success","Id":2000,"Message":null,"ActivityId":"0HLQRHH7GJOOJ:00000001","TimeStamp":"2019-10-28T09:52:19.0000000Z","ProcessId":27,"LocalIpAddress":"::ffff:172.16.0.2:8080","RemoteIpAddress":"90.187.63.221","$type":"TokenIssuedSuccessEvent"}
2019-10-28 09:52:19.606 +00:00 [INF] Request finished in 5277.2916000000005ms 200 application/json; charset=UTF-8
// Second request
2019-10-28 09:52:26.234 +00:00 [INF] Request starting HTTP/1.1 POST http://abp-test-app.azurewebsites.net/connect/token application/x-www-form-urlencoded 113
2019-10-28 09:52:26.268 +00:00 [INF] Invoking IdentityServer endpoint: IdentityServer4.Endpoints.TokenEndpoint for /connect/token
2019-10-28 09:52:26.415 +00:00 [INF] {"ClientId":"TestApp_App","AuthenticationMethod":"SharedSecret","Category":"Authentication","Name":"Client Authentication Success","EventType":"Success","Id":1010,"Message":null,"ActivityId":"0HLQRHH7GJOOK:00000001","TimeStamp":"2019-10-28T09:52:26.0000000Z","ProcessId":27,"LocalIpAddress":"::ffff:172.16.0.2:8080","RemoteIpAddress":"90.187.63.221","$type":"ClientAuthenticationSuccessEvent"}
2019-10-28 09:52:26.745 +00:00 [INF] Credentials validated for username: admin
2019-10-28 09:52:26.752 +00:00 [INF] {"Username":"admin","Provider":null,"ProviderUserId":null,"SubjectId":"c47b2d11-5bb2-321f-bb51-39f123c3b05e","DisplayName":"admin","Endpoint":"Token","ClientId":null,"Category":"Authentication","Name":"User Login Success","EventType":"Success","Id":1000,"Message":null,"ActivityId":"0HLQRHH7GJOOK:00000001","TimeStamp":"2019-10-28T09:52:26.0000000Z","ProcessId":27,"LocalIpAddress":"::ffff:172.16.0.2:8080","RemoteIpAddress":"90.187.63.221","$type":"UserLoginSuccessEvent"}
2019-10-28 09:52:26.801 +00:00 [INF] {"Username":"admin","Provider":null,"ProviderUserId":null,"SubjectId":"c47b2d11-5bb2-321f-bb51-39f123c3b05e","DisplayName":null,"Endpoint":"Token","ClientId":"TestApp_App","Category":"Authentication","Name":"User Login Success","EventType":"Success","Id":1000,"Message":null,"ActivityId":"0HLQRHH7GJOOK:00000001","TimeStamp":"2019-10-28T09:52:26.0000000Z","ProcessId":27,"LocalIpAddress":"::ffff:172.16.0.2:8080","RemoteIpAddress":"90.187.63.221","$type":"UserLoginSuccessEvent"}
2019-10-28 09:52:26.802 +00:00 [INF] Token request validation success, {"ClientId":"TestApp_App","ClientName":"TestApp_App","GrantType":"password","Scopes":"TestApp","AuthorizationCode":null,"RefreshToken":null,"UserName":"admin","AuthenticationContextReferenceClasses":null,"Tenant":null,"IdP":null,"Raw":{"grant_type":"password","client_id":"TestApp_App","client_secret":"REDACTED","username":"admin","password":"REDACTED","scope":"TestApp"},"$type":"TokenRequestValidationLog"}
// Server error after waiting 4 minutes (seems to be a timeout)
Server Error.
There was an unexpected error in the request processing.

It could be related to the Identity Server store implementation, because after removing the inclusion of database-related entries in the ID4 discovery endpoint response document (.well-known/openid-configuration), the endpoint doesn't hang anymore.
Testing other parts of the infrastructure
The goal here is not to run a cheap instance in production, but a low-spec instance for staging and development. It would be acceptable if the application took some time to respond on the auth and some other endpoints during development, but the app ends up not responding at all and seems to be stuck in the communication with the hosted Identity Server 4 instance. All other endpoints respond fast, even the ones calling the database. The question is why a higher CPU count/ACU solves the problem.