ThreeMammals / Ocelot

.NET API Gateway
https://www.nuget.org/packages/Ocelot
MIT License
8.4k stars 1.64k forks source link

In v23.3.4 `ConsulProviderFactory` can't resolve `IConsulServiceBuilder` via `IServiceProvider` #2178

Closed Niksson closed 1 month ago

Niksson commented 1 month ago

Expected Behavior / New Feature

Service Discovery via Consul works with extended DefaultConsulServiceBuilder via AddConsul<T>()

Actual Behavior / Motivation for New Feature

When trying to send a request via Ocelot ConsulProviderFactory encounters an error:

2024-10-17 09:03:45 [07:03:45 WRN] requestId: XXX, previousRequestId: No PreviousRequestId, message: 'Error Code: UnableToFindLoadBalancerError Message: Unable to find load balancer for 'GET|/some/random/path|no-host|no-host-and-port|no-svc-ns|mailbox|LeastConnection|no-lb-key'. Exception: System.InvalidOperationException: Cannot resolve scoped service 'Ocelot.Provider.Consul.Interfaces.IConsulServiceBuilder' from root provider.
2024-10-17 09:03:45    at Microsoft.Extensions.DependencyInjection.ServiceProvider.GetService(ServiceIdentifier serviceIdentifier, ServiceProviderEngineScope serviceProviderEngineScope)
2024-10-17 09:03:45    at Microsoft.Extensions.DependencyInjection.ServiceLookup.ServiceProviderEngineScope.GetService(Type serviceType)
2024-10-17 09:03:45    at Microsoft.Extensions.DependencyInjection.ServiceProviderServiceExtensions.GetService[T](IServiceProvider provider)
2024-10-17 09:03:45    at Ocelot.Provider.Consul.ConsulProviderFactory.CreateProvider(IServiceProvider provider, ServiceProviderConfiguration config, DownstreamRoute route)
2024-10-17 09:03:45    at Ocelot.ServiceDiscovery.ServiceDiscoveryProviderFactory.GetServiceDiscoveryProvider(ServiceProviderConfiguration config, DownstreamRoute route)
2024-10-17 09:03:45    at Ocelot.ServiceDiscovery.ServiceDiscoveryProviderFactory.Get(ServiceProviderConfiguration serviceConfig, DownstreamRoute route)
2024-10-17 09:03:45    at Ocelot.LoadBalancer.LoadBalancers.LoadBalancerFactory.Get(DownstreamRoute route, ServiceProviderConfiguration config)
2024-10-17 09:03:45    at Ocelot.LoadBalancer.LoadBalancers.LoadBalancerHouse.GetResponse(DownstreamRoute route, ServiceProviderConfiguration config)
2024-10-17 09:03:45    at Ocelot.LoadBalancer.LoadBalancers.LoadBalancerHouse.Get(DownstreamRoute route, ServiceProviderConfiguration config); errors found in ResponderMiddleware. Setting error response for request path:/some/random/path, request method: GET'

Steps to Reproduce the Problem

  1. Add Consul with a customized IConsulServiceBuilder:

    services.AddOcelot()
    .AddConsul<ServiceAddressConsulServiceBuilder>()

    The code for this builder is taken from documentation example on how to use service address as host:

    public class ServiceAddressConsulServiceBuilder : DefaultConsulServiceBuilder
    {
    public ServiceAddressConsulServiceBuilder(IHttpContextAccessor contextAccessor, IConsulClientFactory clientFactory, IOcelotLoggerFactory loggerFactory) : base(contextAccessor, clientFactory, loggerFactory)
    {
    }
    
    protected override string GetDownstreamHost(ServiceEntry entry, Node node)
    {
        return entry.Service.Address;
    }
    }
  2. Configure Ocelot and a route to use Service Discovery via Consul
  3. Try to hit this route

Specifications

Workarounds:

I was able to work around this issue by reregistering the IConsulServiceBuilder as transient:

services.AddOcelot()
    .AddConsul();

services.RemoveAll<IConsulServiceBuilder>();
services.AddTransient<IConsulServiceBuilder, ServiceAddressConsulServiceBuilder>();
Niksson commented 1 month ago

Hi!

Just a little clarification - I encountered this error in a production project, so I'm not sure if the steps to reproduce accurately represent how to reproduce the problem. I will try and find time to create a sample project to reproduce this error. Hope I can do it soon.

raman-m commented 1 month ago

Hello!

Workarounds:

I was able to work around this issue by reregistering the IConsulServiceBuilder as transient:

services.AddOcelot().AddConsul();
services.RemoveAll<IConsulServiceBuilder>();
services.AddTransient<IConsulServiceBuilder, ServiceAddressConsulServiceBuilder>();

It appears you have a customized solution. Your issue is that adding a default scoped service is not suitable for your environment, as seen here: (https://github.com/ThreeMammals/Ocelot/blob/ce0227c059f220761acc2d880e554174bdb3c544/src/Ocelot.Provider.Consul/OcelotBuilderExtensions.cs#L28) Scoped context refers to the context of the current request. Transient injection is not very practical since service injection occurs in just one class. Therefore, we request you to upload the complete solution as it is customized. Kindly upload the Ocelot application to your account for a comprehensive review by our team❗

P.S.

It appears to be a concurrency issue, as the domain services for this feature are singletons, including the factory class... but I'm not sure.

Niksson commented 1 month ago

I cannot upload the solution as is, since it's a private one. As I said, I will try and create a sample project to see if the error reproduces.

raman-m commented 1 month ago

@Niksson commented on Oct 17

Just a little clarification - I encountered this error in a production project, so I'm not sure if the steps to reproduce accurately represent how to reproduce the problem.

That's unfortunate. How frequently does this error occur? Is it happening with every request or just sporadically?

Additionally, based on your description, the JSON of the app is not attached. Could you please provide the contents of your ocelot.json file?

Niksson commented 1 month ago

While I'm at it, I will note this, from Ocelot's source code:

public class ServiceDiscoveryProviderFactory : IServiceDiscoveryProviderFactory // This is registered as Singleton
    {
        public ServiceDiscoveryProviderFactory(IOcelotLoggerFactory factory, IServiceProvider provider)
        {
            // We inject an IServiceProvider here
            _provider = provider;
            _delegates = provider.GetService<ServiceDiscoveryFinderDelegate>();
           //...
        }

And in the method passed as a delegate:

// Also registered as Singleton
private static IServiceDiscoveryProvider CreateProvider(IServiceProvider provider, ServiceProviderConfiguration config, DownstreamRoute route)
    {
        // ...

       // IConsulServiceBuilder is Scoped, and Scoped services usually cannot be resolved from Singletons
        var serviceBuilder = provider.GetService<IConsulServiceBuilder>(); // consume data in default/custom builder

In singleton services you usually need to create a scope to resolve scoped services, for example:

private static IServiceDiscoveryProvider CreateProvider(IServiceProvider provider, ServiceProviderConfiguration config, DownstreamRoute route)
    {
        // ...

        var scope = provider.CreateScope();
        var serviceBuilder = scope.ServiceProvider.GetService<IConsulServiceBuilder>(); // consume data in default/custom builder

I've found a related article - https://samwalpole.com/blog/using-scoped-services-inside-singletons/index.html

Just to be clear - I'm still working on providing a sample project to help you with investigation

Niksson commented 1 month ago

@raman-m The error occurs on every request. I need to strip the ocelot.json of private info before sending it to you, but I will try to do it with the sample project

raman-m commented 1 month ago

Please recognize that this is an open-source project. Stating "I cannot share code or files" and not revealing all parts of the application implies that we are unable to assist you.

Just to be clear - I'm still working on providing a sample project to help you with investigation

Understood... best of luck! We have other significant tasks to prioritize.

raman-m commented 1 month ago

In singleton services you usually need to create a scope to resolve scoped services, for example:

private static IServiceDiscoveryProvider CreateProvider(IServiceProvider provider, ServiceProviderConfiguration config, DownstreamRoute route)
        // ...
        var scope = provider.CreateScope();
        var serviceBuilder = scope.ServiceProvider.GetService<IConsulServiceBuilder>(); // consume data in default/custom builder

I've found a related article - https://samwalpole.com/blog/using-scoped-services-inside-singletons/index.html

Okay, thanks! That could be the root cause. But why is it impossible to reproduce by our tests? Other developers have informed that the 23.3.4 fix is functioning, which is peculiar. It appears the issue arises in specific types of solutions with numerous customizations.

Have you implemented this fix in your application? Is it functioning properly?


Just to be clear - I'm still working on providing a sample project to help you with investigation

If your production environment is unstable with version 23.3.4, it is advisable to revert to version 23.2.2 immediately.

Niksson commented 1 month ago

Hi again @raman-m! I've created a sample project that includes an Ocelot configuration - https://github.com/Niksson/OcelotSampleProject. I've transfered most of the customizations we made just for the sake of the example, and I could reproduce the error.

This solution uses docker-compose (mostly for running Consul locally). I would advise using it if possible. In the projects you can also find .http files to make requests to the sample API and gateway.

Feel free to ask questions if there are any. Also, there may be errors in ocelot configuration files (specifically ocelot.global.json - I left them as they are in the production code.

Niksson commented 1 month ago

Have you implemented this fix in your application? Is it functioning properly?

No, not yet, I only worked it around using Transient lifetime. I will try referencing the Ocelot source locally and see if I can fix the error by creating the DI scope

Niksson commented 1 month ago

If your production environment is unstable with version 23.3.4, it is advisable to revert to version 23.2.2 immediately.

Yes, we've already done that to be able to keep working

Niksson commented 1 month ago

I will try referencing the Ocelot source locally and see if I can fix the error by creating the DI scope

So I've tried two things, both make the error go away:

Niksson commented 1 month ago

I had a look at tests and I found out why they don't detect the error.

Apparently, when you create a web app using WebHost.CreateDefaultBuilder(string[] args) or WebApplication.CreateBuilder(string[] args) then the scope validation is turned on in Development mode (see this article). If you create a webhost or service collection in any other way, then the scope validation isn't enabled at all

Let's look at the unit test class for example:

public class ProviderFactoryTests
{
    private readonly IServiceProvider _provider;

    public ProviderFactoryTests()
    {
        // Setup code
       services.AddSingleton(contextAccessor.Object);
       services.AddSingleton(consulFactory.Object);
       services.AddSingleton(loggerFactory.Object);
       // We also register IConsuleServiceBuilder, since it's not registered and GetService<T> will just return null otherwise
       var consulServiceBuilder = new Mock<IConsulServiceBuilder>();       
       services.AddScoped(_ => consulServiceBuilder.Object);

       _provider = services.BuildServiceProvider();
    }
...

This still doesn't break the test, because by default scope validation is disabled. Or in the acceptance test:

    public void GivenOcelotIsRunningWithServices(Action<IServiceCollection> configureServices, Action<IApplicationBuilder> configureApp)
    {
        _webHostBuilder = new WebHostBuilder()
            .ConfigureAppConfiguration(WithBasicConfiguration)
            .ConfigureServices(configureServices ?? WithAddOcelot)
            .Configure(configureApp ?? WithUseOcelot);
        _ocelotServer = new TestServer(_webHostBuilder);
        _ocelotClient = _ocelotServer.CreateClient();
    }

The scope validation here is also not enabled by default.


This is something I didn't know about ASP.NET defaults, so TIL!

raman-m commented 1 month ago

You've conducted thorough research, but what is your plan for resolving this? Will you open a PR to address the issues in both ConsulProviderFactory and the testing helpers? It appears we've pinpointed the root causes, so now we can proceed with creating a PR. I assure you that a hotfix PR will be opened, and I will release patch for v23.3.4 as soon as possible.

P.S. Release branch for all 23.3.0+ versions → release/23.3

kick2nick commented 1 month ago

Hi @raman-m, There is the same issue with k8s provider. (Invalid scopes with error appearing only in dev mode) But there KubeApiClient has scoped lifetime. Acceptance test is passing because scope validation turned off and KubeClient replaced with singleton. Found related issue without actions. As a workaround IKubeApiClient can be registered as singleton in first place(anyway it's trapped inside singleton) Probably this should be a separate issue 🤔

Niksson commented 1 month ago

Hi everyone!

Probably this should be a separate issue 🤔

I agree that refactoring should be another issue, but for quick fixing we can do one of these 2 options:

@kick2nick what are your thoughts?

I can definitely create a PR for Consul provider, since I can easily test it now that I have a sample, but I'm not so sure about other providers.

raman-m commented 1 month ago

@kick2nick commented on Oct 20 Thank you for your research on the code! Indeed, the DI issue has been persistent for years, yet it remains unaddressed. The logic involved is quite complex, and a contributor would need an in-depth understanding of the Ocelot codebase to propose a fix.

(Invalid scopes with error appearing only in dev mode)

Typically, we overlook development mode, but your research has revealed a design flaw. Our team has been aware of this issue for over a year, yet we lack the capacity to address everything. The inconsistencies in tests must be resolved. Ocelot should operate correctly in both Development and Release modes.

Probably this should be a separate issue 🤔

Yes, it should! Please transition to discussion #977, which has been reopened today. You may follow this issue, and any upcoming PR should be based on the code from this issue's PR. So, both PRs must be synced somehow....

raman-m commented 1 month ago

@Niksson commented on Oct 21

OK, man...

@kick2nick what are your thoughts?

In fact, you should seek my opinion.

I can definitely create a PR for Consul provider, since I can easily test it now that I have a sample

Alright, proceed! If it takes one day of development, that's fine; go ahead and open a PR. However, if it's going to take up to a week, I can't wait that long. Keep in mind that I'm focused on two issues:

but I'm not so sure about other providers.

Please disregard other discovery providers and concentrate solely on Consul and PollConsul. Thank you!

Niksson commented 1 month ago

@raman-m got it, thanks for the input! I will try to create a PR today.

raman-m commented 1 month ago

Useful Links