aCoder2013 / blog

个人博客,记录个人总结(见issues)
https://acoder2013.typlog.io
GNU General Public License v3.0
421 stars 39 forks source link

Spring Cloud Ribbon踩坑记录及原理解析 #29

Open aCoder2013 opened 6 years ago

aCoder2013 commented 6 years ago

声明:代码不是我写的=_=

现象

前两天碰到一个ribbon相关的问题,觉得值得简单记录一下。表象是对外的接口返回内部异常,这个是封装的统一错误信息,Spring的异常处理器catch到未捕获异常统一返回的信息。因此到日志平台查看实际的异常:

org.springframework.web.client.HttpClientErrorException: 404 null

这里介绍一下背景,出现问题的开放网关,做点事情说白了就是转发对应的请求给后端的服务。这里用到了ribbon去做服务负载均衡、eureka负责服务发现。 这里出现404,首先看了下请求的url以及对应的参数,都没有发现问题,对应的后端服务也没有收到请求。这就比较诡异了,开始怀疑是ribbon或者Eureka的缓存导致请求到了错误的ip或端口,但由于日志中打印的是Eureka的serviceId而不是实际的ip:port,因此先加了个日志:

@Slf4j
public class CustomHttpRequestInterceptor implements ClientHttpRequestInterceptor {

    @Override
    public ClientHttpResponse intercept(HttpRequest request, byte[] body, ClientHttpRequestExecution execution) throws IOException {
        log.info("Request , url:{},method:{}.", request.getURI(), request.getMethod());
        return execution.execute(request, body);
    }
}

这里是通过给RestTemplate添加拦截器的方式,但要注意,ribbon也是通过给RestTemplate添加拦截器实现的解析serviceId到实际的ip:port,因此需要注意下优先级添加到ribbon的LoadBalancerInterceptor之后,我这里是通过Spring的初始化完成事件的回调中添加的,另外也添加了另一条日志,在catch到这个异常的时候,利用Eureka的DiscoveryClient#getInstances获取到当前的实例信息。 之后在测试环境中复现了这个问题,看了下日志,eurek中缓存的实例信息是对的,但是实际调用的确实另外一个服务的地址,从而导致了接口404。

源码解析

从上述的信息中可以知道,问题出在ribbon中,具体的原因后面会说,这里先讲一下Spring Cloud Ribbon的初始化流程。

@Configuration
@ConditionalOnClass({ IClient.class, RestTemplate.class, AsyncRestTemplate.class, Ribbon.class})
@RibbonClients
@AutoConfigureAfter(name = "org.springframework.cloud.netflix.eureka.EurekaClientAutoConfiguration")
@AutoConfigureBefore({LoadBalancerAutoConfiguration.class, AsyncLoadBalancerAutoConfiguration.class})
@EnableConfigurationProperties({RibbonEagerLoadProperties.class, ServerIntrospectorProperties.class})
public class RibbonAutoConfiguration {
}

注意这个注解@RibbonClients, 如果想要覆盖Spring Cloud提供的默认Ribbon配置就可以使用这个注解,最终的解析类是:

public class RibbonClientConfigurationRegistrar implements ImportBeanDefinitionRegistrar {

    @Override
    public void registerBeanDefinitions(AnnotationMetadata metadata,
            BeanDefinitionRegistry registry) {
        Map<String, Object> attrs = metadata.getAnnotationAttributes(
                RibbonClients.class.getName(), true);
        if (attrs != null && attrs.containsKey("value")) {
            AnnotationAttributes[] clients = (AnnotationAttributes[]) attrs.get("value");
            for (AnnotationAttributes client : clients) {
                registerClientConfiguration(registry, getClientName(client),
                        client.get("configuration"));
            }
        }
        if (attrs != null && attrs.containsKey("defaultConfiguration")) {
            String name;
            if (metadata.hasEnclosingClass()) {
                name = "default." + metadata.getEnclosingClassName();
            } else {
                name = "default." + metadata.getClassName();
            }
            registerClientConfiguration(registry, name,
                    attrs.get("defaultConfiguration"));
        }
        Map<String, Object> client = metadata.getAnnotationAttributes(
                RibbonClient.class.getName(), true);
        String name = getClientName(client);
        if (name != null) {
            registerClientConfiguration(registry, name, client.get("configuration"));
        }
    }

    private String getClientName(Map<String, Object> client) {
        if (client == null) {
            return null;
        }
        String value = (String) client.get("value");
        if (!StringUtils.hasText(value)) {
            value = (String) client.get("name");
        }
        if (StringUtils.hasText(value)) {
            return value;
        }
        throw new IllegalStateException(
                "Either 'name' or 'value' must be provided in @RibbonClient");
    }

    private void registerClientConfiguration(BeanDefinitionRegistry registry,
            Object name, Object configuration) {
        BeanDefinitionBuilder builder = BeanDefinitionBuilder
                .genericBeanDefinition(RibbonClientSpecification.class);
        builder.addConstructorArgValue(name);
        builder.addConstructorArgValue(configuration);
        registry.registerBeanDefinition(name + ".RibbonClientSpecification",
                builder.getBeanDefinition());
    }
}

atrrs包含defaultConfiguration,因此会注册RibbonClientSpecification类型的bean,注意名称以default.开头,类型是RibbonAutoConfiguration,注意上面说的RibbonAutoConfiguration被@RibbonClients修饰。 然后再回到上面的源码:

public class RibbonAutoConfiguration {

        //上文中会解析被@RibbonClients注解修饰的类,然后注册类型为RibbonClientSpecification的bean。
        //主要有两个: RibbonAutoConfiguration、RibbonEurekaAutoConfiguration
    @Autowired(required = false)
    private List<RibbonClientSpecification> configurations = new ArrayList<>();

    @Bean
    public SpringClientFactory springClientFactory() {
                //初始化SpringClientFactory,并将上面的配置注入进去,这段很重要。
        SpringClientFactory factory = new SpringClientFactory();
        factory.setConfigurations(this.configurations);
        return factory;
    }
         //其他的都是提供一些默认的bean配置

    @Bean
    @ConditionalOnMissingBean(LoadBalancerClient.class)
    public LoadBalancerClient loadBalancerClient() {
        return new RibbonLoadBalancerClient(springClientFactory());
    }

    @Bean
    @ConditionalOnClass(name = "org.springframework.retry.support.RetryTemplate")
    @ConditionalOnMissingBean
    public LoadBalancedRetryPolicyFactory loadBalancedRetryPolicyFactory(SpringClientFactory clientFactory) {
        return new RibbonLoadBalancedRetryPolicyFactory(clientFactory);
    }

    @Bean
    @ConditionalOnMissingClass(value = "org.springframework.retry.support.RetryTemplate")
    @ConditionalOnMissingBean
    public LoadBalancedRetryPolicyFactory neverRetryPolicyFactory() {
        return new LoadBalancedRetryPolicyFactory.NeverRetryFactory();
    }

    @Bean
    @ConditionalOnClass(name = "org.springframework.retry.support.RetryTemplate")
    @ConditionalOnMissingBean
    public LoadBalancedBackOffPolicyFactory loadBalancedBackoffPolicyFactory() {
        return new LoadBalancedBackOffPolicyFactory.NoBackOffPolicyFactory();
    }

    @Bean
    @ConditionalOnClass(name = "org.springframework.retry.support.RetryTemplate")
    @ConditionalOnMissingBean
    public LoadBalancedRetryListenerFactory loadBalancedRetryListenerFactory() {
        return new LoadBalancedRetryListenerFactory.DefaultRetryListenerFactory();
    }

    @Bean
    @ConditionalOnMissingBean
    public PropertiesFactory propertiesFactory() {
        return new PropertiesFactory();
    }

    @Bean
    @ConditionalOnProperty(value = "ribbon.eager-load.enabled", matchIfMissing = false)
    public RibbonApplicationContextInitializer ribbonApplicationContextInitializer() {
        return new RibbonApplicationContextInitializer(springClientFactory(),
                ribbonEagerLoadProperties.getClients());
    }

    @Configuration
    @ConditionalOnClass(HttpRequest.class)
    @ConditionalOnRibbonRestClient
    protected static class RibbonClientConfig {

        @Autowired
        private SpringClientFactory springClientFactory;

        @Bean
        public RestTemplateCustomizer restTemplateCustomizer(
                final RibbonClientHttpRequestFactory ribbonClientHttpRequestFactory) {
            return new RestTemplateCustomizer() {
                @Override
                public void customize(RestTemplate restTemplate) {
                    restTemplate.setRequestFactory(ribbonClientHttpRequestFactory);
                }
            };
        }

        @Bean
        public RibbonClientHttpRequestFactory ribbonClientHttpRequestFactory() {
            return new RibbonClientHttpRequestFactory(this.springClientFactory);
        }
    }

    //TODO: support for autoconfiguring restemplate to use apache http client or okhttp

    @Target({ ElementType.TYPE, ElementType.METHOD })
    @Retention(RetentionPolicy.RUNTIME)
    @Documented
    @Conditional(OnRibbonRestClientCondition.class)
    @interface ConditionalOnRibbonRestClient { }

    private static class OnRibbonRestClientCondition extends AnyNestedCondition {
        public OnRibbonRestClientCondition() {
            super(ConfigurationPhase.REGISTER_BEAN);
        }

        @Deprecated //remove in Edgware"
        @ConditionalOnProperty("ribbon.http.client.enabled")
        static class ZuulProperty {}

        @ConditionalOnProperty("ribbon.restclient.enabled")
        static class RibbonProperty {}
    }
}

注意这里的SpringClientFactory, ribbon默认情况下,每个eureka的serviceId(服务),都会分配自己独立的Spring的上下文,即ApplicationContext, 然后这个上下文中包含了必要的一些bean,比如: ILoadBalancerServerListFilter等。而Spring Cloud默认是使用RestTemplate封装了ribbon的调用,核心是通过一个拦截器:

@Bean
        @ConditionalOnMissingBean
        public RestTemplateCustomizer restTemplateCustomizer(
                final LoadBalancerInterceptor loadBalancerInterceptor) {
            return new RestTemplateCustomizer() {
                @Override
                public void customize(RestTemplate restTemplate) {
                    List<ClientHttpRequestInterceptor> list = new ArrayList<>(
                            restTemplate.getInterceptors());
                    list.add(loadBalancerInterceptor);
                    restTemplate.setInterceptors(list);
                }
            };
        }

因此核心是通过这个拦截器实现的负载均衡:

public class LoadBalancerInterceptor implements ClientHttpRequestInterceptor {

    private LoadBalancerClient loadBalancer;
    private LoadBalancerRequestFactory requestFactory;

    @Override
    public ClientHttpResponse intercept(final HttpRequest request, final byte[] body,
            final ClientHttpRequestExecution execution) throws IOException {
        final URI originalUri = request.getURI(); //这里传入的url是解析之前的,即http://serviceId/服务地址的形式
        String serviceName = originalUri.getHost(); //解析拿到对应的serviceId
        Assert.state(serviceName != null, "Request URI does not contain a valid hostname: " + originalUri);
        return this.loadBalancer.execute(serviceName, requestFactory.createRequest(request, body, execution));
    }
}

然后将请求转发给LoadBalancerClient:

public class RibbonLoadBalancerClient implements LoadBalancerClient {
@Override
    public <T> T execute(String serviceId, LoadBalancerRequest<T> request) throws IOException {
        ILoadBalancer loadBalancer = getLoadBalancer(serviceId); //获取对应的LoadBalancer
        Server server = getServer(loadBalancer); //获取服务器,这里会执行对应的分流策略,比如轮训
                //、随机等
        if (server == null) {
            throw new IllegalStateException("No instances available for " + serviceId);
        }
        RibbonServer ribbonServer = new RibbonServer(serviceId, server, isSecure(server,
                serviceId), serverIntrospector(serviceId).getMetadata(server));

        return execute(serviceId, ribbonServer, request);
    }
}

而这里的LoadBalancer是通过上文中提到的SpringClientFactory获取到的,这里会初始化一个新的Spring上下文,然后将Ribbon默认的配置类,比如说:RibbonAutoConfigurationRibbonEurekaAutoConfiguration等添加进去, 然后将当前spring的上下文设置为parent,再调用refresh方法进行初始化。

public class SpringClientFactory extends NamedContextFactory<RibbonClientSpecification> {
    protected AnnotationConfigApplicationContext createContext(String name) {
        AnnotationConfigApplicationContext context = new AnnotationConfigApplicationContext();
        if (this.configurations.containsKey(name)) {
            for (Class<?> configuration : this.configurations.get(name)
                    .getConfiguration()) {
                context.register(configuration);
            }
        }
        for (Map.Entry<String, C> entry : this.configurations.entrySet()) {
            if (entry.getKey().startsWith("default.")) {
                for (Class<?> configuration : entry.getValue().getConfiguration()) {
                    context.register(configuration);
                }
            }
        }
        context.register(PropertyPlaceholderAutoConfiguration.class,
                this.defaultConfigType);
        context.getEnvironment().getPropertySources().addFirst(new MapPropertySource(
                this.propertySourceName,
                Collections.<String, Object> singletonMap(this.propertyName, name)));
        if (this.parent != null) {
            // Uses Environment from parent as well as beans
            context.setParent(this.parent);
        }
        context.refresh();
        return context;
    }
}

最核心的就在这一段,也就是说对于每一个不同的serviceId来说,都拥有一个独立的spring上下文,并且在第一次调用这个服务的时候,会初始化ribbon相关的所有bean, 如果不存在 才回去父context中去找。

再回到上文中根据分流策略获取实际的ip:port的代码段:

public class RibbonLoadBalancerClient implements LoadBalancerClient {
@Override
    public <T> T execute(String serviceId, LoadBalancerRequest<T> request) throws IOException {
        ILoadBalancer loadBalancer = getLoadBalancer(serviceId); //获取对应的LoadBalancer
        Server server = getServer(loadBalancer); //获取服务器,这里会执行对应的分流策略,比如轮训
                //、随机等
        if (server == null) {
            throw new IllegalStateException("No instances available for " + serviceId);
        }
        RibbonServer ribbonServer = new RibbonServer(serviceId, server, isSecure(server,
                serviceId), serverIntrospector(serviceId).getMetadata(server));

        return execute(serviceId, ribbonServer, request);
    }
}

    protected Server getServer(ILoadBalancer loadBalancer) {
        if (loadBalancer == null) {
            return null;
        }
                // 选择对应的服务器
        return loadBalancer.chooseServer("default"); // TODO: better handling of key
    }
public class ZoneAwareLoadBalancer<T extends Server> extends DynamicServerListLoadBalancer<T> {
@Override
    public Server chooseServer(Object key) {
        if (!ENABLED.get() || getLoadBalancerStats().getAvailableZones().size() <= 1) {
            logger.debug("Zone aware logic disabled or there is only one zone");
            return super.chooseServer(key); //默认不配置可用区,走的是这段
        }
        Server server = null;
        try {
            LoadBalancerStats lbStats = getLoadBalancerStats();
            Map<String, ZoneSnapshot> zoneSnapshot = ZoneAvoidanceRule.createSnapshot(lbStats);
            logger.debug("Zone snapshots: {}", zoneSnapshot);
            if (triggeringLoad == null) {
                triggeringLoad = DynamicPropertyFactory.getInstance().getDoubleProperty(
                        "ZoneAwareNIWSDiscoveryLoadBalancer." + this.getName() + ".triggeringLoadPerServerThreshold", 0.2d);
            }

            if (triggeringBlackoutPercentage == null) {
                triggeringBlackoutPercentage = DynamicPropertyFactory.getInstance().getDoubleProperty(
                        "ZoneAwareNIWSDiscoveryLoadBalancer." + this.getName() + ".avoidZoneWithBlackoutPercetage", 0.99999d);
            }
            Set<String> availableZones = ZoneAvoidanceRule.getAvailableZones(zoneSnapshot, triggeringLoad.get(), triggeringBlackoutPercentage.get());
            logger.debug("Available zones: {}", availableZones);
            if (availableZones != null &&  availableZones.size() < zoneSnapshot.keySet().size()) {
                String zone = ZoneAvoidanceRule.randomChooseZone(zoneSnapshot, availableZones);
                logger.debug("Zone chosen: {}", zone);
                if (zone != null) {
                    BaseLoadBalancer zoneLoadBalancer = getLoadBalancer(zone);
                    server = zoneLoadBalancer.chooseServer(key);
                }
            }
        } catch (Exception e) {
            logger.error("Error choosing server using zone aware logic for load balancer={}", name, e);
        }
        if (server != null) {
            return server;
        } else {
            logger.debug("Zone avoidance logic is not invoked.");
            return super.chooseServer(key);
        }
    }

    //实际走到的方法
    public Server chooseServer(Object key) {
        if (counter == null) {
            counter = createCounter();
        }
        counter.increment();
        if (rule == null) {
            return null;
        } else {
            try {
                return rule.choose(key);
            } catch (Exception e) {
                logger.warn("LoadBalancer [{}]:  Error choosing server for key {}", name, key, e);
                return null;
            }
        }
    }
}

也就是说最终会调用IRule选择到一个节点,这里支持很多策略,比如随机、轮训、响应时间权重等: image

public interface IRule{

    public Server choose(Object key);

    public void setLoadBalancer(ILoadBalancer lb);

    public ILoadBalancer getLoadBalancer();    
}

这里的LoadBalancer是在BaseLoadBalancer的构造器中设置的,上文说过,对于每一个serviceId服务来说,当第一次调用的时候会初始化对应的spring上下文,而这个上下文中包含了所有ribbon相关的bean,其中就包括ILoadBalancer、IRule。

原因

通过跟踪堆栈,发现不同的serviceId,IRule是同一个, 而上文说过,每个serviceId都拥有自己独立的上下文,包括独立的loadBalancer、IRule,而IRule是同一个,因此怀疑是这个bean是通过parent context获取到的,换句话说应用自己定义了一个这样的bean。查看代码果然如此。 这样就会导致一个问题,IRule是共享的,而其他bean是隔离开的,因此后面的serviceId初始化的时候,会修改这个IRule的LoadBalancer, 导致之前的服务获取到的实例信息是错误的,从而导致接口404。

public class BaseLoadBalancer extends AbstractLoadBalancer implements
        PrimeConnections.PrimeConnectionListener, IClientConfigAware {
    public BaseLoadBalancer() {
        this.name = DEFAULT_NAME;
        this.ping = null;
        setRule(DEFAULT_RULE);  // 这里会设置IRule的loadbalancer
        setupPingTask();
        lbStats = new LoadBalancerStats(DEFAULT_NAME);
    }
}

image

解决方案

解决方法也很简单,最简单就将这个自定义的IRule的bean干掉,另外更标准的做法是使用RibbonClients注解,具体做法可以参考文档。

总结

核心原因其实还是对于Spring Cloud的理解不够深刻,用法有错误,导致出现了一些比较诡异的问题。对于自己使用的组件、框架、甚至于每一个注解,都要了解其原理,能够清楚的说出这个注解有什么效果,有什么影响,而不是只着眼于解决眼前的问题。

Flag Counter

Tinkerc commented 5 years ago

你好,请教一下CustomHttpRequestInterceptor 怎么添加到 LoadBalancerInterceptor 之后

aCoder2013 commented 5 years ago

@Tinkerc 因为要获取到实际的IP,因此必须保证你添加的拦截器的代码在ribbon初始化的地方之后执行。

有几种方式,比如你可以监听Spring初始化完成事件,然后注入RestTemplate,把自己自定义的拦截器添加进去。

Tinkerc commented 5 years ago

@aCoder2013 嗯,我也是要获取实际IP,可以把关键源码贴出来一下吗,重新注入RestTemplate是否还需要单独设置@LoadBalanced

aCoder2013 commented 5 years ago

嗯,重新注入的是ribbon初始化完成之后的RestTemplate,只需要给RestTemplate再添加一个拦截器即可

    @Resource
    private RestTemplate restTemplate;

    private AtomicBoolean started = new AtomicBoolean(false);

    @Override
    public void onApplicationEvent(ContextRefreshedEvent event) {
        if (event != null) {
            if (started.compareAndSet(false, true)) {
                /*
                    Spring初始化完成后再执行
                 */
                List<ClientHttpRequestInterceptor> interceptors = restTemplate.getInterceptors();
                if (interceptors == null) {
                    interceptors = new ArrayList<>();
                }
                interceptors.add(new LogClientHttpRequestInterceptor());
                restTemplate.setInterceptors(interceptors);
            }
        }
    }
lzn312 commented 5 years ago

你好目前我也遇到了这个问题,在spring boot 1.5x集成ribbon和feign。思想是通过自定义规则实现某些特殊请求规则定义。但是和你记录的一样当请求返回时。rule的lb选择错误导致了404,自定义的规则不能去掉。请问第二种解法能详细说一说吗?

aCoder2013 commented 5 years ago

@lzn312 可以参考下官方的文档,用@ RibbonClients注解修饰你的配置类,这样会将这个配置类中的bean放在独立的spring上下文,而不是父context,https://cloud.spring.io/spring-cloud-netflix/multi/multi_spring-cloud-ribbon.html#_customizing_the_default_for_all_ribbon_clients

lzn312 commented 5 years ago

@aCoder2013 非常感谢你的回复,该问题已经通过配置@RibbonClients解决

broadvieweditor commented 5 years ago

博主您好,我是电子工业出版社博文视点的编辑,看到您发表的文章,觉得内容很好,您是否有兴趣出版图书呢:)

我的微信是472954195

Sunjb123 commented 5 years ago

你好,是要在@RibbonClients中再次配置IRule么

AKwang100 commented 4 years ago

非常感谢,终于找到一个花了心思的靠谱解释

aCoder2013 commented 4 years ago

@AKwang100 很高兴能帮到你

Veklip commented 4 years ago

你好,我的微信是Tnsg_ui_lip,想请教一下你这个问题,我也遇到这个情况,但是我们没有定义IRule,我暂时加了discoverClient#getInstances打印某个服务所有的uri

NelsonFeng commented 4 years ago

@aCoder2013 非常感谢你的回复,该问题已经通过配置@RibbonClients解决

麻烦问下如何解决的哇我用了@RibbonClients但是还是404