Netflix / archaius

Library for configuration management API
Apache License 2.0
2.47k stars 486 forks source link

Deadlock when invoking ConfigurationManager.getConfigInstance() from different threads #562

Closed anha1 closed 1 year ago

anha1 commented 6 years ago

Affected version: archaius-core v0.7.6

Minimal example to reproduce the issue:

package com.example;

import com.netflix.config.ConfigurationManager;
import com.netflix.config.DynamicPropertyFactory;

public class App {

    public static void main( String[] args ) throws Exception {

        Thread a = new Thread(() -> {
            System.out.println("A1");
            ConfigurationManager.getConfigInstance();
            System.out.println("A2");
        });

        Thread b = new Thread(() -> {
            System.out.println("B1");
            DynamicPropertyFactory.getInstance();
            System.out.println("B2");
        });

        a.setName("A");
        b.setName("B");

        a.start();
        //Thread.sleep(1000);
        b.start();

        a.join();
        b.join();

        System.out.println("DONE");
    }
}

In ~50% of runs, the application hangs, output:

A1
B1

Thread dump:

"B" #12 prio=5 os_prio=0 tid=0x00007f6ac443b800 nid=0x758 in Object.wait() [0x00007f6aac465000]
   java.lang.Thread.State: RUNNABLE
    at com.netflix.config.DynamicPropertyFactory.getInstance(DynamicPropertyFactory.java:277)
    - locked <0x00000000ff390208> (a java.lang.Class for com.netflix.config.ConfigurationManager)
    at com.example.App.lambda$main$1(App.java:18)
    at com.example.App$$Lambda$2/51228289.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
    - None

"A" #11 prio=5 os_prio=0 tid=0x00007f6ac443a000 nid=0x757 waiting for monitor entry [0x00007f6aac565000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at com.netflix.config.ConfigurationManager.getConfigInstance(ConfigurationManager.java:195)
    - waiting to lock <0x00000000ff390208> (a java.lang.Class for com.netflix.config.ConfigurationManager)
    at com.netflix.config.ConfigurationBasedDeploymentContext.<init>(ConfigurationBasedDeploymentContext.java:108)
    at com.netflix.config.ConfigurationManager.<clinit>(ConfigurationManager.java:116)
    at com.example.App.lambda$main$0(App.java:12)
    at com.example.App$$Lambda$1/20132171.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
    - None

If uncommenting "Thread.sleep(1000)", everything works as expected:

A1
A2
B1
B2
DONE

This deadlock is taken from a real-world Spring Boot 2 application with a casual Zipkin setup, here are thread dumps from this Spring Boot 2 + Zipkin application (but the issue looks exactly the same as in the minimal example provided):

"AsyncReporter{org.springframework.cloud.sleuth.zipkin2.sender.RestTemplateSender@7b4185ea}" #21 daemon prio=5 os_prio=0 tid=0x00007f2e4e923800 nid=0x64b8 in Object.wait() [0x00007f2dcaa03000]
   java.lang.Thread.State: RUNNABLE
    at com.netflix.config.DynamicPropertyFactory.getInstance(DynamicPropertyFactory.java:277)
    - locked <0x000000064c7c9320> (a java.lang.Class for com.netflix.config.ConfigurationManager)
    at org.springframework.cloud.netflix.ribbon.RibbonUtils.getProperty(RibbonUtils.java:59)
    at org.springframework.cloud.netflix.ribbon.RibbonUtils.setRibbonProperty(RibbonUtils.java:48)
    at org.springframework.cloud.netflix.ribbon.RibbonClientConfiguration.preprocess(RibbonClientConfiguration.java:174)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor$LifecycleElement.invoke(InitDestroyAnnotationBeanPostProcessor.java:365)
    at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor$LifecycleMetadata.invokeInitMethods(InitDestroyAnnotationBeanPostProcessor.java:308)
    at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor.postProcessBeforeInitialization(InitDestroyAnnotationBeanPostProcessor.java:135)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyBeanPostProcessorsBeforeInitialization(AbstractAutowireCapableBeanFactory.java:424)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1700)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:581)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:503)
    at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:317)
    at org.springframework.beans.factory.support.AbstractBeanFactory$$Lambda$148/320945958.getObject(Unknown Source)
    at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222)
    - locked <0x000000064c973878> (a java.util.concurrent.ConcurrentHashMap)
    at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:315)
    at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:199)
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:760)
    at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:869)
    at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:550)
    - locked <0x000000064c973a18> (a java.lang.Object)
    at org.springframework.cloud.context.named.NamedContextFactory.createContext(NamedContextFactory.java:117)
    at org.springframework.cloud.context.named.NamedContextFactory.getContext(NamedContextFactory.java:85)
    - locked <0x000000064c4886c0> (a java.util.concurrent.ConcurrentHashMap)
    at org.springframework.cloud.netflix.ribbon.SpringClientFactory.getContext(SpringClientFactory.java:118)
    at org.springframework.cloud.context.named.NamedContextFactory.getInstance(NamedContextFactory.java:126)
    at org.springframework.cloud.netflix.ribbon.SpringClientFactory.getInstance(SpringClientFactory.java:108)
    at org.springframework.cloud.netflix.ribbon.SpringClientFactory.getLoadBalancer(SpringClientFactory.java:57)
    at org.springframework.cloud.netflix.ribbon.RibbonLoadBalancerClient.getLoadBalancer(RibbonLoadBalancerClient.java:155)
    at org.springframework.cloud.netflix.ribbon.RibbonLoadBalancerClient.getServer(RibbonLoadBalancerClient.java:144)
    at org.springframework.cloud.netflix.ribbon.RibbonLoadBalancerClient.choose(RibbonLoadBalancerClient.java:76)
    at org.springframework.cloud.sleuth.zipkin2.sender.LoadBalancerClientZipkinLoadBalancer.instance(LoadBalancerClientZipkinLoadBalancer.java:42)
    at org.springframework.cloud.sleuth.zipkin2.sender.ZipkinRestTemplateSenderConfiguration$1.zipkinUrl(ZipkinRestTemplateSenderConfiguration.java:104)
    at org.springframework.cloud.sleuth.zipkin2.sender.ZipkinRestTemplateWrapper.doExecute(ZipkinRestTemplateSenderConfiguration.java:130)
    at org.springframework.web.client.RestTemplate.exchange(RestTemplate.java:658)
    at org.springframework.cloud.sleuth.zipkin2.sender.RestTemplateSender.post(RestTemplateSender.java:112)
    at org.springframework.cloud.sleuth.zipkin2.sender.RestTemplateSender$HttpPostCall.doExecute(RestTemplateSender.java:123)
    at org.springframework.cloud.sleuth.zipkin2.sender.RestTemplateSender$HttpPostCall.doExecute(RestTemplateSender.java:115)
    at zipkin2.Call$Base.execute(Call.java:379)
    at zipkin2.reporter.AsyncReporter$BoundedAsyncReporter.flush(AsyncReporter.java:286)
    at zipkin2.reporter.AsyncReporter$Builder$1.run(AsyncReporter.java:190)

   Locked ownable synchronizers:
    - None

"main" #1 prio=5 os_prio=0 tid=0x00007f2e4c00d000 nid=0x63e2 waiting for monitor entry [0x00007f2e52a25000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at com.netflix.config.ConfigurationManager.getConfigInstance(ConfigurationManager.java:175)
    - waiting to lock <0x000000064c7c9320> (a java.lang.Class for com.netflix.config.ConfigurationManager)
    at com.netflix.config.ConfigurationBasedDeploymentContext.<init>(ConfigurationBasedDeploymentContext.java:108)
    at com.netflix.config.ConfigurationManager.<clinit>(ConfigurationManager.java:104)
    at org.springframework.cloud.netflix.archaius.ArchaiusAutoConfiguration.addArchaiusConfiguration(ArchaiusAutoConfiguration.java:200)
    at org.springframework.cloud.netflix.archaius.ArchaiusAutoConfiguration.configureArchaius(ArchaiusAutoConfiguration.java:190)
    at org.springframework.cloud.netflix.archaius.ArchaiusAutoConfiguration.configurableEnvironmentConfiguration(ArchaiusAutoConfiguration.java:106)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:154)
    at org.springframework.beans.factory.support.ConstructorResolver.instantiateUsingFactoryMethod(ConstructorResolver.java:582)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.instantiateUsingFactoryMethod(AbstractAutowireCapableBeanFactory.java:1256)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1105)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:543)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:503)
    at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:317)
    at org.springframework.beans.factory.support.AbstractBeanFactory$$Lambda$148/320945958.getObject(Unknown Source)
    at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222)
    - locked <0x000000064ad22a30> (a java.util.concurrent.ConcurrentHashMap)
    at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:315)
    at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:199)
    at org.springframework.beans.factory.config.DependencyDescriptor.resolveCandidate(DependencyDescriptor.java:251)
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.addCandidateEntry(DefaultListableBeanFactory.java:1325)
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.findAutowireCandidates(DefaultListableBeanFactory.java:1291)
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveMultipleBeans(DefaultListableBeanFactory.java:1193)
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1096)
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:1065)
    at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredFieldElement.inject(AutowiredAnnotationBeanPostProcessor.java:584)
    at org.springframework.beans.factory.annotation.InjectionMetadata.inject(InjectionMetadata.java:91)
    at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor.postProcessPropertyValues(AutowiredAnnotationBeanPostProcessor.java:373)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1350)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:580)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:503)
    at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:317)
    at org.springframework.beans.factory.support.AbstractBeanFactory$$Lambda$148/320945958.getObject(Unknown Source)
    at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222)
    - locked <0x000000064ad22a30> (a java.util.concurrent.ConcurrentHashMap)
    at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:315)
    at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:199)
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:760)
    at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:869)
    at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:550)
    - locked <0x000000064ad21ea0> (a java.lang.Object)
    at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:759)
    at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:395)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:327)
    at org.springframework.boot.test.context.SpringBootContextLoader.loadContext(SpringBootContextLoader.java:139)
    at org.springframework.test.context.cache.DefaultCacheAwareContextLoaderDelegate.loadContextInternal(DefaultCacheAwareContextLoaderDelegate.java:99)
    at org.springframework.test.context.cache.DefaultCacheAwareContextLoaderDelegate.loadContext(DefaultCacheAwareContextLoaderDelegate.java:117)
    - locked <0x000000064a62b1d0> (a org.springframework.test.context.cache.DefaultContextCache)
    at org.springframework.test.context.MergedContextConfiguration.getParentApplicationContext(MergedContextConfiguration.java:399)
    at org.springframework.boot.test.context.SpringBootContextLoader.getInitializers(SpringBootContextLoader.java:219)
    at org.springframework.boot.test.context.SpringBootContextLoader.loadContext(SpringBootContextLoader.java:121)
    at org.springframework.test.context.cache.DefaultCacheAwareContextLoaderDelegate.loadContextInternal(DefaultCacheAwareContextLoaderDelegate.java:99)
    at org.springframework.test.context.cache.DefaultCacheAwareContextLoaderDelegate.loadContext(DefaultCacheAwareContextLoaderDelegate.java:117)
    - locked <0x000000064a62b1d0> (a org.springframework.test.context.cache.DefaultContextCache)
    at org.springframework.test.context.support.DefaultTestContext.getApplicationContext(DefaultTestContext.java:108)
    at org.springframework.test.context.support.DependencyInjectionTestExecutionListener.injectDependencies(DependencyInjectionTestExecutionListener.java:117)
    at org.springframework.test.context.support.DependencyInjectionTestExecutionListener.prepareTestInstance(DependencyInjectionTestExecutionListener.java:83)
    at org.springframework.test.context.TestContextManager.prepareTestInstance(TestContextManager.java:246)
    at org.springframework.test.context.testng.AbstractTestNGSpringContextTests.springTestContextPrepareTestInstance(AbstractTestNGSpringContextTests.java:145)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:104)
    at org.testng.internal.Invoker.invokeConfigurationMethod(Invoker.java:515)
    at org.testng.internal.Invoker.invokeConfigurations(Invoker.java:217)
    at org.testng.internal.Invoker.invokeConfigurations(Invoker.java:144)
    at org.testng.internal.TestMethodWorker.invokeBeforeClassMethods(TestMethodWorker.java:169)
    - locked <0x000000064a61d970> (a org.testng.TestClass)
    at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:108)
    at org.testng.TestRunner.privateRun(TestRunner.java:756)
    at org.testng.TestRunner.run(TestRunner.java:610)
    at org.testng.SuiteRunner.runTest(SuiteRunner.java:387)
    at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:382)
    at org.testng.SuiteRunner.privateRun(SuiteRunner.java:340)
    at org.testng.SuiteRunner.run(SuiteRunner.java:289)
    at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
    at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86)
    at org.testng.TestNG.runSuitesSequentially(TestNG.java:1293)
    at org.testng.TestNG.runSuitesLocally(TestNG.java:1218)
    at org.testng.TestNG.runSuites(TestNG.java:1133)
    at org.testng.TestNG.run(TestNG.java:1104)
    at org.apache.maven.surefire.testng.TestNGExecutor.run(TestNGExecutor.java:132)
    at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.executeMulti(TestNGDirectoryTestSuite.java:224)
    at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.execute(TestNGDirectoryTestSuite.java:113)
    at org.apache.maven.surefire.testng.TestNGProvider.invoke(TestNGProvider.java:146)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:286)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:240)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:121)

   Locked ownable synchronizers:
    - None

My assumption is that the issue is caused by classic deadlock conditions - two locks are acquired in different order by two threads. Obviously, the first lock is synchronized (ConfigurationManager.class). And the second lock is an internal lock associated with the static initializer of ConfigurationManager.

davidcarltonsumo commented 5 years ago

We're seeing this, too, and my diagnosis (before I found this bug) was the same. https://stackoverflow.com/questions/4612492/static-initializer-and-static-synchronized-method-locking-issue I think talks a little bit about the locks there (in the second edit to the accepted answer) - even though one of the threads is runnable, it can't make progress because of the internal lock you mentioned.

rgallardo-netflix commented 1 year ago

V1 is no longer maintained. V2 bypasses this kind of issue by avoiding the use of statics.