brooklyncentral / brooklyn

This project has moved and is now part of the ASF
https://github.com/apache/incubator-brooklyn

shouldn't back up HA dir when standby node starts #1464

Closed ahgittin closed 10 years ago

ahgittin commented 10 years ago

there is no need to create a backup directory of the persisted state when a node starts in standby mode. this causes unsightly errors in the log and can even prevent the standby node from starting. so we should delay the backup until such time as a new master is elected and obtains write access.
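
A minimal sketch of the proposal above, assuming a node that tracks its own HA mode. Every name here (`DeferredBackupSketch`, `Node`, `Mode`, `promoteToMaster`, `backupDir`) is hypothetical and is not taken from the Brooklyn codebase, and this is not the change made in any of the referenced commits or pull requests. It only illustrates the ordering: a standby start does not touch the persistence directory, and the copy happens once, at promotion.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; none of these types exist in Brooklyn.
class DeferredBackupSketch {

    enum Mode { STANDBY, MASTER }

    static class Node {
        private final Path persistenceDir;
        private Mode mode = Mode.STANDBY;
        private boolean backedUp = false;

        Node(Path persistenceDir) {
            this.persistenceDir = persistenceDir;
        }

        // Assumed backup location: a sibling directory with a ".bak" suffix.
        Path backupDir() {
            return persistenceDir.resolveSibling(persistenceDir.getFileName() + ".bak");
        }

        Mode mode() {
            return mode;
        }

        // A standby start performs no backup; only a master start does.
        void start(Mode startMode) throws IOException {
            if (startMode == Mode.MASTER) {
                promoteToMaster();
            }
        }

        // The backup is taken at most once, when the node becomes master.
        void promoteToMaster() throws IOException {
            if (!backedUp) {
                copyDir(persistenceDir, backupDir());
                backedUp = true;
            }
            mode = Mode.MASTER;
        }
    }

    // Recursively copies source into target; Files.walk lists parents before children.
    static void copyDir(Path source, Path target) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(source)) {
            paths = walk.collect(Collectors.toList());
        }
        for (Path p : paths) {
            Path dest = target.resolve(source.relativize(p).toString());
            if (Files.isDirectory(p)) {
                Files.createDirectories(dest);
            } else {
                Files.copy(p, dest, StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("persist");
        Files.write(dir.resolve("entity.xml"), "<entity/>".getBytes(StandardCharsets.UTF_8));

        Node node = new Node(dir);
        node.start(Mode.STANDBY);
        System.out.println("backup exists after standby start: " + Files.exists(node.backupDir()));

        node.promoteToMaster();
        System.out.println("backup exists after promotion: " + Files.exists(node.backupDir()));
    }
}
```

In this sketch the standby never reads or copies files that the current master may be rewriting, which is the situation the issue suggests is behind the copy failure.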

@aledsage @richardcloudsoft i think you've looked at this and recommended a workaround, but have you worked on a fix?

note that currently we get intermittent test failures such as this:

Tests run: 19, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 9.573 sec <<< FAILURE!
testStandbyTakesOverWhenPrimaryTerminatedGracefully(brooklyn.launcher.BrooklynLauncherHighAvailabilityTest)  Time elapsed: 0.348 sec  <<< FAILURE!
brooklyn.util.exceptions.FatalConfigurationRuntimeException: Error backing up persistence directory /var/folders/q2/363yynwx5lb_qpch1km2xvr80000gn/T/1402379596446-0
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:146)
    at org.apache.commons.io.FileUtils.doCopyFile(FileUtils.java:1138)
    at org.apache.commons.io.FileUtils.doCopyDirectory(FileUtils.java:1428)
    at org.apache.commons.io.FileUtils.doCopyDirectory(FileUtils.java:1426)
    at org.apache.commons.io.FileUtils.copyDirectory(FileUtils.java:1389)
    at org.apache.commons.io.FileUtils.copyDirectory(FileUtils.java:1261)
    at org.apache.commons.io.FileUtils.copyDirectory(FileUtils.java:1230)
    at brooklyn.entity.rebind.persister.FileBasedObjectStore.copyDir(FileBasedObjectStore.java:272)
    at brooklyn.entity.rebind.persister.FileBasedObjectStore.backupDirByCopying(FileBasedObjectStore.java:223)
    at brooklyn.entity.rebind.persister.FileBasedObjectStore.prepareForUse(FileBasedObjectStore.java:171)
    at brooklyn.launcher.BrooklynLauncher.initPersistence(BrooklynLauncher.java:485)
    at brooklyn.launcher.BrooklynLauncher.start(BrooklynLauncher.java:423)
    at brooklyn.launcher.BrooklynLauncherHighAvailabilityTest.doTestStandbyTakesOver(BrooklynLauncherHighAvailabilityTest.java:93)
    at brooklyn.launcher.BrooklynLauncherHighAvailabilityTest.testStandbyTakesOverWhenPrimaryTerminatedGracefully(BrooklynLauncherHighAvailabilityTest.java:58)

richardcloudsoft commented 10 years ago

I added a workaround in e5a893a427043a9b093b24742ca45d50fa81deb2 (not yet in master) - no permanent fix yet.

ahgittin commented 10 years ago

where is this commit? would be nice to have in master.

(and how do i tell with git where this commit sits??)

we should also use this to fix intermittently failing tests such as in https://github.com/brooklyncentral/brooklyn/pull/1473 by applying it to the secondary and tertiary nodes in BrooklynLauncherHighAvailabilityTest

ahgittin commented 10 years ago

i think this is fixed (properly) by #1477, so no need for the workaround mentioned above

@richardcloudsoft @aledsage wdyt?