Terracotta-OSS / galvan

An integration testing framework.
Apache License 2.0

Galvan Port Chooser is failing on Azure / Windows #215

Closed mathieucarbou closed 4 years ago

mathieucarbou commented 4 years ago

Hi guys,

PR https://github.com/Terracotta-OSS/terracotta-platform/pull/645 is blocked. If you look at the build, the Windows tests are not passing because the build is stopped by Azure. There are a lot of timeouts like this one:

2020-04-07T00:57:22.2270298Z org.junit.runners.model.TestTimedOutException: test timed out after 60 seconds
2020-04-07T00:57:22.2328452Z    at com.tc.util.PortChooser.getNonEphemeralPort(PortChooser.java:155)
2020-04-07T00:57:22.2379838Z    at com.tc.util.PortChooser.choose(PortChooser.java:143)
2020-04-07T00:57:22.2446821Z    at com.tc.util.PortChooser.chooseRandomPorts(PortChooser.java:69)
2020-04-07T00:57:22.2449139Z    at org.terracotta.testing.rules.BasicExternalCluster.internalStart(BasicExternalCluster.java:169)
2020-04-07T00:57:22.2474226Z    at org.terracotta.testing.rules.BasicExternalCluster.before(BasicExternalCluster.java:142)
2020-04-07T00:57:22.2477451Z    at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:46)
2020-04-07T00:57:22.2486483Z    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
2020-04-07T00:57:22.2494591Z    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
2020-04-07T00:57:22.2511092Z    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
2020-04-07T00:57:22.2527550Z    at java.lang.Thread.run(Thread.java:748)

link: https://dev.azure.com/TerracottaCI/terracotta/_build/results?buildId=3772&view=logs&j=cb55e55a-edd2-5df7-6d62-766be1c1d824&t=da40a98e-93df-5bfb-2b85-e8d484631d78

Here is a console dump if the link is not working: console.txt

Looking at the timestamps in the logs, a lot of minutes are wasted (each test burns its full 60-second timeout):

2020-04-07T00:57:22.2721677Z Using kitInstallationPath: "D:\a\1\s\management\testing\integration-tests\target/platform-kit-5.7-SNAPSHOT"
2020-04-07T00:58:22.2314799Z Using kitInstallationPath: "D:\a\1\s\management\testing\integration-tests\target/platform-kit-5.7-SNAPSHOT"
2020-04-07T00:59:22.2447990Z Using kitInstallationPath: "D:\a\1\s\management\testing\integration-tests\target/platform-kit-5.7-SNAPSHOT"
2020-04-07T01:00:22.2776467Z Using kitInstallationPath: "D:\a\1\s\management\testing\integration-tests\target/platform-kit-5.7-SNAPSHOT"
2020-04-07T01:01:22.3984726Z [ERROR] Tests run: 4, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 240.061 s <<< FAILURE! - in org.terracotta.management.integration.tests.ClientCacheRemoteManagementIT
2020-04-07T01:01:22.3999833Z [ERROR] can_access_remote_management_registry_of_client(org.terracotta.management.integration.tests.ClientCacheRemoteManagementIT)  Time elapsed: 60.018 s  <<< ERROR!
2020-04-07T01:01:22.4066546Z org.junit.runners.model.TestTimedOutException: test timed out after 60 seconds
2020-04-07T01:01:22.4071164Z    at com.tc.util.PortChooser.chooseRandomPorts(PortChooser.java:69)
2020-04-07T01:01:22.4073367Z    at org.terracotta.testing.rules.BasicExternalCluster.internalStart(BasicExternalCluster.java:169)
2020-04-07T01:01:22.4075437Z    at org.terracotta.testing.rules.BasicExternalCluster.before(BasicExternalCluster.java:142)
2020-04-07T01:01:22.4076598Z    at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:46)
2020-04-07T01:01:22.4077452Z    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
2020-04-07T01:01:22.4078425Z    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
2020-04-07T01:01:22.4079228Z    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
2020-04-07T01:01:22.4080172Z    at java.lang.Thread.run(Thread.java:748)

saurabhagas commented 4 years ago

If PortChooser happens to be the problem, we could use the new port chooser. That's currently in tc-platform, and would need to be moved to terracotta-utilities repository though.

mathieucarbou commented 4 years ago

> If PortChooser happens to be the problem, we could use the new port chooser. That's currently in tc-platform, and would need to be moved to terracotta-utilities repository though.

Agreed. But if this is the solution, I would expect someone other than you or me to do it. We need help here so that you can focus on your tasks.

@AbfrmBlr: FYI. Can you see whether we can get help with that? Thanks!

akomakom commented 4 years ago

I tested port allocation on Windows and Linux using this code I slapped together.
It works fine.

I can change the port range if you like (this allocated and listened on 10000 ports simultaneously)

mathieucarbou commented 4 years ago

> I tested port allocation on Windows and Linux using this code I slapped together. It works fine.
>
> I can change the port range if you like (this allocated and listened on 10000 ports simultaneously)

I don't think you are testing the same behavior as the port chooser. The port chooser uses some randomization and an exclude list, which can make things worse, and there is a lot of port open / close / open / close, so I guess ports linger in time-wait on the VM.
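
A minimal sketch of the pattern described above (random candidate, exclude list, bind-and-close probe). This is not the `com.tc.util.PortChooser` source; the class name `RandomPortPicker`, the 1024-65535 candidate range, and the `maxAttempts` bound are all assumptions made for illustration:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.OptionalInt;
import java.util.Random;
import java.util.Set;
import java.util.function.IntPredicate;

// Hypothetical sketch; not the actual com.tc.util.PortChooser implementation.
final class RandomPortPicker {
    static final int MIN_PORT = 1024;   // assumed lower bound for candidates
    static final int MAX_PORT = 65535;

    // Probe a port by binding a server socket and closing it right away.
    // This is the "open / close" step mentioned above.
    static boolean isFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Pick a random port that is not excluded and passes the probe.
    // Returns empty after maxAttempts instead of spinning until a test timeout fires.
    static OptionalInt choose(Random random, Set<Integer> excluded,
                              IntPredicate probe, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            int candidate = MIN_PORT + random.nextInt(MAX_PORT - MIN_PORT + 1);
            if (!excluded.contains(candidate) && probe.test(candidate)) {
                return OptionalInt.of(candidate);
            }
        }
        return OptionalInt.empty();
    }
}
```

Calling `RandomPortPicker.choose(new Random(), excluded, RandomPortPicker::isFree, 1000)` would exercise the real bind probe. If the probe or the exclude list rejects nearly every candidate, an unbounded version of this loop never returns, which would look like the 60-second timeouts in the logs.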

akomakom commented 4 years ago

> I don't think you are testing the same behavior as the port chooser. The port chooser uses some randomization and an exclude list, which can make things worse, and there is a lot of port open / close / open / close, so I guess ports linger in time-wait on the VM.

Feel free to contribute a better test.

akomakom commented 4 years ago

Changed the ephemeral port range on our Azure agents from 1024- to 49152-
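
To illustrate why the range matters, here is a rough sketch that counts how many ports are left for a chooser that refuses ephemeral ports. It assumes that "1024-" and "49152-" mean ranges running up to 65535, and that the chooser only considers ports from 1024 upward; neither is confirmed in this thread, and `EphemeralRangeCheck` is a hypothetical helper, not part of galvan:

```java
// Hypothetical helper for illustration; not part of galvan or PortChooser.
final class EphemeralRangeCheck {
    static final int MIN_CANDIDATE = 1024;  // assumed lowest port a chooser would consider
    static final int MAX_PORT = 65535;

    // Counts ports in [MIN_CANDIDATE, MAX_PORT] that fall outside
    // the ephemeral range [ephemeralLow, ephemeralHigh].
    static int nonEphemeralCandidates(int ephemeralLow, int ephemeralHigh) {
        int count = 0;
        for (int port = MIN_CANDIDATE; port <= MAX_PORT; port++) {
            if (port < ephemeralLow || port > ephemeralHigh) {
                count++;
            }
        }
        return count;
    }
}
```

Under these assumptions, `nonEphemeralCandidates(1024, 65535)` is 0, so a loop looking for a non-ephemeral port can never succeed, while `nonEphemeralCandidates(49152, 65535)` is 48128.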

ramsai1729 commented 4 years ago

The issue was due to the ephemeral port range on the build system and is not related to galvan. Closing this.