Closed by amcp 7 years ago
@amcp I'll start looking at the test cases and see how we can split them logically and keep each build under the Travis CI 50min time limit.
Below is a checklist of all the tasks needed to fix the continuous integration build.
For reference, thread with previous discussion: https://github.com/awslabs/dynamodb-titan-storage-backend/pull/82
Multi test run times:
$ cat target/failsafe-reports/*txt | grep -v "\-\-" | grep -v "Test\ set:"
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.838 sec - in com.amazon.titan.ClientTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec - in com.amazon.titan.DynamoDBStoreTransactionTest
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.738 sec - in com.amazon.titan.MultiGraphOfTheGodsTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.096 sec - in com.amazon.titan.MultiMarvelTest
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.84 sec - in com.amazon.titan.SingleGraphOfTheGodsTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.53 sec - in com.amazon.titan.SingleMarvelTest
Tests run: 15, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 sec - in com.amazon.titan.diskstorage.dynamodb.DynamoDBDelegateTest
Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 323.809 sec - in com.amazon.titan.diskstorage.dynamodb.MultiDynamoDBIDAuthorityTest
Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 19.152 sec - in com.amazon.titan.diskstorage.dynamodb.MultiDynamoDBLockStoreTest
Tests run: 11, Failures: 0, Errors: 0, Skipped: 2, Time elapsed: 123.055 sec - in com.amazon.titan.diskstorage.dynamodb.MultiDynamoDBLogTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 417.726 sec - in com.amazon.titan.diskstorage.dynamodb.MultiDynamoDBMultiWriteStoreTest
Tests run: 27, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2,123.533 sec - in com.amazon.titan.diskstorage.dynamodb.MultiDynamoDBStoreTest
Tests run: 59, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3,801.508 sec - in com.amazon.titan.graphdb.dynamodb.MultiDynamoDBGraphTest
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 872.628 sec - in com.amazon.titan.graphdb.dynamodb.MultiDynamoDBOLAPTest
Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 264.507 sec - in com.amazon.titan.graphdb.dynamodb.MultiDynamoDBPartitionGraphTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 21.004 sec - in com.google.common.util.concurrent.RateLimiterCreatorTest
Single test run times:
$ cat target/failsafe-reports/*txt | grep -v "\-\-" | grep -v "Test\ set:"
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.52 sec - in com.amazon.titan.ClientTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec - in com.amazon.titan.DynamoDBStoreTransactionTest
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.898 sec - in com.amazon.titan.MultiGraphOfTheGodsTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.777 sec - in com.amazon.titan.MultiMarvelTest
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.854 sec - in com.amazon.titan.SingleGraphOfTheGodsTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.498 sec - in com.amazon.titan.SingleMarvelTest
Tests run: 15, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.004 sec - in com.amazon.titan.diskstorage.dynamodb.DynamoDBDelegateTest
Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 324.047 sec - in com.amazon.titan.diskstorage.dynamodb.SingleDynamoDBIDAuthorityTest
Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 19.985 sec - in com.amazon.titan.diskstorage.dynamodb.SingleDynamoDBLockStoreTest
Tests run: 11, Failures: 0, Errors: 0, Skipped: 2, Time elapsed: 122.984 sec - in com.amazon.titan.diskstorage.dynamodb.SingleDynamoDBLogTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 382.358 sec - in com.amazon.titan.diskstorage.dynamodb.SingleDynamoDBMultiWriteStoreTest
Tests run: 27, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1,862.807 sec - in com.amazon.titan.diskstorage.dynamodb.SingleDynamoDBStoreTest
Tests run: 59, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1,339.93 sec - in com.amazon.titan.graphdb.dynamodb.SingleDynamoDBGraphTest
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 752.405 sec - in com.amazon.titan.graphdb.dynamodb.SingleDynamoDBOLAPTest
Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 201.195 sec - in com.amazon.titan.graphdb.dynamodb.SingleDynamoDBPartitionGraphTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 18.663 sec - in com.google.common.util.concurrent.RateLimiterCreatorTest
Multi heavy hitters:
Tests run: 27, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2,123.533 sec - in com.amazon.titan.diskstorage.dynamodb.MultiDynamoDBStoreTest
Tests run: 59, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3,801.508 sec - in com.amazon.titan.graphdb.dynamodb.MultiDynamoDBGraphTest
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 872.628 sec - in com.amazon.titan.graphdb.dynamodb.MultiDynamoDBOLAPTest
Single heavy hitters:
Tests run: 27, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1,862.807 sec - in com.amazon.titan.diskstorage.dynamodb.SingleDynamoDBStoreTest
Tests run: 59, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1,339.93 sec - in com.amazon.titan.graphdb.dynamodb.SingleDynamoDBGraphTest
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 752.405 sec - in com.amazon.titan.graphdb.dynamodb.SingleDynamoDBOLAPTest
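For reference, the heavy hitters can also be pulled out of the report lines programmatically instead of by eye. The following is a minimal sketch, not code from the repository: the class name FailsafeSummary and the 600-second threshold are illustrative assumptions, and it assumes the report uses "," as a thousands separator, as in the output above.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical helper for reading failsafe summary lines; not part of the repository. */
final class FailsafeSummary {
    // Captures the elapsed time and the test class from a line such as
    // "Tests run: 8, ..., Time elapsed: 872.628 sec - in com.example.SomeTest".
    private static final Pattern SUMMARY =
            Pattern.compile("Time elapsed: ([0-9][0-9.,]*) sec - in (\\S+)");

    /** Parses "2,123.533" as 2123.533 by dropping the thousands separator. */
    static double parseSeconds(String text) {
        return Double.parseDouble(text.replace(",", ""));
    }

    /** Returns the test classes whose elapsed time exceeds the threshold, in input order. */
    static List<String> heavyHitters(List<String> lines, double thresholdSeconds) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            Matcher m = SUMMARY.matcher(line);
            if (m.find() && parseSeconds(m.group(1)) > thresholdSeconds) {
                result.add(m.group(2));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Reads report lines from standard input, e.g. piped from the cat/grep command above.
        List<String> lines = new BufferedReader(new InputStreamReader(System.in))
                .lines()
                .collect(Collectors.toList());
        // 600 seconds is an arbitrary illustrative threshold.
        heavyHitters(lines, 600).forEach(System.out::println);
    }
}
```

Piping the output of the cat/grep command above into this class would print one class name per line for every test class above the threshold.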
Isolate in separate matrix entries:
MultiDynamoDBGraphTest will need some splitting up by method, one moment.
Isolate each of the following in separate matrix entries:
@F2006 Now that I fixed the bug in AbstractDynamoDBLockStoreTest, there are no test failures.
Updated task list (previous discussion in #82):
@amcp thanks for adding the run times for all the tests. I have broken up the matrix config as per above. I have also added SingleDynamoDBMultiWriteStoreTest and MultiDynamoDBMultiWriteStoreTest to the matrix list, and I have removed the allow_failures section from the Travis CI config. I feel that if any of the builds fail, the overall status of the build should also be marked as failed.
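As a rough illustration of how the measured run times could drive the matrix split, here is a first-fit-decreasing grouping sketch. It is not how the matrix in this thread was actually built (the heavy hitters were isolated by hand); the class name MatrixPlanner is hypothetical, and the limit passed in should leave headroom for build setup time, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical planner that groups test classes into matrix entries; not part of the repository. */
final class MatrixPlanner {
    /**
     * First-fit decreasing: take the longest test first and put it into the first
     * group that stays within the limit, opening a new group when none fits.
     */
    static List<List<String>> plan(Map<String, Double> secondsByTest, double limitSeconds) {
        List<Map.Entry<String, Double>> sorted = new ArrayList<>(secondsByTest.entrySet());
        sorted.sort(Map.Entry.<String, Double>comparingByValue().reversed());

        List<List<String>> groups = new ArrayList<>();
        List<Double> totals = new ArrayList<>(); // running time of each group, in seconds
        for (Map.Entry<String, Double> entry : sorted) {
            int target = -1;
            for (int i = 0; i < groups.size(); i++) {
                if (totals.get(i) + entry.getValue() <= limitSeconds) {
                    target = i;
                    break;
                }
            }
            if (target < 0) {
                // Nothing fits: open a new group. A test longer than the limit
                // ends up alone in a group that is still over budget.
                groups.add(new ArrayList<>());
                totals.add(0.0);
                target = groups.size() - 1;
            }
            groups.get(target).add(entry.getKey());
            totals.set(target, totals.get(target) + entry.getValue());
        }
        return groups;
    }
}
```

A class that exceeds the limit on its own, such as MultiDynamoDBGraphTest at 3,801.508 sec against a 50-minute (3,000 sec) limit, lands alone in an over-budget group, which is the signal that it has to be split by method instead.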
We have some green builds now. :) https://travis-ci.org/nichestreem/dynamodb-titan-storage-backend/builds/225778440
The build for the SingleDynamoDBGraphTest module is failing because of test failures. https://travis-ci.org/nichestreem/dynamodb-titan-storage-backend/jobs/225778448
Failed tests:
SingleDynamoDBGraphTest>TitanGraphTest.simpleLogTest:3538->TitanGraphTest.simpleLogTest:3677 expected:<5> but was:<0>
SingleDynamoDBGraphTest>TitanGraphTest.simpleLogTestWithFailure:3543->TitanGraphTest.simpleLogTest:3677 expected:<5> but was:<0>
Tests in error:
SingleDynamoDBGraphTest>TitanGraphTest.testEdgesExceedCacheSize:3307 » Titan C...
The following builds were terminated because they produced no console log output within the 10-minute window. We might need to split them by method as well.
Do you want to create a feature branch to open the pull request against or do you want to use the 1.0.0 branch?
I'll continue tomorrow to split the last three long-running builds by method and get them to run successfully. If there is time left in the day, I can also have a look at the failing tests.
Updated check list:
@F2006 Here is the feature branch: https://github.com/awslabs/dynamodb-titan-storage-backend/tree/travis . You can open up a PR to merge to that branch and I will merge it.
@F2006 I think this is what you want to make the PR for: https://github.com/awslabs/dynamodb-titan-storage-backend/compare/1.0.0...nichestreem:travis_ci_build_fixes
testConcurrentGetSlice and testConcurrentGetSliceAndMutate are the heavy hitters in the StoreTests. Isolating them
@F2006 My progress for all the checked off items is in PR #94
Latest build: https://travis-ci.org/awslabs/dynamodb-titan-storage-backend/builds/225949737
10 minute timeout failures (add a print statement every minute to prevent test abortion):
Failed for other reasons (first one took 21 minutes and the second one took 19 minutes):
@F2006 Instead of using metric console output, could we use a simple log statement? Metric summaries make the logs quite long.
@amcp thanks for creating the pull request and merging in. Just catching up with all the changes now. :)
I agree with removing the metrics; that was the only quick way I could see to keep the build from being terminated. I had a look at the various logs being printed from Titan, ElasticSearch and the AWS libs, and none of the libs logs enough at any level to keep the build from being terminated.
The only other way would be to have something custom that adds some form of heartbeat message to the console every x minutes. There should also be some form of timeout on the heartbeat message, for the case where a test is actually hung.
This is perhaps something I can pick up today: remove the metrics, print a heartbeat and get the builds with 10-minute timeout failures to progress?
@amcp I have added a heartbeat that logs to the console at a configurable interval. It also includes the current test name and the elapsed execution time of the test, which could be helpful for debugging.
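For readers who have not seen the pull request, a heartbeat of the kind described above could look roughly like the sketch below. This is not the code from the PR: the class name TestHeartbeat, the message format and all parameters are assumptions made for illustration. It prints the test name and elapsed time at a fixed interval and goes quiet after a maximum duration, so that the CI no-output timeout can still terminate a test that is genuinely hung.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical heartbeat sketch; names and message format are not taken from the repository. */
final class TestHeartbeat implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "test-heartbeat");
                thread.setDaemon(true); // never keep the JVM alive just for the heartbeat
                return thread;
            });
    private final long startNanos = System.nanoTime();

    /** Builds the heartbeat line for a test that has been running for the given time. */
    static String message(String testName, long elapsedMillis) {
        return String.format("heartbeat: %s running for %d s", testName, elapsedMillis / 1000);
    }

    TestHeartbeat(String testName, long intervalMillis, long maxMillis, Consumer<String> out) {
        scheduler.scheduleAtFixedRate(() -> {
            long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
            if (elapsedMillis > maxMillis) {
                // Stop reporting: a test running this long is treated as hung,
                // and staying silent lets the CI no-output timeout abort it.
                scheduler.shutdown();
                return;
            }
            out.accept(message(testName, elapsedMillis));
        }, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    /** Stops the heartbeat when the test finishes. */
    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

A test could wrap its body in `try (TestHeartbeat hb = new TestHeartbeat("testConcurrentGetSlice", 60_000, 45 * 60_000, System.out::println)) { ... }`, where the one-minute interval and 45-minute cap are again only example values.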
I have to be at an event after work, but will be back online later tonight when I get back home. There is a build running (https://travis-ci.org/nichestreem/dynamodb-titan-storage-backend/builds/226171391). If the build looks fine, I'll create a pull request to the travis branch later. I have also updated my branch from your isolateStoreGraphTests branch, so the pull request will include all of our updates.
I have updated our check lists with the items left which I have seen up to now, will update the list when the build is done tonight.
50 minute build timeout failures:
Failed for other reasons:
Failed unit tests:
Updated check list:
@F2006 can you squash all of your commits on the travis_ci_build_fixes branch and rebase off the isolateStoreGraphTests branch, so as to minimize the number of diff lines I need to read when making the next merge PR?
@amcp I rebased my travis_ci_build_fixes branch onto the isolateStoreGraphTests branch this morning. I have squashed all commits from today and created a pull request to the travis branch: https://github.com/awslabs/dynamodb-titan-storage-backend/pull/95
Looking at the last build (https://travis-ci.org/nichestreem/dynamodb-titan-storage-backend/builds/226171391), two of the tests were terminated due to the 50-minute build limit. These tests are already sliced, so I will think of a plan forward in the morning, unless you have any ideas? :)
Maintenance is currently being done on Travis CI, so I cannot see the log for MultiStoreTest. Will investigate in the morning.
I have updated the lists above.
@F2006 the PR was based off the awslabs:travis branch, so please create a new PR based off awslabs:isolateStoreGraphTests. It will greatly reduce the diff output. Ideally your PR should only have one commit, and it should be right after the HEAD of awslabs:isolateStoreGraphTests.
@F2006 For multi testConcurrentGetSliceAndMutate and testConcurrentGetSlice we will have to override the tests and make them smaller time-wise. I can do that.
@F2006
git fetch upstream
git checkout 1.0.0
git rebase upstream/1.0.0
git push -f
git checkout travis_ci_build_fixes
git rebase origin/1.0.0

git rebase -i HEAD~10
git push -f

git rebase upstream/isolateStoreGraphTests
git push -f

git push :travis_ci
pushed the fix for multi testConcurrent* in https://github.com/awslabs/dynamodb-titan-storage-backend/commit/49be9feffd1acbd08437e04298b0189b442b65de
Here are the most recent failures in the latest build:
I added some more fixes that reduce test time. I will waive passes on MultiStoreTest, SingleGraphTest and MultiGraphTest as I know that these pass when run on a different host class.
closing and replacing with #96, #97, #98 and #99
@amcp pull request made to the upstream travis branch with the heartbeat console output. The diff now shows only the changes that add the heartbeat, so it should be easier to read. :)
@F2006 here is the tracking issue for the build