amazon-archives / dynamodb-janusgraph-storage-backend

The Amazon DynamoDB Storage Backend for JanusGraph
Apache License 2.0

Fix Travis CI build #87

Closed amcp closed 7 years ago

amcp commented 7 years ago

@F2006 here is the tracking issue for the build

F2006 commented 7 years ago

@amcp I'll start looking at the test cases and see how we can split them logically and keep each build under the Travis CI 50min time limit.

F2006 commented 7 years ago

Below is a check list for all tasks needed to fix the continuous integration build.

For reference, thread with previous discussion: https://github.com/awslabs/dynamodb-titan-storage-backend/pull/82

amcp commented 7 years ago

Multi test time taken:
$ cat target/failsafe-reports/*txt | grep -v "\-\-" | grep -v "Test\ set:"
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.838 sec - in com.amazon.titan.ClientTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec - in com.amazon.titan.DynamoDBStoreTransactionTest
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.738 sec - in com.amazon.titan.MultiGraphOfTheGodsTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.096 sec - in com.amazon.titan.MultiMarvelTest
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.84 sec - in com.amazon.titan.SingleGraphOfTheGodsTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.53 sec - in com.amazon.titan.SingleMarvelTest
Tests run: 15, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 sec - in com.amazon.titan.diskstorage.dynamodb.DynamoDBDelegateTest
Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 323.809 sec - in com.amazon.titan.diskstorage.dynamodb.MultiDynamoDBIDAuthorityTest
Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 19.152 sec - in com.amazon.titan.diskstorage.dynamodb.MultiDynamoDBLockStoreTest
Tests run: 11, Failures: 0, Errors: 0, Skipped: 2, Time elapsed: 123.055 sec - in com.amazon.titan.diskstorage.dynamodb.MultiDynamoDBLogTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 417.726 sec - in com.amazon.titan.diskstorage.dynamodb.MultiDynamoDBMultiWriteStoreTest
Tests run: 27, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2,123.533 sec - in com.amazon.titan.diskstorage.dynamodb.MultiDynamoDBStoreTest
Tests run: 59, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3,801.508 sec - in com.amazon.titan.graphdb.dynamodb.MultiDynamoDBGraphTest
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 872.628 sec - in com.amazon.titan.graphdb.dynamodb.MultiDynamoDBOLAPTest
Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 264.507 sec - in com.amazon.titan.graphdb.dynamodb.MultiDynamoDBPartitionGraphTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 21.004 sec - in com.google.common.util.concurrent.RateLimiterCreatorTest

amcp commented 7 years ago

Single test time taken:
$ cat target/failsafe-reports/*txt | grep -v "\-\-" | grep -v "Test\ set:"
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.52 sec - in com.amazon.titan.ClientTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec - in com.amazon.titan.DynamoDBStoreTransactionTest
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.898 sec - in com.amazon.titan.MultiGraphOfTheGodsTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.777 sec - in com.amazon.titan.MultiMarvelTest
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.854 sec - in com.amazon.titan.SingleGraphOfTheGodsTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.498 sec - in com.amazon.titan.SingleMarvelTest
Tests run: 15, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.004 sec - in com.amazon.titan.diskstorage.dynamodb.DynamoDBDelegateTest
Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 324.047 sec - in com.amazon.titan.diskstorage.dynamodb.SingleDynamoDBIDAuthorityTest
Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 19.985 sec - in com.amazon.titan.diskstorage.dynamodb.SingleDynamoDBLockStoreTest
Tests run: 11, Failures: 0, Errors: 0, Skipped: 2, Time elapsed: 122.984 sec - in com.amazon.titan.diskstorage.dynamodb.SingleDynamoDBLogTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 382.358 sec - in com.amazon.titan.diskstorage.dynamodb.SingleDynamoDBMultiWriteStoreTest
Tests run: 27, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1,862.807 sec - in com.amazon.titan.diskstorage.dynamodb.SingleDynamoDBStoreTest
Tests run: 59, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1,339.93 sec - in com.amazon.titan.graphdb.dynamodb.SingleDynamoDBGraphTest
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 752.405 sec - in com.amazon.titan.graphdb.dynamodb.SingleDynamoDBOLAPTest
Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 201.195 sec - in com.amazon.titan.graphdb.dynamodb.SingleDynamoDBPartitionGraphTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 18.663 sec - in com.google.common.util.concurrent.RateLimiterCreatorTest

amcp commented 7 years ago

Multi heavy hitters:
Tests run: 27, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2,123.533 sec - in com.amazon.titan.diskstorage.dynamodb.MultiDynamoDBStoreTest
Tests run: 59, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3,801.508 sec - in com.amazon.titan.graphdb.dynamodb.MultiDynamoDBGraphTest
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 872.628 sec - in com.amazon.titan.graphdb.dynamodb.MultiDynamoDBOLAPTest

Single heavy hitters:
Tests run: 27, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1,862.807 sec - in com.amazon.titan.diskstorage.dynamodb.SingleDynamoDBStoreTest
Tests run: 59, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1,339.93 sec - in com.amazon.titan.graphdb.dynamodb.SingleDynamoDBGraphTest
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 752.405 sec - in com.amazon.titan.graphdb.dynamodb.SingleDynamoDBOLAPTest
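
Picking out the heavy hitters by eye from the failsafe summaries gets tedious as the suite grows. The following is a minimal sketch, not part of this repository, that parses summary lines in the format quoted in this thread and ranks the test classes by elapsed time. The function names and the 600-second threshold are assumptions chosen for illustration.

```python
import re

# Matches one failsafe summary line in the format quoted in this thread.
SUMMARY = re.compile(
    r"Tests run: (?P<run>\d+), Failures: (?P<failures>\d+), "
    r"Errors: (?P<errors>\d+), Skipped: (?P<skipped>\d+), "
    r"Time elapsed: (?P<secs>[\d,.]+) sec - in (?P<cls>[\w.$]+)"
)


def parse_summaries(text):
    """Return (simple class name, seconds) pairs, slowest first."""
    results = []
    for match in SUMMARY.finditer(text):
        # The reports use a thousands separator, e.g. "2,123.533".
        seconds = float(match.group("secs").replace(",", ""))
        simple_name = match.group("cls").rsplit(".", 1)[-1]
        results.append((simple_name, seconds))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


def heavy_hitters(text, threshold_secs=600.0):
    """Keep only classes slower than threshold_secs (hypothetical cutoff)."""
    return [(name, secs) for name, secs in parse_summaries(text)
            if secs > threshold_secs]


if __name__ == "__main__":
    sample = (
        "Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: "
        "872.628 sec - in com.amazon.titan.graphdb.dynamodb.MultiDynamoDBOLAPTest\n"
        "Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: "
        "19.152 sec - in com.amazon.titan.diskstorage.dynamodb.MultiDynamoDBLockStoreTest\n"
    )
    for name, secs in heavy_hitters(sample):
        print(f"{secs / 60:6.1f} min  {name}")
```

Applied to the two full reports in this thread, a 600-second cutoff selects the same three Multi and three Single classes listed as heavy hitters; a different suite would likely need a different cutoff.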

amcp commented 7 years ago

Isolate in separate matrix entries:

MultiDynamoDBGraphTest will need some splitting up by method, one moment.
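
To illustrate why MultiDynamoDBGraphTest needs splitting by method, here is a rough sketch that checks the Multi timings from this thread against a per-job time budget. It flags classes that exceed the budget on their own and packs the rest with a first-fit-decreasing heuristic. This is an alternative to giving every class its own matrix entry, not the approach taken in this issue. The `plan_jobs` helper and the 10-minute headroom are assumptions, and the timings were measured on a different host, so they may not carry over to Travis workers.

```python
LIMIT_SECS = 50 * 60      # Travis CI job limit mentioned in this thread
HEADROOM_SECS = 10 * 60   # hypothetical allowance for build setup overhead

# Elapsed seconds for the slower Multi test classes, copied from this thread.
MULTI_TIMINGS = {
    "MultiDynamoDBGraphTest": 3801.508,
    "MultiDynamoDBStoreTest": 2123.533,
    "MultiDynamoDBOLAPTest": 872.628,
    "MultiDynamoDBMultiWriteStoreTest": 417.726,
    "MultiDynamoDBIDAuthorityTest": 323.809,
    "MultiDynamoDBPartitionGraphTest": 264.507,
    "MultiDynamoDBLogTest": 123.055,
}


def plan_jobs(durations, budget_secs):
    """Return (jobs, oversized) using first-fit decreasing.

    jobs: lists of class names whose summed time fits within budget_secs.
    oversized: classes that exceed the budget alone and so cannot be
    scheduled without being split further (for example, by method).
    """
    oversized = sorted(n for n, s in durations.items() if s > budget_secs)
    fitting = sorted(
        ((s, n) for n, s in durations.items() if s <= budget_secs),
        reverse=True,
    )
    jobs = []  # each entry is [remaining_secs, [class names]]
    for secs, name in fitting:
        for job in jobs:
            if job[0] >= secs:       # first job with enough room left
                job[0] -= secs
                job[1].append(name)
                break
        else:                        # no existing job had room
            jobs.append([budget_secs - secs, [name]])
    return [names for _, names in jobs], oversized


if __name__ == "__main__":
    jobs, oversized = plan_jobs(MULTI_TIMINGS, LIMIT_SECS - HEADROOM_SECS)
    print("split by method:", oversized)
    for index, names in enumerate(jobs, start=1):
        print(f"job {index}:", ", ".join(names))
```

With these numbers and a 40-minute budget, MultiDynamoDBGraphTest is reported as oversized and the remaining six classes fit into two jobs. Even with no headroom at all, 3,801.508 seconds is already over the 3,000-second limit.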

amcp commented 7 years ago

Isolate each of the following in separate matrix entries:

amcp commented 7 years ago

@F2006 Now that I fixed the bug in AbstractDynamoDBLockStoreTest, there are no test failures.

amcp commented 7 years ago

Updated task list (previous discussion in #82):

F2006 commented 7 years ago

@amcp thanks for adding the run times for all the tests. I have broken up the matrix config as described above and added SingleDynamoDBMultiWriteStoreTest and MultiDynamoDBMultiWriteStoreTest to the matrix list. I have also removed the allow_failures section from the Travis CI config: if any of the builds fail, I feel the overall build status should also be marked as failed.

We have some green builds now. :) https://travis-ci.org/nichestreem/dynamodb-titan-storage-backend/builds/225778440

The build for the SingleDynamoDBGraphTest module is failing because of test failures. https://travis-ci.org/nichestreem/dynamodb-titan-storage-backend/jobs/225778448

Failed tests: 
  SingleDynamoDBGraphTest>TitanGraphTest.simpleLogTest:3538->TitanGraphTest.simpleLogTest:3677 expected:<5> but was:<0>
  SingleDynamoDBGraphTest>TitanGraphTest.simpleLogTestWithFailure:3543->TitanGraphTest.simpleLogTest:3677 expected:<5> but was:<0>
Tests in error: 
  SingleDynamoDBGraphTest>TitanGraphTest.testEdgesExceedCacheSize:3307 » Titan C...

The following builds were terminated because they produced no console output within the 10min window. We might need to split them by method as well.

Do you want to create a feature branch to open the pull request against or do you want to use the 1.0.0 branch?

I'll continue tomorrow to split the last three long-running builds by method and get them to run successfully. If there is time left in the day, I can also have a look at the failing tests.

F2006 commented 7 years ago

Updated check list:

amcp commented 7 years ago

@F2006 Here is the feature branch: https://github.com/awslabs/dynamodb-titan-storage-backend/tree/travis . You can open up a PR to merge to that branch and I will merge it.

amcp commented 7 years ago

@F2006 I think this is what you want to make the PR for: https://github.com/awslabs/dynamodb-titan-storage-backend/compare/1.0.0...nichestreem:travis_ci_build_fixes

amcp commented 7 years ago

testConcurrentGetSlice and testConcurrentGetSliceAndMutate are the heavy hitters in the StoreTests. Isolating them.

amcp commented 7 years ago

@F2006 My progress for all the checked off items is in PR #94

amcp commented 7 years ago

Latest build: https://travis-ci.org/awslabs/dynamodb-titan-storage-backend/builds/225949737

10 minute timeout failures (add a print statement every minute to prevent test abortion):

Failed for other reasons (first one took 21 minutes and the second one took 19 minutes):

amcp commented 7 years ago

@F2006 Instead of using metric console output, could we use a simple log statement? Metric summaries make the logs quite long.

F2006 commented 7 years ago

@amcp thanks for creating the pull request and merging in. Just catching up with all the changes now. :)

I agree with removing the metrics; that was the only quick way I could see to keep the build from being terminated. I had a look at the various logs printed by Titan, ElasticSearch and the AWS libs, and none of them logs enough at any level to keep the build from being terminated.

The only other way would be to have something custom that prints some form of heartbeat message to the console every x minutes. There should also be a timeout on the heartbeat, in case a test is actually hung.

This is perhaps something I can pick up today: remove the metrics, print a heartbeat, and get the builds that fail on the 10min timeout to progress?
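
The heartbeat-with-timeout idea can be sketched outside the test code as a small wrapper around the build command. This is only an illustration under stated assumptions, not necessarily how the heartbeat in this repository works: `run_with_heartbeat`, its parameters, and the default interval and timeout values are all hypothetical.

```python
import subprocess
import sys
import time


def run_with_heartbeat(cmd, interval_secs=60.0, timeout_secs=45 * 60.0,
                       label="build"):
    """Run cmd and print a heartbeat line every interval_secs while it runs.

    Returns the child's exit code, or None if the child was still running
    after timeout_secs and was killed as presumably hung.
    """
    start = time.monotonic()
    proc = subprocess.Popen(cmd)
    while True:
        try:
            # Returns the exit code as soon as the child finishes.
            return proc.wait(timeout=interval_secs)
        except subprocess.TimeoutExpired:
            elapsed = time.monotonic() - start
            if elapsed >= timeout_secs:
                proc.kill()
                proc.wait()  # reap the killed child
                print(f"[heartbeat] {label} exceeded {timeout_secs:.0f}s, "
                      "killed", flush=True)
                return None
            # Any console output resets the CI "no output" timer.
            print(f"[heartbeat] {label} still running after {elapsed:.0f}s",
                  flush=True)


if __name__ == "__main__":
    # Demo with a short-lived child process instead of a real build command.
    code = run_with_heartbeat(
        [sys.executable, "-c", "import time; time.sleep(1)"],
        interval_secs=0.3,
        timeout_secs=10,
        label="demo",
    )
    print("exit code:", code)
```

Keeping the hang timeout below the 50min job limit lets the wrapper report which step hung before the CI service terminates the job. A heartbeat inside the test process, as proposed above, can additionally report the name of the test that is running.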

F2006 commented 7 years ago

@amcp I have added a heartbeat that logs to the console at a configurable interval. It also includes the current test name and the test's elapsed execution time, which could be helpful for debugging.

I have to be at an event after work, but will be back online later tonight when I get back home. There is a build running (https://travis-ci.org/nichestreem/dynamodb-titan-storage-backend/builds/226171391). If the build looks fine, I'll create a pull request to the travis branch later. I have also updated my branch from your isolateStoreGraphTests branch, so the pull request will include all of our updates.

I have updated our check lists with the remaining items I have seen so far, and will update the list again when the build is done tonight.

50 minute build timeout failures:

Failed for other reasons:

Failed unit tests:

Updated check list:

amcp commented 7 years ago

@F2006 can you squash all of your commits on the travis_ci_build_fixes branch and rebase onto the isolateStoreGraphTests branch, so as to minimize the number of diff lines I need to read when making the next merge PR?

F2006 commented 7 years ago

@amcp I have rebased my travis_ci_build_fixes branch onto the isolateStoreGraphTests branch this morning. I have squashed all commits from today and created a pull request to the travis branch: https://github.com/awslabs/dynamodb-titan-storage-backend/pull/95

Looking at the last build (https://travis-ci.org/nichestreem/dynamodb-titan-storage-backend/builds/226171391), two of the test jobs were terminated by the 50min build limit. These tests are already sliced; I will think of a plan forward in the morning, unless you have any ideas? :)

There is maintenance currently being done on Travis CI and I cannot see the log for MultiStoreTest. Will investigate in the morning.

I have updated the lists above.

amcp commented 7 years ago

@F2006 the PR was based off the awslabs:travis branch, so please create a new PR based off awslabs:isolateStoreGraphTests. It will greatly reduce the diff output. Ideally your PR should have only one commit, and it should sit right after the HEAD of awslabs:isolateStoreGraphTests.

amcp commented 7 years ago

@F2006 For multi testConcurrentGetSliceAndMutate and testConcurrentGetSlice we will have to override the tests and make them smaller time-wise. I can do that.

amcp commented 7 years ago

@F2006

on your fork do the following

git fetch upstream
git checkout 1.0.0
git rebase upstream/1.0.0
git push -f
git checkout travis_ci_build_fixes
git rebase origin/1.0.0

after travis_ci_build_fixes is rebased on your fork's origin/1.0.0, squash all the commits

keep the first hash in the list that comes up and use s for squash on the rest of the commits.

git rebase -i HEAD~10
git push -f

then, rebase travis_ci_build_fixes on top of isolateStoreGraphTests

git rebase upstream/isolateStoreGraphTests
git push -f

finally create a PR with a base of awslabs:isolateStoreGraphTests and comparing nichestreem:travis_ci_build_fixes

to avoid confusion and to make the network graph clearer, you might want to get rid of your travis_ci branch

git push origin :travis_ci

amcp commented 7 years ago

pushed the fix for multi testConcurrent* in https://github.com/awslabs/dynamodb-titan-storage-backend/commit/49be9feffd1acbd08437e04298b0189b442b65de

Here are the most recent failures in the latest build:

amcp commented 7 years ago

I added some more fixes that reduce test time. I will waive passes on MultiStoreTest, SingleGraphTest and MultiGraphTest as I know that these pass when run on a different host class.

amcp commented 7 years ago

closing and replacing with #96, #97, #98 and #99

F2006 commented 7 years ago

@amcp pull request made to the upstream travis branch with the heartbeat console output. The diff now shows only the changes that add the heartbeat, so it should be easier to read. :)