apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0

Some tests take too long time #4402

Closed jihoonson closed 7 years ago

jihoonson commented 7 years ago

Recently, our Travis CI builds sometimes fail because of the job time limit. I found that some unit tests take too long. Here is the list.

Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 148.191 sec - in io.druid.query.groupby.epinephelinae.BufferGrouperTest

Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 233.411 sec - in io.druid.query.groupby.epinephelinae.ByteBufferMinMaxOffsetHeapTest

Tests run: 24, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 315.178 sec - in io.druid.segment.data.CompressedVSizeIndexedV3WriterTest

Tests run: 66, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1,095.118 sec - in io.druid.segment.data.CompressedVSizeIntsIndexedSupplierTest

Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 59.984 sec - in io.druid.server.initialization.JettyQosTest

Tests run: 35, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 161.293 sec - in io.druid.query.aggregation.datasketches.theta.oldapi.OldApiSketchAggregationTest

Tests run: 50, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 196.692 sec - in io.druid.query.aggregation.datasketches.theta.SketchAggregationTest
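
A list like the one above can be pulled out of a build log mechanically. The following is only a minimal sketch, assuming the log contains Surefire summary lines in exactly the format quoted here; the class and method names (`SlowTestReport`, `parse`) are hypothetical and not part of Druid. Note that elapsed times may contain a thousands separator (for example `1,095.118`), so the comma is stripped before parsing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts per-class timings from Surefire summary lines.
final class SlowTestReport {
    // Captures: number of tests, elapsed seconds (may contain ','), class name.
    private static final Pattern SUMMARY = Pattern.compile(
        "Tests run: (\\d+),.*Time elapsed: ([\\d,.]+) sec - in (\\S+)");

    static final class Entry {
        final String className;
        final int tests;
        final double seconds;

        Entry(String className, int tests, double seconds) {
            this.className = className;
            this.tests = tests;
            this.seconds = seconds;
        }
    }

    // Returns one entry per matching line, slowest class first.
    static List<Entry> parse(List<String> lines) {
        List<Entry> out = new ArrayList<>();
        for (String line : lines) {
            Matcher m = SUMMARY.matcher(line);
            if (m.find()) {
                // Drop the thousands separator before parsing the number.
                double seconds = Double.parseDouble(m.group(2).replace(",", ""));
                out.add(new Entry(m.group(3), Integer.parseInt(m.group(1)), seconds));
            }
        }
        out.sort(Comparator.comparingDouble((Entry e) -> e.seconds).reversed());
        return out;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
            "Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 148.191 sec - in io.druid.query.groupby.epinephelinae.BufferGrouperTest",
            "Tests run: 66, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1,095.118 sec - in io.druid.segment.data.CompressedVSizeIntsIndexedSupplierTest");
        for (Entry e : parse(log)) {
            System.out.println(e.seconds + " sec, " + e.tests + " tests: " + e.className);
        }
    }
}
```

Sorting slowest-first makes it easy to see which classes are close to, or past, the CI time budget.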

jon-wei commented 7 years ago

Thanks for gathering this info. I'm looking into scaling down the unit tests in ByteBufferMinMaxOffsetHeapTest.

jihoonson commented 7 years ago

@jon-wei thanks for looking into it!

leventov commented 7 years ago

@gianm it seems that none of the Travis builds after #4394 have succeeded. I think there is some problem with that PR.

gianm commented 7 years ago

@leventov which failures do you think might be related to that PR? Is https://travis-ci.org/druid-io/druid/jobs/242999630 (https://github.com/druid-io/druid/pull/4404) one of them?

leventov commented 7 years ago

@gianm it also fails

gianm commented 7 years ago

Based on @jihoonson's original comment, the failure at https://travis-ci.org/druid-io/druid/jobs/242999630, and running the tests locally, the problem is likely in CompressedVSizeIndexedV3WriterTest and/or CompressedVSizeIntsIndexedSupplierTest. Travis fails a job if there's no output for 10 minutes, and the latter test does sometimes take more than 10 minutes to run. I have no idea why, or whether it is really related to #4394.

gianm commented 7 years ago

I ran CompressedVSizeIntsIndexedSupplierTest on the druid-0.10.0 tag and it took 15s. So something has changed since then.

gianm commented 7 years ago

I ran it again on 976492c18644614fa7d4cf0cd1ad508929579e6c (which doesn't include #4394) and it took at least a few minutes (I stopped it before it finished). So I don't think #4394 is related. But something since druid-0.10.0 had some kind of an effect.

gianm commented 7 years ago

@leventov I looked into it, and the culprit is #4252, specifically these two changes.

For some reason, the changed version is much slower. Reverting back to Arrays.asList speeds up the tests significantly.

gianm commented 7 years ago

@leventov https://github.com/druid-io/druid/pull/4412

gianm commented 7 years ago

On my machine JettyQosTest takes 6s. I wonder if the substantially longer time you saw is because it spawns a lot of threads. @jihoonson did those numbers come from travis or one of your own machines?

gianm commented 7 years ago

#4414 is meant to help with CompressedVSizeIntsIndexedSupplierTest.

jihoonson commented 7 years ago

@gianm these numbers come from Travis, specifically from the job originally linked in the description (https://travis-ci.org/druid-io/druid/jobs/242596693).

jihoonson commented 7 years ago

Here are more long-running tests.

Tests run: 15, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 105.67 sec - in io.druid.client.cache.CacheDistributionTest

Tests run: 24, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 81.833 sec - in io.druid.indexer.IndexGeneratorJobTest

Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 74.797 sec - in io.druid.sql.avatica.DruidAvaticaHandlerTest

Tests run: 34, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 145.639 sec - in io.druid.indexing.kafka.KafkaIndexTaskTest

Tests run: 17, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 281.405 sec - in io.druid.query.lookup.KafkaLookupExtractorFactoryTest

Tests run: 24, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 158.498 sec - in io.druid.segment.data.CompressedVSizeIndexedV3WriterTest

jihoonson commented 7 years ago

The classes below take a long time because each class contains a lot of tests. We are using the parallel=classes fork option for parallel tests, so splitting these classes into several smaller classes will help reduce the testing time.

Tests run: 178, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 96.218 sec - in io.druid.extendedset.intset.ImmutableConciseSetTest

Tests run: 3200, Failures: 0, Errors: 0, Skipped: 50, Time elapsed: 77.992 sec - in io.druid.query.groupby.GroupByQueryRunnerTest

Tests run: 27840, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 66.359 sec - in io.druid.query.topn.TopNQueryRunnerTest

Tests run: 192, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 75.598 sec - in io.druid.segment.filter.FloatFilteringTest

Tests run: 192, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 50.628 sec - in io.druid.segment.filter.LongFilteringTest

Tests run: 2592, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 135.988 sec - in io.druid.segment.IndexMergerTest
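
To illustrate why splitting helps when a whole class is the unit of parallelism, here is a rough model, not a description of how Surefire actually schedules work. It assumes a hypothetical 4 workers, a simple "longest class first, next free worker" assignment, and an idealized split in which each class divides into equally long parts. The timings are the six from the list above; the names `ClassLevelParallelism`, `makespan`, and `split` are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Simplified model of class-level test parallelism (assumptions noted above).
final class ClassLevelParallelism {
    // Wall time when each class runs whole on whichever worker frees up first,
    // handing out the longest classes first.
    static double makespan(List<Double> classSeconds, int workers) {
        List<Double> sorted = new ArrayList<>(classSeconds);
        sorted.sort(Collections.reverseOrder());
        PriorityQueue<Double> busyUntil = new PriorityQueue<>();
        for (int i = 0; i < workers; i++) {
            busyUntil.add(0.0);
        }
        double end = 0.0;
        for (double seconds : sorted) {
            double finish = busyUntil.poll() + seconds;
            busyUntil.add(finish);
            end = Math.max(end, finish);
        }
        return end;
    }

    // Idealized split: every class becomes `parts` classes of equal duration.
    static List<Double> split(List<Double> classSeconds, int parts) {
        List<Double> out = new ArrayList<>();
        for (double seconds : classSeconds) {
            for (int i = 0; i < parts; i++) {
                out.add(seconds / parts);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Elapsed times (sec) of the six classes listed above.
        List<Double> times = Arrays.asList(96.218, 77.992, 66.359, 75.598, 50.628, 135.988);
        int workers = 4; // hypothetical worker count
        System.out.println("unsplit:       " + makespan(times, workers));
        System.out.println("split 4 ways:  " + makespan(split(times, 4), workers));
    }
}
```

In this model the unsplit wall time can never drop below the slowest single class, no matter how many workers are available, whereas smaller classes let the workers balance the load more evenly. Real test classes rarely split into equal parts, so the actual gain would be smaller.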