elucideye / drishti

Real time eye tracking for embedded and mobile devices.
BSD 3-Clause "New" or "Revised" License

travis + android w/ emulator timeout #589

Closed: headupinclouds closed this issue 6 years ago

headupinclouds commented 6 years ago

The Travis Android toolchain (w/ emulator tests) frequently times out. This requires 3-4 restarts (or more) before a PR gets a green light. This is after: removing LTO processing to save link time; consolidating tests; and removing non-essential executable targets. I believe a fair amount of time is lost to the Android installation, but I haven't timed it closely.

The following article mentions the Travis cache feature, which might be helpful: http://panavtec.me/continous-integration-on-android-with-travis-ci

Travis can cache directories that you need in order to speed up subsequent builds. To do that you have to pay, or you can use the new container infrastructure by specifying “sudo: false”. To enable the cache, you have to specify which directories you want to cache; in this case I’m caching some .gradle folders with:

In practice, this is the biggest bottleneck in the development workflow.
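
For the record, a minimal cache block in .travis.yml would look roughly like the sketch below; the cached directory here is only a guess based on the Hunter paths that show up in the logs, not a tested configuration:

sudo: false

cache:
  directories:
    - $HOME/.hunter/_Base/Cache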

ruslo commented 6 years ago

Old benchmark: https://github.com/elucideye/drishti/issues/13#issuecomment-243860301

headupinclouds commented 6 years ago

Here is a log of the most recent timeout (after test refactoring): log-android-timeout.txt

Some date commands could be added for timing the initial setup steps.
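
Something like the following in .travis.yml would bracket the setup with timestamps (placement is illustrative; the existing commands stay as they are):

install:
  - date   # print a timestamp before the setup commands
  # ... existing install commands, unchanged ...
  - date   # and another timestamp after them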

headupinclouds commented 6 years ago

I've added travis_wait 200 to the build+test script:

https://github.com/elucideye/drishti/blob/0bcb0b66a6b560735bf76ae787cf936ef682c4a4/.travis.yml#L79
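
For reference, travis_wait just wraps a long-running command and emits periodic output so Travis doesn't kill the job for inactivity; the general form is roughly the following, where the script path is only a placeholder for the real entry at the link above:

script:
  - travis_wait 200 ./build-and-test.sh   # placeholder for the actual build+test command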

ruslo commented 6 years ago

The following article mentions the Travis cache feature, which might be helpful: http://panavtec.me/continous-integration-on-android-with-travis-ci

No new ideas. Basically it's the Travis cache feature, which has already been tested.

compare cache w/ current SDK + NDK installation

Old benchmark is here: https://github.com/elucideye/drishti/issues/13#issuecomment-243860301. I can run it again but I doubt it will have any effect.

From Travis docs:

If you store archives larger than a few hundred megabytes in the cache, it’s
unlikely that you’ll see a big speed improvement

For Android NDK/Android SDK we are talking about gigabytes.

Here is a log of the most recent timeout

Checking...

ruslo commented 6 years ago

Some date commands could be added for timing the initial setup steps

Polly prints the times of all stages at the end, so a successful build will actually be more informative.

ruslo commented 6 years ago

The configure stage takes about 16+ minutes. The heaviest parts involve the huge Android SDK package, which consists of a lot of components.

Download/install Android SDK components: 25 + 57 + 19 + 18 seconds, i.e. about 2 minutes. These components are not actually needed, because they are already integrated into the Android SDK package on a cache hit. They can be optimized out by introducing a PRIVATE package type in Hunter: PRIVATE would signal to the whole system that we depend on a package but don't need it after install. Another example of such a package is the Sugar modules.

Android SDK download (3+ minutes):

-- [hunter *** DEBUG *** 2017-10-18T12:39:28] Downloading file (try #0 of 10):
-- [hunter *** DEBUG *** 2017-10-18T12:39:28]   https://github.com/elucideye/hunter-cache/releases/download/cache/add02227c4114c29ce34923329f05c1764ada74e.tar.bz2
-- [hunter *** DEBUG *** 2017-10-18T12:39:28]   -> /Users/travis/.hunter/_Base/Cache/raw/add02227c4114c29ce34923329f05c1764ada74e.tar.bz2
-- [download 0% complete]
...
-- [download 100% complete]
-- [hunter *** DEBUG *** 2017-10-18T12:42:53] Locking directory: /Users/travis/.hunter/_Base/Cellar/add02227c4114c29ce34923329f05c1764ada74e/add0222

Android SDK unpacking to cellar (4+ minutes):

-- [hunter *** DEBUG *** 2017-10-18T12:42:53] Unpacking to cellar:
-- [hunter *** DEBUG *** 2017-10-18T12:42:53]   /Users/travis/.hunter/_Base/Cache/raw/add02227c4114c29ce34923329f05c1764ada74e.tar.bz2
-- [hunter *** DEBUG *** 2017-10-18T12:42:53]   -> /Users/travis/.hunter/_Base/Cellar/add02227c4114c29ce34923329f05c1764ada74e/add0222/raw
-- [hunter *** DEBUG *** 2017-10-18T12:46:56] Unpacked successfully

Creating link script (CMake globbing) for Android SDK (2+ minutes):

-- [hunter *** DEBUG *** 2017-10-18T12:46:56] Creating list of files and directories
-- [hunter *** DEBUG *** 2017-10-18T12:49:11] Creating directories

This part can be optimized out if we create such a script at the packing stage instead of after the unpack is done.

The next interesting parts I want to check:

Will try to catch a successful build to see the time for:

headupinclouds commented 6 years ago

Ok, thanks for the breakdown. A few of the exhaustive tests (repeating the regression over a list of resolutions) can be scaled back for HW-limited CI environments.

headupinclouds commented 6 years ago

Parallel testing (non-GPU) might save a little time. I'm sure this is at the bottom of the list, but it can't hurt to add the notes for the record:

ruslo commented 6 years ago

Parallel testing (non-GPU) might save a little time

It will be trickier for Android, since we need to do it through Gauze on the device/emulator.
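
For plain host (non-Android) builds, though, it would amount to little more than a parallel flag on CTest; an illustrative .travis.yml script entry:

script:
  - ctest -j 2 --output-on-failure   # run host-side (non-GPU) tests in parallel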

headupinclouds commented 6 years ago

Will try to catch successful build to see the time for

This android build (prior to test refactoring) finished in 36 minutes: https://travis-ci.org/elucideye/drishti/jobs/288125732

headupinclouds commented 6 years ago

Related: https://github.com/ldionne/metabench (fine-grained compile-time benchmarks in CMake).

ruslo commented 6 years ago

This android build (prior to test refactoring) finished in 36 minutes

headupinclouds commented 6 years ago

These two loops can be reduced or removed as optimizations for slow emulator tests (the fitting usually takes a few milliseconds per iteration on normal HW):

https://github.com/elucideye/drishti/blob/b81bcd80eea997ad4cf47631208c0b90a709952c/src/lib/drishti/eye/ut/test-drishti-eye.cpp#L251
https://github.com/elucideye/drishti/blob/b81bcd80eea997ad4cf47631208c0b90a709952c/src/lib/drishti/drishti/ut/test-EyeSegmenter.cpp#L332

headupinclouds commented 6 years ago

The circleci.com #295 issue may provide another option for CI optimization. As suggested by @ruslo, if the circleci.com open source timeouts are generous enough for end-to-end drishti builds, we can ditch drishti-upload and rely on Hunter's build-time cache uploads. That would be a huge win in overall project design/simplicity/maintenance.

@ruslo : Is there an open issue for the hunter upload feature (design notes, etc)?

ruslo commented 6 years ago

Is there an open issue for the hunter upload feature (design notes, etc)?

https://github.com/ruslo/hunter/issues/1097

headupinclouds commented 6 years ago

The OS X jobs in PR #594 have not started yet after 5 hrs 56 min 20. One can't complain about a free service...

headupinclouds commented 6 years ago

I meant: One can't complain about a free service, but that isn't really workable.

ruslo commented 6 years ago

Last pull request: https://github.com/elucideye/drishti/pull/595
Android build: https://travis-ci.org/elucideye/drishti/builds/290753596

Detailed: drishti-android-build

Note that it looks like this build was using more powerful hardware. Comparing with the old times:

headupinclouds commented 6 years ago

Continuing the CI thread...

Interesting points from the article: https://hackernoon.com/continuous-integration-circleci-vs-travis-ci-vs-jenkins-41a1c2bd95f5

headupinclouds commented 6 years ago

Detailed:

Very informative. How did you generate this, btw?

The HW emulator adds a lot of overhead: drishti_tests

The full set of tests runs in Total Test time (real) = 11.43 sec on my laptop. I'll add a PR to scale these back when GAUZE_ANDROID_USE_EMULATOR=YES.

headupinclouds commented 6 years ago

This one ran in 30 minutes:

(screenshot: Travis build times, 2017-10-21)
ruslo commented 6 years ago

Very informative. How did you generate this, btw?

It's not generated; I made it with http://www.yworks.com/products/yed

ruslo commented 6 years ago

I guess this means OS X and Linux images can't be supported as part of the same config?

You mean Toolchain-ID that will conflict on upload?

Okay, reading the article. I think it means that on CircleCI you can write a YAML config only for Linux or only for OSX, not both simultaneously. E.g. you can't have:

- os: linux
  env: PROJECT_DIR=examples/drishti TOOLCHAIN=gcc

- os: osx
  env: PROJECT_DIR=examples/drishti TOOLCHAIN=osx-10-11

like Travis.

headupinclouds commented 6 years ago

Okay, reading the article. I think it means that on CircleCI you can write a YAML config only for Linux or only for OSX, not both simultaneously. E.g. you can't have:

That was my understanding as well. I don't know if there are any workarounds to achieve the same thing, such as running two separate YAML files simultaneously: circleci.osx.yml and circleci.linux.yml. Actually, the Linux image times are fairly consistent on Travis (I'm assuming this is due to Apple HW cost). I suppose one option would be to configure builds for:

The build times have been pretty good on Travis for the last couple of days, but I think that might be due to off-peak (weekend) usage.

headupinclouds commented 6 years ago

Some day: https://github.com/ldionne/metabench + http://www.yworks.com/products/yed 😄

headupinclouds commented 6 years ago

(image: build times)

ruslo commented 6 years ago

Goals:

headupinclouds commented 6 years ago

No longer an issue.