QIICR / dcmqi

dcmqi (DICOM for Quantitative Imaging) is a free, open source C++ library for conversion between imaging research formats and the standard DICOM representation for image analysis results
https://qiicr.gitbook.io/dcmqi-guide/
BSD 3-Clause "New" or "Revised" License
228 stars 62 forks source link

Switch to using Github Actions to consolidate CI #474

Closed fedorov closed 10 months ago

fedorov commented 1 year ago

TravisCI is gone, maintaining CircleCI and Appveyor separately is inconvenient, and unifying all CI activities under GitHub Actions can simplify maintenance and usability.

I started working on this in my fork so I do not pollute the main repo: https://github.com/fedorov/dcmqi

Build now works on all 3 platforms. Need to confirm testing works and do packaging, and then can migrate here.

fedorov commented 11 months ago

Need to look into caching to speed up build - can we cache ITK/DCMTK source and build trees?

https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows

michaelonken commented 10 months ago

Windows and Linux builds are working (again).

The Mac OS build is broken right now due to a workflow cancellation from GitHub side ("Error: The operation was canceled."). See this workflow instance which demonstrates the issue. The cancellation happens after ~30 minutes total workflow time, and within that around 14 minutes after ITK build starts.

The cancellation happens during ITK build - either at the beginning (according to the log) or more likely in the middle (since I think log is only updated every now and then or after each step).

As the reason is not clear from the log, googling doesn't reveal much so it is only possible to speculate on the reasons. I can think of

  1. Bug in GitHubs Mac OS runner
  2. Out of memory
  3. Out of disk space

Bullet 1 is out of our control. 2. and 3. is probably not the case either - on my Linux machine ITK building takes no more than a few GB (<5) . Regarding 3. the whole final (super) build tree is "only" 3.7 GB. So unless there are large temporary files I cannot see why we should be running out of disk space.

At the time of writing, this is the specification of the GitHub Mac OS X machine:

I am not sure though whether RAM and SSD space are what is still available the user (what I expect) or whether preinstalled/prerunning applications have to be subtracted from that.

It is also possible to pay for larger Mac runners (more RAM, CPU). Maybe it is worth trying this once.

Any other ideas are welcome..;)

michaelonken commented 10 months ago

Need to look into caching to speed up build - can we cache ITK/DCMTK source and build trees?

https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows

Windows builds already uses pre-built binaries of ITK, DCMTK, dicom3tools and ZLIB (via download) and build therefore only takes around 5 minutes.

For Linux and Mac caching should be possible, though both platforms use a different build approach (Linux: currently builds docker image, MacOS: regular build).

fedorov commented 10 months ago

@michaelonken I think this might have been due to the exceeded job timeout that was set at 30 min. I fixed it in this commit https://github.com/fedorov/dcmqi/commit/fcd821956c34c1a9018f9e1c92f8d64175054305. The latest workflow run failed for a completely different reason in the step following ITK build, but also it was faster than before, so I am cannot confirm that it failed earlier due to the timeout issue.

fedorov commented 10 months ago

I did another fix, and now the macos build is succeeding! 🎆

https://github.com/fedorov/dcmqi/actions/runs/6014190669/job/16313338052

michaelonken commented 10 months ago

Awesome! :)

Didn't know that one can set a timeout (and that there is one)...good catch :-)

fedorov commented 10 months ago

I only thought about it because I believe this timeout hit me earlier in the process...

I wish they had a clear communication to inform users when the job is terminated due to exceeded timeout...

michaelonken commented 10 months ago

On fedorov/dcmqi Mac OS Build now caches ITK, DCMTK and ZLIB. From the related commit:

--------------------8<------------------------------8<------------------------ ITK, DCMTK and ZLIB source code will not change since they are checked out at fixed versions. Also their builds should be compatible unless the Mac OS X version changes. Therefore, this data is now cached using GitHub actions cache facility.

None of these three will change frequently, so it is accepted that if one of them changes, no cache is applied and all packages are rebuilt as part of a full superbuild (resulting in an up-to-date cache entry). In other words, ITK, DCMTK and ZLIB will be cached in a single cache entry. It would also be possible to have a separate cache entry for all of them but that makes it more complicate to set the required build option when configuring dcmqi and requires some extra conditions since these libraries partly depend on each other.

For DCMTK and ITK, source code (and not only build) is part of the cache since the dcmqi build will include headers from these packages from their respective source code folder, while taking the libraries from the build folder. ZLIB source code is not cached, since during regular superbuild ZLIB is installed into the dcmqi/zlib-install folder, that includes binary libs and include files.

If a cache entry is found when running the GitHub action, the respective directories will be restored in the suberbuild's build tree. CMake will then configure dcmqi using DCMTK_DIR, ITK_DIR and ZLIB_ROOT to re-use these existing installations. Re-using the original build dirs simplifies cache management, since the files can be cached and restored into the same directories.

--------------------8<------------------------------8<------------------------

This brings down cached-enabled Mac OS X build down to a few minutes.

For the time being, Windows and Linux will remain as-is:

For the current Windows build setup caching is not necessary since it already uses pre-compiled library builds that are downloaded during build (resulting in 6-8 minutes/build). On the mid-term we could change this to rely on GitHub cache instead (which also makes it easier to upgrade these libs later).

The Linux build is docker-based, and after a quick look I have an idea of how it works. However, further analysis and cache integration will have to wait.

fedorov commented 10 months ago

Per discussion, agreed to remove circleci and appveyor integration, in the hopes that github actions will fill their functionality without regression.

fedorov commented 10 months ago

We should also replace the existing CircleCI and Appveyor badges with the GitHub Actions workflow status badges, as described in https://docs.github.com/en/actions/monitoring-and-troubleshooting-workflows/adding-a-workflow-status-badge.

It's a small and easy fix, but I think very helpful to allow users to easily assess the status of the repo.

michaelonken commented 10 months ago

We should also replace the existing CircleCI and Appveyor badges with the GitHub Actions workflow status badges, as described in https://docs.github.com/en/actions/monitoring-and-troubleshooting-workflows/adding-a-workflow-status-badge.

Done with 73baaa0.

fedorov commented 10 months ago

As discussed elsewhere, it would be ideal to make sure that packaging and publishing of the package works at least for Linux before merging that branch into upstream.

@michaelonken I will also squash some of the commits. There was a lot of trial and error, and quite a few of the commit messages I left are meaningless, since I did not intend those to end up in the upstream. It will be embarrassing if they do propagate!

michaelonken commented 10 months ago

Sure, let's keep the history clean to read and squash as necessary!