Closed: tschaffter closed this issue 11 months ago.
The issue first appeared when I updated Prettier. Since then, it has affected all pushes to main.
Failed tasks:
- openchallenges-challenge-service:build-image-base
- openchallenges-organization-service:build-image-base
- openchallenges-image-service:build-image-base
For example:
BUILD FAILED in 11s
5 actionable tasks: 2 executed, 3 up-to-date
Failed to execute command: ./gradlew bootBuildImage
Error: Command failed: ./gradlew bootBuildImage
at checkExecSyncError (node:child_process:885:11)
at execSync (node:child_process:957:15)
at runBuilderCommand (/workspaces/sage-monorepo/node_modules/@nxrocks/common/src/lib/core/jvm/utils.js:20:38)
at runBootPluginCommand (/workspaces/sage-monorepo/node_modules/@nxrocks/nx-spring-boot/src/utils/boot-utils.js:18:43)
at /workspaces/sage-monorepo/node_modules/@nxrocks/nx-spring-boot/src/executors/build-image/executor.js:10:54
at Generator.next (<anonymous>)
at /workspaces/sage-monorepo/node_modules/tslib/tslib.js:118:75
at new Promise (<anonymous>)
at Object.__awaiter (/workspaces/sage-monorepo/node_modules/tslib/tslib.js:114:16)
at buildImageExecutor (/workspaces/sage-monorepo/node_modules/@nxrocks/nx-spring-boot/src/executors/build-image/executor.js:8:20)
at /workspaces/sage-monorepo/node_modules/nx/src/command-line/run/run.js:81:23
at Generator.next (<anonymous>)
at fulfilled (/workspaces/sage-monorepo/node_modules/nx/node_modules/tslib/tslib.js:166:62)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
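The Nx executor only reports that the Gradle command failed. To surface the underlying Gradle error, one option is to run the same task directly with more verbose output; a minimal sketch, assuming the service lives under the usual Nx app layout (the directory below is hypothetical):

# Run the failing Gradle task directly to see the root cause
# (the project directory is an assumption based on the task name)
cd apps/openchallenges/challenge-service
./gradlew bootBuildImage --stacktrace --info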
I reran the latest commit to main today, and this time the image of the image service built fine.
Failed tasks:
- openchallenges-organization-service:build-image-base
- openchallenges-organization-service:build-image
- openchallenges-organization-service:publish-image
- openchallenges-organization-service:publish-and-remove-image
- openchallenges-image-service:build-image-base
- openchallenges-image-service:build-image
- openchallenges-image-service:publish-image
- openchallenges-image-service:publish-and-remove-image
One particularity of updating Prettier is that it triggered the tasks for all the projects in the monorepo. One side effect that could cause the error above is that there was not enough storage space available to the CI workflow.
But why did that impact only the images of the three microservices?
There is also this error related to the images:
Deleted: sha256:0a4e87eff9269728a61abf3225455f49ecd3ea06c22cf0574a839fab7af80e89
Untagged: ghcr.io/sage-bionetworks/openchallenges-app:local
Deleted: sha256:3960d934632e06eec7d88ef8f7143b144036432a05bd8b88d0cb7d700ced4b3a
Error response from daemon: No such image: 0a4e87eff926:latest
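A plausible explanation for the "No such image" error: docker images --quiet prints the image ID once per tag, so the docker rmi command receives the same ID several times and fails on every occurrence after the first deletion. A sketch of a more tolerant cleanup that deduplicates the IDs first (the filter pattern mirrors the run-commands invocation shown further down):

# Deduplicate image IDs so docker rmi is not called twice on the same image;
# xargs -r skips the call entirely when no image matches the filter
docker images --filter=reference='ghcr.io/sage-bionetworks/openchallenges-app:*' --quiet \
  | sort -u \
  | xargs -r docker rmi --force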
I can reproduce the issue locally:
vscode@52f527f259e0:/workspaces/sage-monorepo$ nx run-many --target=build-and-remove-image \
--projects=openchallenges-app,openchallenges-challenge-service,openchallenges-organization-service,openchallenges-image-service,openchallenges-api-gateway,schematic-api \
--parallel=1
✔ nx run openchallenges-api-description:build-individuals (1s)
✔ nx run openchallenges-api-description:build [local cache]
✔ nx run openchallenges-app-config-data:build [local cache]
✔ nx run openchallenges-app-config-data:install (2s)
✔ nx run openchallenges-api-client-angular:build:production [local cache]
✔ nx run shared-java-util:build [local cache]
✔ nx run shared-java-util:install (3s)
✔ nx run openchallenges-api-gateway:build-image-base (32s)
✔ nx run openchallenges-api-gateway:build-image (2s)
✔ nx run schematic-api:build-image (2m)
✔ nx run openchallenges-app:build:production [local cache]
✔ nx run openchallenges-challenge-service:build-image-base (28s)
✔ nx run openchallenges-organization-service:build-image-base (28s)
✔ nx run openchallenges-image-service:build-image-base (23s)
✔ nx run openchallenges-app:server:production [local cache]
✔ nx run openchallenges-challenge-service:build-image (2s)
✔ nx run openchallenges-organization-service:build-image (2s)
✔ nx run openchallenges-image-service:build-image (2s)
✔ nx run openchallenges-app:build-image (4s)
✔ nx run openchallenges-api-gateway:build-and-remove-image (369ms)
✔ nx run schematic-api:build-and-remove-image (336ms)
✔ nx run openchallenges-challenge-service:build-and-remove-image (322ms)
✖ nx run openchallenges-organization-service:build-and-remove-image
Untagged: ghcr.io/sage-bionetworks/openchallenges-organization-service:edge
Untagged: ghcr.io/sage-bionetworks/openchallenges-organization-service:local
Untagged: ghcr.io/sage-bionetworks/openchallenges-organization-service:sha-6b7f7ac
Deleted: sha256:eba522cc7f3a06480fa78ebd5e67e273eceeec3b38661d5705ca0c67cac17239
Error response from daemon: conflict: unable to delete 483f7a79fa23 (cannot be forced) - image is being used by running container 4a3dcfe95c83
Error response from daemon: No such image: eba522cc7f3a:latest
Error response from daemon: No such image: eba522cc7f3a:latest
Warning: run-commands command "docker rmi $(docker images --filter=reference=ghcr.io/sage-bionetworks/openchallenges-organization-service:* --quiet) --force" exited with non-zero status code
✖ nx run openchallenges-image-service:build-and-remove-image
Untagged: ghcr.io/sage-bionetworks/openchallenges-image-service:edge
Untagged: ghcr.io/sage-bionetworks/openchallenges-image-service:local
Untagged: ghcr.io/sage-bionetworks/openchallenges-image-service:sha-6b7f7ac
Deleted: sha256:c652c92493cc101c8c19b7ce9744c8bb2bb2c600f82b977f4775f9480ea4142b
Error response from daemon: No such image: c652c92493cc:latest
Error response from daemon: No such image: c652c92493cc:latest
Error response from daemon: conflict: unable to delete 84e1b573a61a (cannot be forced) - image is being used by running container 239dc1c2049b
Warning: run-commands command "docker rmi $(docker images --filter=reference=ghcr.io/sage-bionetworks/openchallenges-image-service:* --quiet) --force" exited with non-zero status code
✔ nx run openchallenges-app:build-and-remove-image (343ms)
—————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————
> NX Ran target build-and-remove-image for 6 projects and 19 tasks they depend on (4m)
✔ 23/25 succeeded [6 read from cache]
✖ 2/25 targets failed, including the following:
- nx run openchallenges-organization-service:build-and-remove-image
- nx run openchallenges-image-service:build-and-remove-image
View structured, searchable error logs at https://cloud.nx.app/runs/Zz6VHlKRB9
EDIT: The error is different on my end: it happened because I had running containers that use the images, so deleting the images failed.
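One way to avoid this conflict locally is to stop and remove any container created from the image before deleting it; a minimal sketch for the organization service (the tag is illustrative):

# Force-remove any container created from the image, then delete the image
docker ps --quiet --filter ancestor=ghcr.io/sage-bionetworks/openchallenges-organization-service:local \
  | xargs -r docker rm --force
docker rmi --force ghcr.io/sage-bionetworks/openchallenges-organization-service:local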
As I expected, the issue is that there are only 4 GB of storage left before the images are built. I really need to come up with a robust solution to this problem, which is tracked in an existing ticket.
Run df -h
Filesystem      Size  Used  Avail Use% Mounted on
/dev/root        84G   80G   4.0G  96% /
tmpfs           3.4G  172K   3.4G   1% /dev/shm
tmpfs           1.4G  1.2M   1.4G   1% /run
tmpfs           5.0M     0   5.0M   0% /run/lock
/dev/sda15      105M  6.1M    99M   6% /boot/efi
/dev/sdb1        14G  4.1G   9.0G  31% /mnt
tmpfs           693M   12K   693M   1% /run/user/1001
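Until a robust fix is in place, a common stopgap on the standard runner is to delete preinstalled toolchains that this workflow does not use before building the images; a sketch, assuming the usual ubuntu-latest image layout (verify the paths against the runner in use):

# Reclaim space from toolchains this workflow does not need
# (paths as commonly found on ubuntu-latest runner images)
sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc
# Drop unused Docker images, stopped containers, and build cache
docker system prune --all --force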
We are using the standard GH runner, but it looks like we have access to a bigger one.
It looks like the larger runner can be used even for jobs triggered by PRs from forks hosted outside of the Sage GH org.
For reference, here is the initial storage space before the commit is checked out:
Here the runtime is for applying the tasks to ALL the projects in the monorepo, not just the affected ones. In most cases, the number of projects affected by a PR will be smaller, so the CI workflow will complete faster.
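For comparison, a typical PR run would build images only for the affected projects; a sketch using Nx's affected command instead of run-many (the target name is taken from the tasks above):

# Build images only for projects affected relative to the base branch
nx affected --target=build-image --parallel=1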
ubuntu-latest: The workflow "completes" in 28 minutes. Two tasks failed, but they wouldn't otherwise have taken much extra time.
This is the only larger runner that Sage currently makes available: ubuntu-22.04-4core-16GBRAM-150GBSSD. The workflow completes in 16 minutes!
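Switching a job to that runner should be a one-line change in the workflow file; a minimal sketch (the job name is illustrative):

jobs:
  ci:
    # Larger runner label made available by the Sage GH org
    runs-on: ubuntu-22.04-4core-16GBRAM-150GBSSD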
I contacted IT, and we will review the amount billed for the larger runner at the end of this month.
Is there an existing issue for this?
What product(s) are you seeing the problem on?
Sage Monorepo
Current behavior
The main branch shows an error with the execution of the CI workflow. The error happens when building and publishing the Docker images. Rerunning the workflow did not solve the issue.
Expected behavior
No response
Anything else?
No response
Commit ID
6b7f7ac4591435371d04ef309cb379f2bd3ce836
Are you developing inside the dev container?
Code of Conduct