Open zackbright opened 5 years ago
Can you please try running the same command with DEBUG=1 BLUEBIRD_DEBUG=1 balena build ... and paste the output here?
Here you are.
Looks like it is definitely the case that Node.js is not able to handle a file this large, and thus needs to load it in as a stream. A quick Google search pulls up some ways others have done this, but I don't want to be telling you all how to build. In particular, I'm unsure how reading it in as a stream may affect the actual processing of the images.
sudo DEBUG=1 BLUEBIRD_DEBUG=1 balena build --deviceType rasperrypi3 --arch armhf --emulated .
[Debug] Parsing input...
[Debug] Loading project...
[Debug] Resolving project...
[Debug] Failed to resolve project:
[Debug] AggregateError of:
[Debug] Error: ENOENT: no such file or directory, open '/Users/zackbright/resin_dev/ros2_app/talker3/docker-compose.yml'
[Debug] Error: ENOENT: no such file or directory, open '/Users/zackbright/resin_dev/ros2_app/talker3/docker-compose.yaml'
[Info] Creating default composition with source: /Users/zackbright/resin_dev/ros2_app/talker3
[Debug] Creating project...
[Info] Building for armhf/rasperrypi3
[Build] Built 1 service in 0 seconds
[Build] main Preparing...
[Error] Build failed
RangeError [ERR_FS_FILE_TOO_LARGE]: File size (4192567296) is greater than possible Buffer: 2147483647 bytes
at FSReqWrap.readFileAfterStat [as oncomplete] (fs.js:262:11)
From previous event:
at Object.readFile (eval at thenify.withCallback (/usr/local/lib/node_modules/balena-cli/node_modules/thenify/index.js:33:10), <anonymous>:10:8)
at /usr/local/lib/node_modules/balena-cli/build/utils/compose.js:139:52
From previous event:
at tarDirectory (/usr/local/lib/node_modules/balena-cli/build/utils/compose.js:136:28)
at /usr/local/lib/node_modules/balena-cli/build/utils/compose.js:192:12
From previous event:
at Object.exports.buildProject (/usr/local/lib/node_modules/balena-cli/build/utils/compose.js:191:6)
at /usr/local/lib/node_modules/balena-cli/build/actions/build.js:28:20
From previous event:
at buildProject (/usr/local/lib/node_modules/balena-cli/build/actions/build.js:22:88)
at /usr/local/lib/node_modules/balena-cli/build/actions/build.js:94:16
From previous event:
at /usr/local/lib/node_modules/balena-cli/build/actions/build.js:93:22
From previous event:
at Command.action (/usr/local/lib/node_modules/balena-cli/build/actions/build.js:90:8)
at /usr/local/lib/node_modules/balena-cli/node_modules/capitano/build/command.js:98:37
at Command.module.exports.Command.applyPermissions (/usr/local/lib/node_modules/balena-cli/node_modules/capitano/build/command.js:43:14)
at /usr/local/lib/node_modules/balena-cli/node_modules/capitano/build/command.js:92:24
at Command.module.exports.Command._checkElevation (/usr/local/lib/node_modules/balena-cli/node_modules/capitano/build/command.js:63:14)
at /usr/local/lib/node_modules/balena-cli/node_modules/capitano/build/command.js:76:22
at /usr/local/lib/node_modules/balena-cli/node_modules/capitano/build/signature.js:176:14
at /usr/local/lib/node_modules/balena-cli/node_modules/capitano/node_modules/async/lib/async.js:52:16
at /usr/local/lib/node_modules/balena-cli/node_modules/capitano/node_modules/async/lib/async.js:269:32
at /usr/local/lib/node_modules/balena-cli/node_modules/capitano/node_modules/async/lib/async.js:44:16
at /usr/local/lib/node_modules/balena-cli/node_modules/capitano/build/signature.js:170:16
at /usr/local/lib/node_modules/balena-cli/node_modules/capitano/node_modules/async/lib/async.js:181:20
at Immediate.iterate (/usr/local/lib/node_modules/balena-cli/node_modules/capitano/node_modules/async/lib/async.js:262:13)
at runCallback (timers.js:705:18)
at tryOnImmediate (timers.js:676:5)
at processImmediate (timers.js:658:5)
at process.topLevelDomainCallback (domain.js:121:23)
From reviewing the output above, it was clear that the image being pulled was not the Docker-stored image (i.e. the one living in Docker's local image store), but the .tar of the image that was saved locally. I moved the .tar so that the image was not found, and that seems to have fixed things.
Probably still worth looking into, as I was not able to deploy while the tar image was present.
I will see if I can deploy straight from the image in Docker's local store (i.e. the images listed by docker image ls) rather than from the saved tar archive.
Forcing the rebuild invoked docker to use the in-memory version of the image.
Got my happy unicorn to show up.
I'm adding the details so that anyone else who hits this issue can resolve it.
Build locally, and as long as Docker is still running you can rebuild; since it is a Docker image, the rebuild doesn't actually force re-compiling or rebuilding of the code (this may be obvious to most who are familiar with Docker, but as a neophyte I hesitated and didn't fully understand it).
So by re-invoking the build as shown below, you can get the built image from the local Docker store and deploy.
sudo DEBUG=1 BLUEBIRD_DEBUG=1 balena deploy talker3 --build --deviceType rasperrypi3 --arch armhf --emulated .
Password:
[Debug] Parsing input...
[Debug] Loading project...
[Debug] Resolving project...
[Debug] Failed to resolve project:
[Debug] AggregateError of:
[Debug] Error: ENOENT: no such file or directory, open '/Users/zackbright/resin_dev/ros2_app/talker3/docker-compose.yml'
[Debug] Error: ENOENT: no such file or directory, open '/Users/zackbright/resin_dev/ros2_app/talker3/docker-compose.yaml'
[Info] Creating default composition with source: /Users/zackbright/resin_dev/ros2_app/talker3
[Debug] Creating project...
[Info] Building for armv7hf/raspberrypi3
[Debug] Prepared tasks; building...
[Build] Built 1 service in 0 seconds
[Build] main Image size: 3.79 GB
[Info] Creating release...
[Debug] Tagging images...
[Debug] Authorizing push...
[Info] Pushing images to registry...
[Debug] Saving image registry2.balena-cloud.com/v2/9a83c2c628ef523b5fea9f45d98637df
[Debug] Untagging images...
[Info] Saving release...
[Success] Deploy succeeded!
[Success] Release: 3fb40a2603cb98c61a31d10743c67c57
[ASCII-art unicorn success banner]
Hey @zackbright, thanks for bringing this up! This has actually revealed quite a few architectural problems in the way that build sources are packaged and sent to the Docker daemon (whether local or remote). Basically, even though we're using a streaming interface to the tar creation, the other end of the stream isn't read until the entire tar archive has been packed into the stream.
There are a few different places where this occurs, so I'm going to work on fixing these up; it should have the advantage of making balena build and balena push (both cloud and local variants) much more stable.
I'll reference this issue in the PRs if you want to track the progress of this.
Closing as fixes have been merged. Please reopen if you still encounter problems.
I encountered the problem while building an image containing CUDA, TensorFlow, and OpenCV for the Nvidia Jetson Nano.
It works on balena CLI v11.24.0 but does not work with v11.26.0; I will try it with v11.28.11 on Linux :)
[Debug] Parsing input...
[Debug] Loading project...
[Debug] Resolving project...
[Info] Compose file detected
[Debug] Creating project...
[Info] Building for aarch64/jetson-nano
[Build] Building services...
[Build] jetson-nano-base Preparing...
[Info] Installing qemu for aarch64 emulation...
[Info] Emulation is enabled
[Build] Built 1 service in 0:01
[Error] Build failed
ERR_FS_FILE_TOO_LARGE: File size (2380747015) is greater than possible Buffer: 2147483647 bytes
RangeError [ERR_FS_FILE_TOO_LARGE]: File size (2380747015) is greater than possible Buffer: 2147483647 bytes
at FSReqWrap.readFileAfterStat [as oncomplete] (fs.js:264:11)
The limit still seems to be present in v11.28.11
[Debug] Parsing input...
[Debug] Loading project...
[Debug] Resolving project...
[Debug] docker-compose.yml file found at "/builds/.../..."
[Debug] Creating project...
[Info] Building for aarch64/jetson-nano
[Build] Building services...
[Build] jetson-nano-base Preparing...
[Info] Installing qemu for aarch64 emulation...
[Info] Emulation is enabled
[Build] Built 1 service in 0:01
[Error] Build failed
ERR_FS_FILE_TOO_LARGE: File size (2380747015) is greater than possible Buffer: 2147483647 bytes
RangeError [ERR_FS_FILE_TOO_LARGE]: File size (2380747015) is greater than possible Buffer: 2147483647 bytes
at FSReqWrap.readFileAfterStat [as oncomplete] (fs.js:264:11)
From previous event:
at readFile (eval at thenify.withCallback (/snapshot/versioned-source/node_modules/thenify/index.js:33:10), <anonymous>:10:8)
at /snapshot/versioned-source/build/utils/compose.js:156:49
From previous event:
at tarDirectory (/snapshot/versioned-source/build/utils/compose.js:153:28)
at /snapshot/versioned-source/build/utils/compose.js:216:12
From previous event:
at Object.exports.buildProject (/snapshot/versioned-source/build/utils/compose.js:215:6)
at /snapshot/versioned-source/build/actions/build.js:51:20
From previous event:
at buildProject (/snapshot/versioned-source/build/actions/build.js:45:60)
at /snapshot/versioned-source/build/actions/build.js:133:16
From previous event:
at /snapshot/versioned-source/build/actions/build.js:132:22
From previous event:
at Command.action (/snapshot/versioned-source/build/actions/build.js:129:8)
at /snapshot/versioned-source/node_modules/capitano/build/command.js:98:37
at Command.module.exports.Command.applyPermissions (/snapshot/versioned-source/node_modules/capitano/build/command.js:43:14)
at /snapshot/versioned-source/node_modules/capitano/build/command.js:92:24
at Command.module.exports.Command._checkElevation (/snapshot/versioned-source/node_modules/capitano/build/command.js:63:14)
at /snapshot/versioned-source/node_modules/capitano/build/command.js:76:22
at /snapshot/versioned-source/node_modules/capitano/build/signature.js:177:14
at _toString (/snapshot/versioned-source/node_modules/capitano/node_modules/async/lib/async.js:52:16)
at /snapshot/versioned-source/node_modules/capitano/node_modules/async/lib/async.js:269:32
at /snapshot/versioned-source/node_modules/capitano/node_modules/async/lib/async.js:44:16
at /snapshot/versioned-source/node_modules/capitano/build/signature.js:171:16
at _setImmediate (/snapshot/versioned-source/node_modules/capitano/node_modules/async/lib/async.js:181:20)
at Immediate.iterate (/snapshot/versioned-source/node_modules/capitano/node_modules/async/lib/async.js:262:13)
at runCallback (timers.js:705:18)
at tryOnImmediate (timers.js:676:5)
at processImmediate (timers.js:658:5)
at process.topLevelDomainCallback (domain.js:126:23)
If you need help, don't hesitate in contacting our support forums at
https://forums.balena.io
For CLI bug reports or feature requests, have a look at the GitHub issues or
create a new one at: https://github.com/balena-io/balena-cli/issues/
@Langhalsdino, thank you for reporting it and including the stack trace. I am re-opening this issue.
In pre-transpiled code, I think the error comes from this line: https://github.com/balena-io/balena-cli/blob/v11.28.11/lib/utils/compose.coffee#L137
Probable bug explanation: the CLI is loading the whole file into an in-memory buffer before creating the tar stream that it sends to the Docker daemon over HTTP, and a Buffer cannot be that large. The fix will probably be to load the file in smaller chunks, or to use Node's streams to do it for us.
I gather that it was already like that in CLI v11.24.0 though, so perhaps your file was a bit smaller when you were using that version.
I figured out why my build works with CLI v11.24.0 but not with v11.26.0 or v11.28.11.
I added the --cache-from option to the build, and therefore Docker tries to push the large layers to the daemon. Since v11.24.0 did not have the --cache-from option, the build succeeded, as it could not push the prebuilt layers to the daemon :)
argv=[/usr/local/bin/balena,/snapshot/versioned-source/bin/balena,build,--application,jetson-app-name,--logs,--source,.,--emulated,--build-arg,BUILDKIT_INLINE_CACHE=1,--cache-from,registry.my.com/jetson:latest]
Hmm, that's strange. I tried to run the same pipeline on two different servers: it succeeded on a big server (24 threads, 64 GB RAM) and failed on a smaller one (12 threads, 14 GB RAM).
But 2380747015 bytes should be about 2.3 GB, right? That should not be a problem for either of them. I will do it once more and investigate the system resources. Since I dockerized the build process, there should not be any difference between the systems except their specs.
There seems to be no real difference in resource usage, so I have no clue why they behave differently. Maybe they cache different images, run different dind versions, or do caching differently. I don't know :/
For now I will stick with the server that works, even though I do not know the reason.
[jtonello] This issue has attached support thread https://jel.ly.fish/#/6ed1f99a-a3cf-4447-bc84-d649a73aa2b3
I just encountered the issue again and this time needed to solve it.
Here are the steps that helped me resolve the issue, even though I am not sure whether all of them are needed:
1. Remove the docker-compose.yml file.
2. Use a Dockerfile that is just sourcing from the upstream Docker image, e.g. FROM balenalib/jetson-nano-ubuntu-python:3.6-bionic.
3. Run balena build --application appName --logs --source . --emulated --build-arg BUILDKIT_INLINE_CACHE=1.
4. Replace the Dockerfile with the correct one.
5. Run balena build --application appName --logs --source . --emulated --build-arg BUILDKIT_INLINE_CACHE=1 again.
Strange observations while figuring out how to work around the symptoms:
Just replacing the Dockerfile in our source code repository with an almost empty one did not change anything. The error ERR_FS_FILE_TOO_LARGE: File size (2380749827) is greater than possible Buffer: 2147483647 bytes was still raised, even though every COPY had been removed from the Dockerfile.
If there is more information you need to debug this further (only if it is of interest to you), I am happy to assist.
It's now working with the following tool versions:
Client: Docker Engine - Community
Version: 19.03.5
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.5
Experimental: false
containerd:
Version: 1.2.10
runc:
Version: 1.0.0-rc8+dev
docker-init:
Version: 0.18.0
I can pull up the CLI/Node/npm versions as needed, but this issue is theoretically independent of them.
When manipulating large images (i.e. larger than Node's maximum Buffer size), some of the files are loaded as single files instead of as a stream.
E.g. I run the build on a Dockerfile that builds from another codebase and compiles for my device; for reference, I'm building ROS2.
After about 16 hours of compiling... yup... the image generated is quite large: the Docker image is ~4.2 GB. If I attempt to deploy, I get the RangeError shown above.
I managed to get everything onto my device by pushing to my Docker Hub repo and pulling on the device.
However, I noticed my entrypoint needed a small tweak (rookie mistake: I didn't chmod +x first). Luckily the copy happens after compiling, so I can try to rebuild the image and leverage the previous layers where the compilation happened.
So, somewhere in the read of the locally existing image, fs.readFile or something similar is used and is throwing ERR_FS_FILE_TOO_LARGE.
I'm not sure exactly where the call is being made without going through the entire codebase; I figured I would post here to see if it can be patched, or whether this is an upstream limitation from Docker/Node that cannot be avoided.