chainguard-dev / melange

build APKs from source code
Apache License 2.0
388 stars 87 forks source link

melange index after a build is beginning to take a long time as it processes the whole apk directory #279

Closed rawlingsj closed 1 year ago

rawlingsj commented 1 year ago

I noticed that wolfi builds send a lot of time generating the APKINDEX after building each package. I've only taken a quick peak at the code and saw that after the build we generate the index for a directory https://github.com/chainguard-dev/melange/blob/99e4df8b64ae4be46e8ff94311536dc5015867f0/pkg/build/build.go#L1392

I was wondering if it would be possible to add a single apk to the APKINDEX and avoid processing all packages? Because in wolfi we fetch the entire contents of the bucket before a build it means over time this processing time will continue to grow.

I took a quick look and adding a single package file replaces the whole APKINDEX rather than updating it. Assuming I've got that right I wonder if this is by design or if there's an improvement we can make here?

rawlingsj commented 1 year ago

i.e. there's lots of log entries like

2023-01-30T23:48:57.4601935Z 2023/01/30 23:48:57 melange: processing package /home/runner/work/os/os/packages/x86_64/tree-doc-2.0.4-r0.apk
2023-01-30T23:48:57.4609865Z 2023/01/30 23:48:57 melange: processing package /home/runner/work/os/os/packages/x86_64/tree-doc-2.1.0-r0.apk
2023-01-30T23:48:57.4632594Z 2023/01/30 23:48:57 melange: processing package /home/runner/work/os/os/packages/x86_64/tree-doc-2.1.0-r1.apk
2023-01-30T23:48:57.4640299Z 2023/01/30 23:48:57 melange: processing package /home/runner/work/os/os/packages/x86_64/tree-doc-2.1.0-r2.apk
tuananh commented 1 year ago

recent pr by @dlorenc https://github.com/chainguard-dev/melange/pull/236

rawlingsj commented 1 year ago

ah lol - love it! closing :)

rawlingsj commented 1 year ago

I guess it’s a slightly different approach to what I was thinking, rather than parallel if we could update the single package in the index but yeah this should help. Will keep close and we can see how the parallel indexing helps.

tuananh commented 1 year ago

I guess it’s a slightly different approach to what I was thinking, rather than parallel if we could update the single package in the index but yeah this should help. Will keep close and we can see how the parallel indexing helps.

i see. Helm did similar thing with helm repo index command where the command accepts a --merge flag.

https://github.com/helm/helm/blob/a499b4b179307c267bdf3ec49b880e3dbd2a5591/cmd/helm/repo_index.go#L93

rawlingsj commented 1 year ago

Yeah - maybe I'll reopen and see what folks think, we can always close again.

tuananh commented 1 year ago

@rawlingsj

i tried changing the NewConcurrencyLimiterForIO of the limiter. the default 4 seems too low.

with the DefaultConcurrencyLimitIO

2023/01/31 18:57:29 melange: processing package packages/zstd-doc-1.5.2-r0.apk
2023/01/31 18:57:29 melange: generating index at test.APKINDEX.tar.gz

real    0m26.475s
user    0m48.538s
sys 0m7.644s

with NewConcurrencyLimiterForIO(20)

2023/01/31 18:56:40 melange: processing package packages/zstd-doc-1.5.2-r0.apk
2023/01/31 18:56:42 melange: generating index at test.APKINDEX.tar.gz

real    0m16.549s
user    1m1.721s
sys 0m9.308s

maybe we can squeeze a little bit more before a PR to merge index lands :D