Closed: awanlin closed this issue 12 months ago
Hey @awanlin, hmm, looks like we're gonna need to find some more info here. I doubt that it would be something special that we've done, but happy to be disproved there! I wonder if this is an autobump of a dependency, either in the main repo which has propagated to others, or it's been bumped by other forms of dependency management.
The reason I say that we might not have had control over this is because I don't think we explicitly use any `zlib` stuff directly, only through dependencies.
Would be good to get some more information as to what these URLs are and see if we can create a test case that replicates it more often than not.
Just looking again at the error message, it looks like the offending libraries could be either `tar` or `minizlib`. Can you check the git history of the yarn.lock for these packages and see if any of them have changed over the period of the releases you mentioned?
Yeah @benjdlambert, that's where my thinking has gone around this issue as the stack itself is deep into other libraries and not any code that I recognize as coming from Backstage itself.
I'll do some digging into the `yarn.lock` as suggested and get back to you.
Just a bit of data so far:
Package versions by release:
| Package | 1.14.0-next.0 | 1.14.0-next.1 | 1.14.0-next.2 | 1.14.0 |
|---|---|---|---|---|
| tar | 6.1.13 | 6.1.14 | 6.1.15 | 6.1.15 |
| minizlib | 2.1.2 | 2.1.2 | 2.1.2 | 2.1.2 |
Release dates:
| Release | Date |
|---|---|
| 1.14.0-next.0 | April 25, 2023 |
| 1.14.0-next.1 | May 2, 2023 |
| 1.14.0-next.2 | May 10, 2023 |
| 1.14.0 | May 16, 2023 |
So no changes to `minizlib` over the period of time where we think the issue started, but `tar` does have changes.
Here is the comparison from `6.1.13` to `6.1.14` - notice there are changes around `minipass`: https://github.com/isaacs/node-tar/compare/v6.1.13...v6.1.14

And here is the comparison from `6.1.14` to `6.1.15` - nothing jumps out at me: https://github.com/isaacs/node-tar/compare/v6.1.14...v6.1.15
I'm going to try something here, bit of a YOLO, and pin `tar` to `6.1.13`, which also brings `minipass` down to `4.2.8` from `5.0.0`. Will report back on the results.
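For anyone wanting to try the same experiment: one way to force a transitive dependency like this with yarn classic is a `resolutions` entry in the root `package.json` (a sketch; the mechanism used in the experiment above isn't stated, so treat this as one option):

```json
{
  "resolutions": {
    "tar": "6.1.13"
  }
}
```

Since `tar` 6.1.13 still depends on the `minipass` 4.x line, this should roll `minipass` back as well without a separate entry.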
Sadly this did not work, got the same result - `zlib: unexpected end of file`.
hmm :( that's unfortunate. The other thing I could think it could be would be some change in backend-common, so maybe you could try pinning the version of backend-common to something before the 1.14.0-next.1 release? Hopefully the types are compatible 🤞
Or rather, some change in backend-common, whether that be dependency bumps or the code itself, that's caused an issue.
The template repo that I'm having the issue with is a base terraform repo with 9 files. It happens intermittently. Today I tried making a test scaffolder template with a duplicate test template repo. I found that when I limited it to 3 files (2 tfvars that are being changed, and an .md) it worked 100% of the time. Adding any files back caused it to fail again. It seems to be affected by load.
I was, however, unable to reproduce it when running a local instance using the same templates and repos. I was hoping to be able to debug. It is much more performant locally, so if it is load related, that would fit.
I am facing the same problem when running locally. I cannot find what causes it and can't always reproduce it. To solve it I just start the template over once or twice until the error is gone.
I have the same error
"me too". The funny thing is that locally (in dev mode) it works fine (from time to time the same template fails, at a rate of maybe 5%), in the container it's fine, but deploying the container into K8s the failure rate is 100%. (I am still trying to figure out what could be the reason for this difference, but looking at the initial issue message, "We can't reproduce it locally but it happens in all of our deployed environments.", it seems to fit.)
Can someone run the docker container locally and see if it still breaks? Is it the version of node inside the container, or some dependency? Or is it something to do with networking on the cluster side?
Any steps to reproduce so we can get this happening locally would be ideal for us to dig in a little more.
As I wrote, I tested it already (though this may not be very representative):
To mention: I also tried downloading, inside the container, the same archive that the UrlReader is fetching, to exclude networking issues - no problems, and tar is absolutely happy with it.
Additionally, inside the container I see that part of the content was unpacked (in /tmp there is a folder ***_step-fetch-base-*** with, in my case, 17 out of 27 files present). It is not dropped afterwards and stays until pod restart. I also tried mounting /tmp explicitly as a volume, with no change.
ok, I actually got the failure with docker (podman) locally with a fresh create-app stuff:
https://github.com/gtema/backstage-zip-test
Built the image according to https://backstage.io/docs/deployment/docker (I modified the Dockerfile there to use node 18):
```
yarn install --frozen-lockfile
yarn tsc
yarn build:backend --config app-config.yaml
podman build . -t backstage
```

Started the container with:

```
podman run -it -p 7007:7007 backstage
```
In the browser (localhost:7007), I went to Create > Demo > run and got the error during the single step of the template (https://github.com/gtema/backstage-template/blob/main/demo/template.yaml) which fetches content from a subfolder.
update: tried all possible node 18 base images - all fail, but once I switched back to node:16-bullseye-slim the error was gone.
Still very interesting that the version of node (18) in the container and locally is the same, so apparently there is something with the node 18 base image, and not node 18 itself.
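One thing worth checking here (an aside, not from the thread): Node statically bundles its own zlib rather than linking the distro's shared library, so the debian package versions should not, in theory, affect the backend process itself. Comparing the bundled versions between a working and a failing environment is a quick sanity check:

```javascript
// Print the Node version and the zlib version bundled into this build.
// Run in both the working and the failing environment and compare.
console.log('node:', process.version);
console.log('zlib:', process.versions.zlib);
```

If the two environments report the same values, the difference more likely lies in the bytes being fed in than in the decompressor.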
This is great work @gtema, thanks for taking the time to look into this more. We have been using `node:18-bullseye-slim` since February and the earliest this error started for us is late April. Wondering if there might be changes during that period we should look at?
At the moment I have no further ideas where to look, but a very brief look at the image layers shows that 18-slim has debian/tar 1.34+dfsg-1, while 16-slim has debian/tar 1.30+dfsg-6. Maybe the error is really coming from there already.
Very interesting, I will look more into this today.
I just saw this issue occur in a new place - during a techdocs docs build within the app.
```
info: Step 1 of 3: Preparing docs for entity component:default/my-thing {"timestamp":"2023-06-09T14:55:31.780Z"}
debug: Reading files from https://github.com/<ORG>/<REPO>/tree/main/ {"timestamp":"2023-06-09T14:55:31.814Z"}
error: Failed to build the docs page: zlib: unexpected end of file {"timestamp":"2023-06-09T14:55:34.732Z"}
```
On a deeper dive, I see that repo has no mkdocs.yml or docs directory. So this may be unrelated.
It is actually quite logical that it is related, since techdocs would also try to fetch the whole repo as an archive and get the required part from it (as far as my experiments go).
I've validated that changing my deployment's Node from 18.12.0 to 16.20.0 (meaning changing the base image from node:lts-bullseye-slim to node:16-bullseye-slim) does fix the issue.
OK so I just did some digging around and it looks like this issue has been brought to the node team, but it's actually zlib that's causing it: the stream is malformed in the first place, and zlib is now a little more strict about how that's handled.
https://github.com/nodejs/node/issues/46359 https://github.com/madler/zlib/issues/773
I wonder if this is something that we need to bring up with GitHub instead, as it looks like their zlib compression is invalid?

I would assume that the error still persists in v20 of node too, right? Can someone confirm?
We just had a user report the above error to our Backstage team. Looking back through our logs, it appears as though the error first showed up as far back as April 21. We are still investigating on our end to see what changes were deployed in or around this time.
Had this same error for some, but not all, techDoc entries. Tried @skyem's solution and it worked for a bit, but stopped working quickly for no apparent reason. Just updated from Backstage version 1.14.2 to 1.15.0 and the issue is fixed for me (using Node v18.16.0). Keep up the good work!
Edit: Cheered too soon. A doc that had this error now works again, but another one broke due to this issue.
Hi, I got the same issue. Backstage version is 1.15.0 (updated today). It occurred on Node.js 18.16.0/18.16.1, but works well on Node.js 18.13.0.
I had a similar issue building local techdocs on Node.js v18.16.0 and v18.16.1; reverting back to v16 (v16.20.1) appears to have resolved it. I would prefer not to be on older versions of node, however I can't see an easy way around it currently.
I was also forced to go back to node 16. Latest backstage version with node18-bullseye-slim is not working, while with node16-bullseye-slim it works
This should be fixed in the latest `-next` release if you want to try it? @gtema @willhallonline
For anyone else running across this issue. Mine was caused by excessive logging during the build process. I had left debug logs that displayed the entire prop object. Removing the excess logging resolved the issue.
For anyone still getting this, bumping to node 18.18.1 fixed this for me
We updated Backstage to the latest release version (as of this writing) and the error is gone; didn't pursue it further. Our instance was quite old (last updated in April).
I got the same issue in my Backstage too, but to solve this problem I created an empty repository with only a `README.md` file in it, moved all files from the selected template repo to the new one, and this issue never showed up again. In my original template repo, I removed the original `.git` folder and started all over again with `git init`, then included all files and pushed the code with `--force`, which allowed the `Scaffolder` plugin to fetch the repository.
📜 Description
We've started to see "zlib: unexpected end of file" errors in our logs. At first it was just in a single place but now we are seeing it in another. These both come from places where we are using the `UrlReader`, one with Bitbucket and the other with GitHub. We can't reproduce it locally but it happens in all of our deployed environments.

The error stack:
Others have reported this on Discord as well: https://discord.com/channels/687207715902193673/1108389526440771624
👍 Expected behavior
There should be no errors and the `UrlReader` should work as expected.

👎 Actual Behavior with Screenshots
We are getting this error:
👟 Reproduction steps
Honestly, this isn't one that we are able to reproduce. We've just seen it in our logs and, working back from there, they all come from using the `UrlReader`.

📃 Provide the context for the Bug.
This error is causing issues with a few internal plugins that require pulling in files from various SCMs using the `UrlReader`, which was working fine before.

🖥️ Your Environment
I can't get all the details, but this started for us somewhere between `1.14.0-next.2` and `1.14.0`.
👀 Have you spent some time to check if this bug has been raised before?
🏢 Have you read the Code of Conduct?
Are you willing to submit a PR?
No, but I'm happy to collaborate on a PR with someone else