Closed jfigus closed 2 years ago
Thanks @jfigus. I suspect the fetch code could do with some added validation and error handling / messaging.
Do you have some details so that I can try to reproduce this? What docker container are you using? Are you running in CI or locally? Do you have a copy of the downloaded .tar.gz file that we could utilise for test purposes?
Thanks
Thanks for taking a look. We're using CentOS 7. We're using v11.15 of postgres, which means the tar file is taken from here: https://repo1.maven.org/maven2/io/zonky/test/postgres/embedded-postgres-binaries-linux-amd64/11.15.0/embedded-postgres-binaries-linux-amd64-11.15.0.jar
I haven't been able to recreate the problem running docker locally. It's happening in our CI/CD environment, which runs a lot of embedded postgres instances for our unit tests. One thought is that the maven repo is throttling requests coming from our CI/CD environment, as we do run a lot of builds there. It is an intermittent problem, so maybe we're getting throttled at times, which causes the download and decompress logic in embedded postgres to fail.
I'm able to work around the problem by adding the following Dockerfile steps to pre-load the tar file into the docker image:
RUN mkdir -p /home/argento/.embedded-postgres-go
WORKDIR /home/argento/.embedded-postgres-go
RUN curl -o embedded-postgres-binaries-linux-amd64-11.15.0.jar https://repo1.maven.org/maven2/io/zonky/test/postgres/embedded-postgres-binaries-linux-amd64/11.15.0/embedded-postgres-binaries-linux-amd64-11.15.0.jar
RUN unzip embedded-postgres-binaries-linux-amd64-11.15.0.jar
RUN mv postgres-linux-x86_64.txz embedded-postgres-binaries-linux-amd64-11.15.0.txz
That's a great solution to the problem, awesome stuff!
This may be a bit of a slow burner as I try to reproduce. I suspect I'll need to write some tests that bung back a bad .jar or something, as I'll likely never be able to actually reproduce.
Also happy to take PRs if you're confident you know how to solve the problem?
I'll keep you posted on progress.
Hey @jfigus I've taken a look, and it seems that if requests were being throttled, it's likely this line would have failed: https://github.com/fergusstrange/embedded-postgres/blob/7bf3aab2dca3964165ded64585e2516791c88876/remote_fetch.go#L42
I'm now looking into adding some retries and better error messaging in order to help debug the issue.
Hey @jfigus probably the first step to resolving this is verifying that the download was successful. After that we can potentially implement some retry logic. I think it's probably safe to add this without configuration.
See what you think of this pull request.
Tah
Nice one @jfigus, thanks for the review.
Released https://github.com/fergusstrange/embedded-postgres/releases/tag/v1.17.0
I'll leave this open meanwhile to see if this is indeed where the error occurs, if you could test in your CI that would be great.
The next step is some retry logic, should we discover that this is the root cause.
Thank you for fixing this. We'll pull it into our CI/CD pipeline and see if the problem is resolved.
Hey @jfigus any failures over the last 10 days?
Looks good. I'll close this ticket.
When using embedded-postgres in a docker container we sometimes see the following error:
Failed to start embedded postgres: unable to extract postgres archive: xz: data is truncated or corrupt
Perhaps there's a timing error in the code that fetches the jar file from maven and extracts the embedded .txz file containing the postgres binaries.