fergusstrange / embedded-postgres

Run a real Postgres database locally on Linux, OSX or Windows as part of another Go application or test
MIT License

unable to extract postgres archive: xz: data is truncated or corrupt #71

Closed jfigus closed 2 years ago

jfigus commented 2 years ago

When using embedded-postgres in a docker container we sometimes see the following error:

Failed to start embedded postgres: unable to extract postgres archive: xz: data is truncated or corrupt

Perhaps there's a timing error in the code that fetches the jar file from Maven and extracts the embedded .tgz file containing the Postgres binaries.

fergusstrange commented 2 years ago

Thanks @jfigus. I suspect the fetch code could do with some added validation and error handling / messaging.

Do you have some details so that I can try to reproduce this? What docker container are you using? Are you running in CI or locally? Do you have a copy of the downloaded .tar.gz file that we could utilise for test purposes?

Thanks

jfigus commented 2 years ago

Thanks for taking a look. We're using CentOS 7 and v11.15 of Postgres, which means the tar file is taken from here: https://repo1.maven.org/maven2/io/zonky/test/postgres/embedded-postgres-binaries-linux-amd64/11.15.0/embedded-postgres-binaries-linux-amd64-11.15.0.jar

I haven't been able to recreate the problem running docker locally. It's happening in our CI/CD environment, which runs a lot of embedded Postgres instances for our unit tests. One thought is that the Maven repo is throttling requests coming from our CI/CD environment, as we do run a lot of builds there. The problem is intermittent, so maybe we're getting throttled at times, which causes the download and decompress logic in embedded postgres to fail.

I'm able to work around the problem by adding the following Dockerfile steps to pre-load the tar file into the docker image:

# Pre-load the Postgres archive into the cache directory used by embedded-postgres
RUN mkdir -p /home/argento/.embedded-postgres-go
WORKDIR /home/argento/.embedded-postgres-go
RUN curl -o embedded-postgres-binaries-linux-amd64-11.15.0.jar https://repo1.maven.org/maven2/io/zonky/test/postgres/embedded-postgres-binaries-linux-amd64/11.15.0/embedded-postgres-binaries-linux-amd64-11.15.0.jar
# The jar is a zip file; extract the .txz and rename it to the versioned file name
RUN unzip embedded-postgres-binaries-linux-amd64-11.15.0.jar
RUN mv postgres-linux-x86_64.txz embedded-postgres-binaries-linux-amd64-11.15.0.txz

fergusstrange commented 2 years ago

That's a great solution to the problem, awesome stuff!

This may be a bit of a slow burner as I try to reproduce. I suspect I'll need to write some tests that bung back a bad .jar or something as I'll likely never be able to actually reproduce.

Also happy to take PRs if you're confident you know how to solve the problem?

I'll keep you posted on progress.

fergusstrange commented 2 years ago

Hey @jfigus, I've taken a look and it seems that if a request had been throttled, it's likely this line would have failed: https://github.com/fergusstrange/embedded-postgres/blob/7bf3aab2dca3964165ded64585e2516791c88876/remote_fetch.go#L42

I'm now looking into adding some retries and better error messaging in order to help debug the issue.
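
As an illustration of the kind of validation and error messaging discussed here, the following is a minimal sketch, not the library's actual code. The names fetchArchive and validateDownload are hypothetical. It assumes a throttled request would show up as a non-200 status and that a cut-off transfer would show up as a body shorter than the declared Content-Length.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// fetchArchive is a hypothetical helper, not part of embedded-postgres.
// It downloads url and returns the body only if the response looks complete.
func fetchArchive(client *http.Client, url string) ([]byte, error) {
	resp, err := client.Get(url)
	if err != nil {
		return nil, fmt.Errorf("unable to connect to %s: %w", url, err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, fmt.Errorf("unable to read response from %s: %w", url, err)
	}
	if err := validateDownload(resp.StatusCode, resp.ContentLength, len(body)); err != nil {
		return nil, fmt.Errorf("download from %s failed: %w", url, err)
	}
	return body, nil
}

// validateDownload rejects non-200 responses (for example a throttling status)
// and bodies shorter or longer than the declared length.
// contentLength is -1 when the server did not declare a length.
func validateDownload(status int, contentLength int64, received int) error {
	if status != http.StatusOK {
		return fmt.Errorf("unexpected HTTP status %d", status)
	}
	if contentLength >= 0 && int64(received) != contentLength {
		return fmt.Errorf("got %d bytes, expected %d", received, contentLength)
	}
	return nil
}
```

With a check like this, a throttled or cut-off download would fail with a message naming the URL and the status, rather than surfacing later as an xz extraction error.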

fergusstrange commented 2 years ago

Hey @jfigus, probably the first step to resolving this is verifying that the download was successful. After that we can potentially implement some retry logic. I think it's probably safe to add this without configuration.

See what you think of this pull request.

Tah

fergusstrange commented 2 years ago

Nice one @jfigus, thanks for the review.

Released https://github.com/fergusstrange/embedded-postgres/releases/tag/v1.17.0

I'll leave this open in the meantime to see if this is indeed where the error occurs. If you could test in your CI, that would be great.

The next step would be some retry logic, should we discover the root cause.

jfigus commented 2 years ago

Thank you for fixing this. We'll pull it into our CI/CD pipeline and see if the problem is resolved.

fergusstrange commented 2 years ago

Hey @jfigus any failures over the last 10 days?

jfigus commented 2 years ago

Looks good. I'll close this ticket.