cloudfoundry / postgres-release

BOSH release for PostgreSQL
Apache License 2.0
14 stars 36 forks source link

feat: Add the possibility to specify the used PostgreSQL version #75

Closed ZPascal closed 5 months ago

ZPascal commented 11 months ago

fix: #74

TODO:

I'm currently working with @Jobsby together on the topic.

cf-gitbot commented 11 months ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/186569819

The labels on this github issue will be updated when the story is started.

ZPascal commented 11 months ago

Hi @jpalermo & community, we have implemented most of the new features and now have a few issues with the old features and updating the minor versions in general. I'm just asking myself the following questions:

jpalermo commented 11 months ago
ZPascal commented 5 months ago

@jpalermo Could you please have a look?

ZPascal commented 5 months ago

@jpalermo Could you please have a look again? I've added your recommendations.

klakin-pivotal commented 4 months ago

Hey @ZPascal , was it your intention to exit 1 (which I think will terminate the entire script) from the function here:

https://github.com/cloudfoundry/postgres-release/pull/75/files#diff-c177d803deec7dfa3f331f9dac76ddc450207164b97343dd51bd51ab6b545755R71-R77

and here:

https://github.com/cloudfoundry/postgres-release/pull/75/files#diff-494db505a21234c6f1ad9c9fc80efe3f168bf9f38e7d01809be23e0a7682f6f2R8-R14

rather than return 1 (which I think will simply signal the failure of the function)?

NB: My Bash knowledge is not great, but this part of the PR looked incorrect to me. It may be that I am the one who is incorrect.

Context: I'm currently trying to figure out why the CI job that runs the Acceptance Tests for this Release has been failing ever since this Release was released.

I don't know that this is actually the problem, but I was looking around, and this didn't look right, so I thought I'd ask.

ZPascal commented 4 months ago

Hey @ZPascal , was it your intention to exit 1 (which I think will terminate the entire script) from the function here:

https://github.com/cloudfoundry/postgres-release/pull/75/files#diff-c177d803deec7dfa3f331f9dac76ddc450207164b97343dd51bd51ab6b545755R71-R77

and here:

https://github.com/cloudfoundry/postgres-release/pull/75/files#diff-494db505a21234c6f1ad9c9fc80efe3f168bf9f38e7d01809be23e0a7682f6f2R8-R14

rather than return 1 (which I think will simply signal the failure of the function)?

NB: My Bash knowledge is not great, but this part of the PR looked incorrect to me. It may be that I am the one who is incorrect.

Context: I'm currently trying to figure out why the CI job that runs the Acceptance Tests for this Release has been failing ever since this Release was released.

I don't know that this is actually the problem, but I was looking around, and this didn't look right, so I thought I'd ask.

Yes, it will terminate the script and I think that's the right thing to do. If don't stop the process, It will downgrade the database and start rewriting the configuration files. The tool used for minor and major version upgrades is pg_upgrade, which does not support version downgrades.

FYI: I discussed the issue with @jpalermo in a Slack discussion and already tested the effects. If you don't stop the downgrade, you end up with a crashed instance and have to do a lot of work to restore the instance.

Can you please share the CI job issue of the acceptance test on your end? I will take a look and maybe we can solve the problem together.

klakin-pivotal commented 4 months ago

Oh, you work for SAP, so you probably have access to the Cloudfoundry Concourse instance... sorry about that, I should have checked before writing my question.

The most recent failing build (which fails in the same way as all the others) is here: https://bosh.ci.cloudfoundry.org/teams/main/pipelines/postgres-release/jobs/run-acceptance-tests/builds/96

The failure is very strange. The SSH attempts for both the "Validating the postgres-previous is not created" and "Validating the postgres-previous is created" tests in the upgrade_single_node_test.go file fail. Despite what it looks like, both tests are failing, but the "... is not created" test doesn't check to see that the exit status from ssh is NOT 255 (which indicates some sort of ssh failure, rather than failure of the command ssh ran), it just tests if the exit status was non-zero.

The build immediately before it succeeded. (https://bosh.ci.cloudfoundry.org/teams/main/pipelines/postgres-release/jobs/run-acceptance-tests/builds/95) For that build, I pinned back the postgres-release resource to 5e330a77ae691e37da4585c5e404d7a8acede23b, which is the commit just before Final Release 52 was released.

The release of Final Release 52 might be unrelated to the acceptance test failure... however, I do notice that part of the test involves an upgrade from the ~previous~ [most recent] Final Release to the code in the repo... which means that the commit that releases Final Release 52 was (at least for part of the test suite) the first time upgrading from the code in Release 52 was tested.

ZPascal commented 4 months ago

Oh, you work for SAP, so you probably have access to the Cloudfoundry Concourse instance... sorry about that, I should have checked before writing my question.

The most recent failing build (which fails in the same way as all the others) is here: https://bosh.ci.cloudfoundry.org/teams/main/pipelines/postgres-release/jobs/run-acceptance-tests/builds/96

@klakin-pivotal No, problem. I have access to the corresponding concourse instance, but not to the specific test pipeline itself. Do you know which profile or authorization is required?

The failure is very strange. The SSH attempts for both the "Validating the postgres-previous is not created" and "Validating the postgres-previous is created" tests in the upgrade_single_node_test.go file fail. Despite what it looks like, both tests are failing, but the "... is not created" test doesn't check to see that the exit status from ssh is NOT 255 (which indicates some sort of ssh failure, rather than failure of the command ssh ran), it just tests if the exit status was non-zero.

The build immediately before it succeeded. (https://bosh.ci.cloudfoundry.org/teams/main/pipelines/postgres-release/jobs/run-acceptance-tests/builds/95) For that build, I pinned back the postgres-release resource to 5e330a77ae691e37da4585c5e404d7a8acede23b, which is the commit just before Final Release 52 was released.

The release of Final Release 52 might be unrelated to the acceptance test failure... however, I do notice that part of the test involves an upgrade from the previous [most recent] Final Release to the code in the repo... which means that the commit that releases Final Release 52 was (at least for part of the test suite) the first time upgrading from the code in Release 52 was tested.

That sounds interesting and I would like to try it out. The structure of the test is particularly interesting for me.