bcgov / DITP-DevOps

Digital Identity and Trust Program Team's DevOps Documentation Repository
Apache License 2.0

Update DTS Endorser Deployments #138

Closed esune closed 5 months ago

esune commented 9 months ago

The aries-endorser-service (https://github.com/hyperledger/aries-endorser-service) was recently updated to support configurable rule sets for automatic endorsement of transactions. We need to update the deployments to use the new service code.

Deployment configurations for our endorser services are here: https://github.com/bcgov/dts-endorser-service

The upgrade likely just requires a rebuild and update of the tagged image references for the aries-endorser-agent and aries-endorser-api services.

WadeBarnes commented 8 months ago

This was, and should have continued to have been, an automated process. The last time this triggered automatically it built and deployed @Gavinok's changes.

Granular configuration of auto-endorsement  (#34)
SHA-1: 8eda41c31803d16351a81144656a8f1d50335668

Then got stuck on the two commits after that.

Looking into why it got stuck and unsticking it.

Summary: All changes except the most recent, Enhancement: Introduce Support for Uploading CSV-Based Configuration (#37), were built and deployed to dev automatically.

WadeBarnes commented 8 months ago

The Granular configuration of auto-endorsement (#34) changes were deployed and available since 2023-10-13, the day they were merged.

WadeBarnes commented 8 months ago

I've fixed the pipelines. I had to clean up some resources that were erroneously deployed to the tools environment and make some adjustments to the Jenkins config.

WadeBarnes commented 8 months ago

The endorser-api deployment with the latest code is failing with the following error:

INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
ERROR [alembic.util.messaging] Can't locate revision identified by 'd925cb39480e'
FAILED: Can't locate revision identified by 'd925cb39480e'
Alembic db upgrade failed...

cc @Gavinok

Gavinok commented 8 months ago

I'll have a look at it. Alembic was being a massive pain before. I'll see if I can reproduce this.

WadeBarnes commented 8 months ago

In case it helps, this is what I see in the database, in dev and test:

endorser_controller_db=# select * from alembic_version;
 version_num  
--------------
 d925cb39480e
(1 row)

Gavinok commented 8 months ago

I figure this has something to do with how the instance is being updated since I can't reproduce it locally on a fresh startup.

Gavinok commented 8 months ago

If I understand the source of the problem it has to do with me removing the old migrations version. Something that caused a similar issue locally and required me to regenerate it from scratch. I figure since this is upgrading the currently deployed environment the DB needs that old migration to function. It may take me a little longer to pin down how to make this all work together.

WadeBarnes commented 8 months ago

If I understand the source of the problem it has to do with me removing the old migrations version. Something that caused a similar issue locally and required me to regenerate it from scratch.

That would be a problem for any existing deployments.

I figure since this is upgrading the currently deployed environment the DB needs that old migration to function.

Yes, since that migration is likely what created the initial schema in the first place.

It would be best if you could restore the original migrations, and then figure out how to add yours from there. Otherwise we'll have to deal with the, perhaps, complex task of migrating the data to the new migration/schema. Something I'd like to avoid completely.

esune commented 8 months ago

I should have caught that the migrations were removed in the pull request I reviewed (https://github.com/hyperledger/aries-endorser-service/pull/37) - sorry about that. Restoring migrations and adding the new ones AFTER those is definitely the way to go, as @WadeBarnes suggested.
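To illustrate why removing the old migration breaks existing deployments, here is a minimal sketch of how a migration tool resolves an upgrade path. It is a simplified model, not Alembic's actual implementation. The function name `upgrade_path`, the revision id `aaaaaaaaaaaa`, and the assumption that `d925cb39480e` is the initial revision are all hypothetical; `fb66f2d55aee` is the id shown in the successful upgrade log later in this thread.

```python
from typing import Dict, List, Optional


def upgrade_path(
    scripts: Dict[str, Optional[str]],
    current: Optional[str],
    head: str,
) -> List[str]:
    """Return the revisions to apply, oldest first, to go from `current` to `head`.

    `scripts` maps each revision id found on disk to its down_revision.
    `current` is the value stored in the database (alembic_version.version_num),
    or None for a fresh database.
    """
    # The database points at a revision that no longer exists on disk.
    if current is not None and current not in scripts:
        raise LookupError(f"Can't locate revision identified by '{current}'")

    path: List[str] = []
    rev: Optional[str] = head
    # Walk backwards from head until we reach the database's revision.
    while rev != current:
        if rev is None or rev not in scripts:
            raise LookupError(f"No upgrade path from {current!r} to {head!r}")
        path.append(rev)
        rev = scripts[rev]
    return list(reversed(path))


# History regenerated from scratch: the old revision file is gone (hypothetical id).
regenerated = {"aaaaaaaaaaaa": None}
# Old revision restored, new migration chained after it.
restored = {"d925cb39480e": None, "fb66f2d55aee": "d925cb39480e"}

try:
    upgrade_path(regenerated, "d925cb39480e", "aaaaaaaaaaaa")
except LookupError as exc:
    print("regenerated history:", exc)

print("restored history:", upgrade_path(restored, "d925cb39480e", "fb66f2d55aee"))
```

A fresh database (`current=None`) works with either history, which would be consistent with the problem not reproducing locally on a fresh startup.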

Gavinok commented 8 months ago

I have restored the file and generated a new migration in https://github.com/hyperledger/aries-endorser-service/pull/42

WadeBarnes commented 8 months ago

I built and deployed the new code to smoke test. Running into the following error(s) now:

INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade d925cb39480e -> e6afa1dce289, updated to support Granular configuration of auto-endorsement
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1802, in _execute_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 719, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.NotNullViolation: column "author_goal_code" of relation "endorserequest" contains null values

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/site-packages/alembic/__main__.py", line 4, in <module>
    main(prog="alembic")
  File "/usr/local/lib/python3.10/site-packages/alembic/config.py", line 588, in main
    CommandLine(prog=prog).main(argv=argv)
  File "/usr/local/lib/python3.10/site-packages/alembic/config.py", line 582, in main
    self.run_cmd(cfg, options)
  File "/usr/local/lib/python3.10/site-packages/alembic/config.py", line 559, in run_cmd
    fn(
  File "/usr/local/lib/python3.10/site-packages/alembic/command.py", line 320, in upgrade
    script.run_env()
  File "/usr/local/lib/python3.10/site-packages/alembic/script/base.py", line 563, in run_env
    util.load_python_file(self.dir, "env.py")
  File "/usr/local/lib/python3.10/site-packages/alembic/util/pyfiles.py", line 92, in load_python_file
    module = load_module_py(module_id, path)
  File "/usr/local/lib/python3.10/site-packages/alembic/util/pyfiles.py", line 108, in load_module_py
    spec.loader.exec_module(module)  # type: ignore
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/app/api/db/migrations/env.py", line 69, in <module>
    run_migrations_online()
  File "/app/api/db/migrations/env.py", line 63, in run_migrations_online
    context.run_migrations()
  File "<string>", line 8, in run_migrations
  File "/usr/local/lib/python3.10/site-packages/alembic/runtime/environment.py", line 851, in run_migrations
    self.get_context().run_migrations(**kw)
  File "/usr/local/lib/python3.10/site-packages/alembic/runtime/migration.py", line 620, in run_migrations
    step.migration_fn(**kw)
  File "/app/api/db/migrations/versions/updated_to_support_granular__e6afa1dce289.py", line 83, in upgrade
    op.add_column(
  File "<string>", line 8, in add_column
  File "<string>", line 3, in add_column
  File "/usr/local/lib/python3.10/site-packages/alembic/operations/ops.py", line 2047, in add_column
    return operations.invoke(op)
  File "/usr/local/lib/python3.10/site-packages/alembic/operations/base.py", line 392, in invoke
    return fn(self, operation)
  File "/usr/local/lib/python3.10/site-packages/alembic/operations/toimpl.py", line 154, in add_column
    operations.impl.add_column(table_name, column, schema=schema, **kw)
  File "/usr/local/lib/python3.10/site-packages/alembic/ddl/impl.py", line 324, in add_column
    self._exec(base.AddColumn(table_name, column, schema=schema))
  File "/usr/local/lib/python3.10/site-packages/alembic/ddl/impl.py", line 197, in _exec
    return conn.execute(construct, multiparams)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1289, in execute
    return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/sql/ddl.py", line 77, in _execute_on_connection
    return connection._execute_ddl(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1381, in _execute_ddl
    ret = self._execute_context(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1845, in _execute_context
    self._handle_dbapi_exception(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2026, in _handle_dbapi_exception
    util.raise_(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/compat.py", line 207, in raise_
    raise exception
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1802, in _execute_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 719, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.IntegrityError: (psycopg2.errors.NotNullViolation) column "author_goal_code" of relation "endorserequest" contains null values

[SQL: ALTER TABLE endorserequest ADD COLUMN author_goal_code VARCHAR NOT NULL]
(Background on this error at: https://sqlalche.me/e/14/gkpj)
Alembic db upgrade failed...

WadeBarnes commented 8 months ago

It looks like the error occurs because the table already contains records. Adding the new author_goal_code column with the NOT NULL constraint and no default value would leave the existing rows with NULL in author_goal_code, which violates the constraint.

So you're going to need to update the statement to something like this: ALTER TABLE endorserequest ADD COLUMN author_goal_code VARCHAR NOT NULL DEFAULT '<the default author goal code>'
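The effect of the default can be sketched with Python's built-in sqlite3 module, used here only as a stand-in for PostgreSQL. The `state` column, the sample row, and the default value `hypothetical-default` are assumptions for illustration; the real default goal code is not stated in this thread. Note one difference: SQLite rejects a NOT NULL column without a default even on an empty table, whereas PostgreSQL only rejects it when existing rows would end up with NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Minimal stand-in for the existing table, with one pre-existing record.
conn.execute("CREATE TABLE endorserequest (id INTEGER PRIMARY KEY, state TEXT)")
conn.execute("INSERT INTO endorserequest (state) VALUES ('existing-row')")

# Attempt 1: NOT NULL with no default, as in the failing migration.
try:
    conn.execute(
        "ALTER TABLE endorserequest ADD COLUMN author_goal_code VARCHAR NOT NULL"
    )
    failed = False
except sqlite3.OperationalError as exc:
    failed = True
    print("without default:", exc)

# Attempt 2: same column, but with a default that back-fills existing rows.
conn.execute(
    "ALTER TABLE endorserequest "
    "ADD COLUMN author_goal_code VARCHAR NOT NULL DEFAULT 'hypothetical-default'"
)
rows = conn.execute("SELECT author_goal_code FROM endorserequest").fetchall()
print("with default:", rows)
```

In an Alembic migration, the usual way to get the same DDL is to pass `server_default` to the `sa.Column` given to `op.add_column`.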

Gavinok commented 8 months ago

You are correct. I have just updated the PR to resolve this

WadeBarnes commented 8 months ago

I've built and deployed the new code to the dev environment. The migration ran successfully and the API container is running.

INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
INFO [alembic.runtime.migration] Running upgrade d925cb39480e -> fb66f2d55aee, updated to support granular configuration
2024-01-03 13:36:14,080 - api.main - WARNING - >>> Starting up app ...
2024-01-03 13:36:14,186 - api.main - WARNING - >>> Starting up app ...

@Gavinok, Please have a look to ensure everything is working as expected.

WadeBarnes commented 7 months ago

@Gavinok, Is this ready to go to test?

@esune, These updates have only been applied to the BCovrin endorser instance. We should meet to review and discuss the impact of applying these changes to the CANdy and Sovrin endorser instances. I believe @loneil has encountered some unexpected behavior with the connection between Traction tenants and the BCovrin dev endorser instance.

loneil commented 7 months ago

@WadeBarnes the issue isn't related to Traction tenant connections (as far as I can tell). I think it's that the Endorser Service can't fetch Transactions with the newer code changes, and possibly (though unconfirmed) isn't ingesting the pending Transactions from the agent. Discussed with @esune on Friday and I'm looking into the issues with the Service locally now.

So I don't think we'd want to update any other instances yet.

WadeBarnes commented 7 months ago

Thanks for the information @loneil, we'll hold off updating any other environments until this gets straightened out.

esune commented 7 months ago

Pushed an update to the PR and updated the bcovrin-dev endorser with the new image, everything seems to be working correctly now.

WadeBarnes commented 7 months ago

The CANdy and BCovrin endorser instances in dev have been updated with the latest changes. @esune, @loneil, Could you please review and ensure things are working as expected? Once you confirm we can promote the changes to the test environment which will also affect the Sovrin TestNet endorser. We can track the promotions with a separate ticket.

WadeBarnes commented 7 months ago

Please reassign to me once this is ready to move forward.

loneil commented 7 months ago

The CANdy and BCovrin dev services appear to be all good with respect to the changes (as of the Jan 10 update). Not sure what other considerations would have to be made before deploying to the TEST env.

WadeBarnes commented 7 months ago

Updating our Sovrin Endorsers is blocked by issues we're having with the REV_REG entries not being forwarded to the endorser for signing; https://github.com/hyperledger/aries-cloudagent-python/issues/2441

esune commented 7 months ago

Updating our Sovrin Endorsers is blocked by issues we're having with the REV_REG entries not being forwarded to the endorser for signing; hyperledger/aries-cloudagent-python#2441

@WadeBarnes would you rather log a separate issue to track upgrading those, or keep this open until everything is completed?

WadeBarnes commented 7 months ago

I'd like to be able to upgrade all of the instances together. It makes it more difficult when we split things up to track special cases.

WadeBarnes commented 7 months ago

I'm going to test the 0.12.0rc0 release to see if it resolves the REV_REG entry endorsement issue, in case there was something wrong with the prerelease image I was using for testing previously.

WadeBarnes commented 7 months ago

0.12.0rc0 resolves the issues with routing of REVOC_REG_ENTRYs. Test results here; https://github.com/hyperledger/aries-cloudagent-python/issues/2441#issuecomment-1921696128

WadeBarnes commented 7 months ago

Endorser service upgrades have been performed in test and are ready for testing:

cc @Gavinok, @loneil, @esune

esune commented 6 months ago

I tested in both the CANdy and BCovrin test endorser services and everything seems to work smoothly. I'd say we're ready to promote to Production.

esune commented 6 months ago

@WadeBarnes I'll be testing in dev the feature adding an extra description column to the allow lists, please wait with the promotion to prod until that is confirmed working so we promote once.

WadeBarnes commented 6 months ago

@esune, Just double checked and confirmed the latest version of the hyperledger/aries-endorser-service code is deployed to dev. However, it doesn't seem to include a description column in the database schema.

esune commented 6 months ago

I tested the new change in dev and everything seems to work as expected. We can update test and prod with the latest service image.

WadeBarnes commented 5 months ago

The updates have been deployed to test. @esune, please verify and let me know when everything is OK. Following that we can test the deployment using the pipelines that @rajpalc7 built.

WadeBarnes commented 5 months ago

@esune has performed testing in dev and test, and has indicated we're good to promote to prod. I'll look into promoting to prod tomorrow morning. Following that we can perform deployment testing of the work @rajpalc7 has done. @rajpalc7, continue with what you're doing for the moment, and hold off on deployment testing until I give you the go-ahead.

WadeBarnes commented 5 months ago

The latest changes have been promoted to prod, and the rules for CANdy-Prod and Sovrin MainNet have been applied.

@rajpalc7, It's safe to perform testing on your pipeline, deploying into the test environment. Let me know on RC when you're ready to do that and we can coordinate.