bcgov / orgbook-configurations

Build and Deployment Configurations for the Indy-Catalyst version of the OrgBook
Apache License 2.0

Investigate growing OrgBook - Dev Wallet #124

Open WadeBarnes opened 1 year ago

WadeBarnes commented 1 year ago

On 2023/02/06 the OrgBook-Dev backup reported an out-of-disk-space issue. On investigation we found the wallet database was growing daily. This is unexpected: the cron jobs are turned off in dev on the BC Registries side, so no corporate records are being processed and no credentials are being issued.

On 2023/02/07 another check was done and the issue with the growing wallet was confirmed. Now we need to determine what is causing the issue.

Agents are running aca-py v0.7.1

Summary of size increase between 2023/02/06 and 2023/02/07

Summary of size increase between 2023/02/07 and 2023/02/08 - After pausing the monitors

Data from 2023/02/06

wallet-bc-agent_bc_wallet backups:

agent_bc_wallet database info:

agent_bc_wallet=# select * from items_id_seq;
 last_value | log_cnt | is_called 
------------+---------+-----------
     762999 |      24 | t
(1 row)
agent_bc_wallet=# select * from metadata_id_seq;
 last_value | log_cnt | is_called 
------------+---------+-----------
          1 |       0 | t
(1 row)
$ ./manage -p bc -e dev getDbDiskUsage wallet-bc

Filesystem                                     Size  Used Avail Use% Mounted on
/dev/mapper/3600a0980383143564f5d5065654a4b44  200G   12G  189G   6% /var/lib/pgsql/data
$ ./manage -p bc -e dev getRecordCounts wallet agent_bc_wallet

 table_schema |   table_name   | count_rows | disk_usage 
--------------+----------------+------------+------------
 public       | tags_encrypted |    3917990 | 2129 MB
 public       | items          |     688215 | 8215 MB
 public       | metadata       |          1 | 48 kB
 public       | tags_plaintext |          0 | 40 kB
(4 rows)

Data from 2023/02/07

wallet-bc-agent_bc_wallet backups:

agent_bc_wallet database info:

OrgBook - Dev - Wallet PVC

agent_bc_wallet=# select * from items_id_seq;
 last_value | log_cnt | is_called 
------------+---------+-----------
     778229 |       1 | t
(1 row)
agent_bc_wallet=# select * from metadata_id_seq;
 last_value | log_cnt | is_called 
------------+---------+-----------
          1 |       0 | t
(1 row)
$ ./manage -p bc -e dev getDbDiskUsage wallet-bc

Filesystem                                     Size  Used Avail Use% Mounted on
/dev/mapper/3600a0980383143564f5d5065654a4b44  200G   12G  189G   6% /var/lib/pgsql/data
$ ./manage -p bc -e dev getRecordCounts wallet agent_bc_wallet

 table_schema |   table_name   | count_rows | disk_usage 
--------------+----------------+------------+------------
 public       | tags_encrypted |    3932744 | 2139 MB
 public       | items          |     702969 | 8400 MB
 public       | metadata       |          1 | 48 kB
 public       | tags_plaintext |          0 | 40 kB
(4 rows)

Data from 2023/02/08 - After pausing the monitors

wallet-bc-agent_bc_wallet backups:

agent_bc_wallet database info:

OrgBook - Dev - Wallet PVC

agent_bc_wallet=# select * from items_id_seq;
 last_value | log_cnt | is_called 
------------+---------+-----------
     779797 |      25 | t
(1 row)
agent_bc_wallet=# select * from metadata_id_seq;
 last_value | log_cnt | is_called 
------------+---------+-----------
          1 |       0 | t
(1 row)
$ ./manage -p bc -e dev getDbDiskUsage wallet-bc

Filesystem                                     Size  Used Avail Use% Mounted on
/dev/mapper/3600a0980383143564f5d5065654a4b44  200G   12G  189G   6% /var/lib/pgsql/data
$ ./manage -p bc -e dev getRecordCounts wallet agent_bc_wallet

 table_schema |   table_name   | count_rows | disk_usage 
--------------+----------------+------------+------------
 public       | tags_encrypted |    3934284 | 2140 MB
 public       | items          |     704509 | 8420 MB
 public       | metadata       |          1 | 48 kB
 public       | tags_plaintext |          0 | 40 kB
(4 rows)

WadeBarnes commented 1 year ago

Agent Logs (Set to WARNING):

API Logs (Set to WARN):

WadeBarnes commented 1 year ago

The last credential records were added to the OrgBook on 2022-12-19.

WadeBarnes commented 1 year ago

Info from the test environment (2023/02/07) for monitoring and comparison:

agent_bc_wallet=# select * from items_id_seq;
 last_value | log_cnt | is_called 
------------+---------+-----------
    7859315 |      28 | t
(1 row)
agent_bc_wallet=# select * from metadata_id_seq;
 last_value | log_cnt | is_called 
------------+---------+-----------
          1 |       0 | t
(1 row)
$ ./manage -p bc -e test getDbDiskUsage wallet-bc

Filesystem                                     Size  Used Avail Use% Mounted on
/dev/mapper/3600a098038314354633f506d705a6a59  200G   92G  108G  47% /var/lib/pgsql/data
$ ./manage -p bc -e test getRecordCounts wallet agent_bc_wallet

 table_schema |   table_name   | count_rows | disk_usage 
--------------+----------------+------------+------------
 public       | tags_encrypted |  122722647 | 61 GB
 public       | items          |    4201278 | 30 GB
 public       | metadata       |          1 | 48 kB
 public       | tags_plaintext |          0 | 40 kB
(4 rows)

WadeBarnes commented 1 year ago

One theory is that the Credential Verification monitoring is creating proof exchange records that are not being cleaned up.

To test this, we've paused the Credential Verification monitors to see if that results in fewer records being added to the wallet.

WadeBarnes commented 1 year ago

@ianco, I think we found the issue. Have a look at the numbers under the Summary of size increase between 2023/02/07 and 2023/02/08 - After pausing the monitors section. There is a significant decrease in the number of records created with the monitors turned off. The number of records created could easily be accounted for by the few hours the monitors were running after the data from yesterday was collected.
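The drop can be checked directly from the getRecordCounts output posted for dev. A minimal sketch, using only the row counts shown in this issue; the `snapshots` list and the `daily_deltas` helper are names made up for illustration:

```python
# Row counts copied from the getRecordCounts output for OrgBook dev in this issue.
snapshots = [
    ("2023/02/06", {"items": 688215, "tags_encrypted": 3917990}),
    ("2023/02/07", {"items": 702969, "tags_encrypted": 3932744}),
    ("2023/02/08", {"items": 704509, "tags_encrypted": 3934284}),
]


def daily_deltas(snaps):
    """Return (from_date, to_date, {table: rows_added}) for consecutive snapshots."""
    deltas = []
    for (d0, c0), (d1, c1) in zip(snaps, snaps[1:]):
        deltas.append((d0, d1, {table: c1[table] - c0[table] for table in c0}))
    return deltas


for start, end, added in daily_deltas(snapshots):
    print(f"{start} -> {end}: {added}")
```

With these figures, each table gained 14,754 rows between 02/06 and 02/07, and 1,540 rows between 02/07 and 02/08 after the monitors were paused, which is roughly a 90% drop.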

WadeBarnes commented 1 year ago

Info from the test environment (2023/02/08) for monitoring and comparison:

agent_bc_wallet=# select * from items_id_seq;
 last_value | log_cnt | is_called 
------------+---------+-----------
    7873125 |      23 | t
(1 row)
agent_bc_wallet=# select * from metadata_id_seq;
 last_value | log_cnt | is_called 
------------+---------+-----------
          1 |       0 | t
(1 row)
$ ./manage -p bc -e test getDbDiskUsage wallet-bc

Filesystem                                     Size  Used Avail Use% Mounted on
/dev/mapper/3600a098038314354633f506d705a6a59  200G   93G  108G  47% /var/lib/pgsql/data
$ ./manage -p bc -e test getRecordCounts wallet agent_bc_wallet

 table_schema |   table_name   | count_rows | disk_usage 
--------------+----------------+------------+------------
 public       | tags_encrypted |  122736427 | 61 GB
 public       | items          |    4215058 | 30 GB
 public       | metadata       |          1 | 48 kB
 public       | tags_plaintext |          0 | 40 kB
(4 rows)

Summary:

This confirms test is affected by the same issue, and by extension prod will also be affected.
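As a rough sanity check on that conclusion, the growth in test can be compared with the growth in dev while the monitors were still running. This is only a sketch: it uses the `items` row counts posted in this issue and assumes the snapshots were taken roughly a day apart; the variable and function names are illustrative.

```python
# items row counts copied from the getRecordCounts output in this issue.
dev_items = {"2023/02/06": 688215, "2023/02/07": 702969}     # monitors running
test_items = {"2023/02/07": 4201278, "2023/02/08": 4215058}


def rows_added(counts):
    """Rows added between the first and last snapshot (dicts keep insertion order)."""
    values = list(counts.values())
    return values[-1] - values[0]


dev_growth = rows_added(dev_items)
test_growth = rows_added(test_items)
print(f"dev:  +{dev_growth} items")
print(f"test: +{test_growth} items")
print(f"test/dev ratio: {test_growth / dev_growth:.2f}")
```

Test gained 13,780 items against 14,754 in dev, so both environments are growing at a similar rate even though no credentials are being issued.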

swcurran commented 1 year ago

Is this happening in test and prod, or just Dev?

WadeBarnes commented 1 year ago

From https://github.com/bcgov/orgbook-configurations/issues/124#issuecomment-1422603348

Summary:

This confirms test is affected by the same issue, and by extension prod will also be affected.

ianco commented 1 year ago

Is this happening in test and prod, or just Dev?

If we're running the same monitor, then it's happening in test and prod as well and we just haven't noticed due to the larger record counts in these environments.

ianco commented 1 year ago

Even though all the data in the database is encrypted, we can use the type column in a where clause, because the encrypted value will always be the same if the type is the same.

So, for example, you can run this query to get the (encrypted) type value of the most recent record:

select type from items
where id = 
(select max(id) from items);

(this works in orgbook dev because we know the most recent record is a proof exchange)

... and once you know a record of a certain type, you can count how many records share that type:

select type, count(*)
from items
where type = 
(select type from items
 where id = 
 (select max(id) from items)
)
group by type;

... so we could delete the records directly from the wallet using this kind of filter

The trick is we also need to delete all the tags associated with these records.
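The idea can be sketched end to end against a throwaway database. This is an illustration only, not the scripts used for OrgBook: it uses an in-memory SQLite database in place of the PostgreSQL wallet, the `item_id` column on `tags_encrypted` is an assumption about the schema, and the two byte strings stand in for encrypted type values.

```python
import sqlite3

# In-memory stand-in for the wallet database. Column names (notably
# tags_encrypted.item_id) are assumptions made for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, type BLOB NOT NULL, value BLOB);
    CREATE TABLE tags_encrypted (name BLOB, value BLOB, item_id INTEGER NOT NULL);
""")

# Two fake "encrypted" type values: credentials and proof exchanges.
CRED, PROOF = b"\x01cred", b"\x02proof"
conn.executemany(
    "INSERT INTO items (type, value) VALUES (?, ?)",
    [(CRED, b"c1"), (PROOF, b"p1"), (CRED, b"c2"), (PROOF, b"p2")],
)
conn.executemany(
    "INSERT INTO tags_encrypted (name, value, item_id) VALUES (?, ?, ?)",
    [(b"t", b"v", item_id) for item_id in (1, 2, 2, 3, 4)],
)


def delete_type_of_latest(conn):
    """Delete every item that shares the newest item's type, removing tags first."""
    (target,) = conn.execute(
        "SELECT type FROM items WHERE id = (SELECT max(id) FROM items)"
    ).fetchone()
    with conn:  # single transaction: commit on success, roll back on error
        tags = conn.execute(
            "DELETE FROM tags_encrypted WHERE item_id IN "
            "(SELECT id FROM items WHERE type = ?)",
            (target,),
        ).rowcount
        items = conn.execute("DELETE FROM items WHERE type = ?", (target,)).rowcount
    return items, tags


deleted = delete_type_of_latest(conn)
print(deleted)  # (items deleted, tags deleted)
```

Deleting the tags before the items keeps the subquery on `items` valid, and wrapping both statements in one transaction avoids leaving orphaned tags if the second delete fails. If the real schema declares a cascading foreign key on the tag tables, the explicit tag delete may be unnecessary; that would need to be checked against the actual wallet.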

WadeBarnes commented 1 year ago

Scripts to delete the presentation requests from a wallet: https://github.com/bcgov/von-bc-registries-audit/pull/26

esune commented 8 months ago

To resolve this permanently, the ACA-Py instance for OrgBook needs to be updated to a newer version where presentation records are deleted by default.

WadeBarnes commented 7 months ago

This is the PR containing the desired behavior, deleting the presentation exchange records: https://github.com/hyperledger/aries-cloudagent-python/pull/2352. It was first available in aca-py v0.10.1.

WadeBarnes commented 7 months ago

The other consideration during the upgrade is the wallet type. OrgBook and BC Reg agents are still using indy storage. I'd like to separate the storage migration from the fix for this issue, so I will be using an image that still supports indy storage.

WadeBarnes commented 7 months ago

Both py3.9-indy-1.16.0-0.10.5 and py3.9-indy-1.16.0-0.11.0 are available.

@swcurran, any recommendation on which version, v0.10.5 or v0.11.0? We're upgrading from v0.7.0 on the BC Reg side and v0.7.1 on the OrgBook side, so I'm leaning toward v0.10.5 to avoid any potential issues with the DIDComm message changes. Or do you see that as a non-issue?
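Either candidate should include the presentation-record cleanup, since it first shipped in v0.10.1. A small sketch of that check, assuming the aca-py version is the last hyphen-separated part of the image tag (an inference from the two tags listed here, not a documented format); `acapy_version` and `MIN_FIXED` are illustrative names:

```python
def acapy_version(image_tag):
    """Parse the trailing aca-py version from a tag like 'py3.9-indy-1.16.0-0.10.5'.

    Assumes the version is the final hyphen-separated part and is purely numeric
    (release candidates such as '0.11.0rc2' are not handled).
    """
    return tuple(int(part) for part in image_tag.rsplit("-", 1)[-1].split("."))


MIN_FIXED = (0, 10, 1)  # first release reported in this issue to contain the fix

for tag in ("py3.9-indy-1.16.0-0.10.5", "py3.9-indy-1.16.0-0.11.0"):
    version = acapy_version(tag)
    status = "includes fix" if version >= MIN_FIXED else "predates fix"
    print(tag, version, status)
```

Comparing tuples of integers avoids the string-comparison trap where "0.7.1" would sort after "0.10.1".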

swcurran commented 7 months ago

Are you updating both the issuer and OrgBook? If not, then yes, you will probably want to miss that breaking change. If you are doing both, then go with 11. I assume the --emit-new-didcomm-prefix and --emit-new-didcomm-mime-type are not set already in the environment? If so then might as well go to 0.11.

WadeBarnes commented 7 months ago

Are you updating both the issuer and OrgBook? If not, then yes, you will probably want to miss that breaking change. If you are doing both, then go with 11. I assume the --emit-new-didcomm-prefix and --emit-new-didcomm-mime-type are not set already in the environment? If so then might as well go to 0.11.

Yes, I'll be upgrading both. Correct, neither --emit-new-didcomm-prefix nor --emit-new-didcomm-mime-type has ever been set on either service.

WadeBarnes commented 7 months ago

Will be upgrading BC Reg and OrgBook agents to py3.9-indy-1.16.0-0.11.0

WadeBarnes commented 7 months ago

I've upgraded the BC Reg and OrgBook dev agents, then tried posting credentials from BC Reg to the OrgBook. All failed, so I'll be collecting additional info and posting it here.

WadeBarnes commented 7 months ago

OrgBook Agent Logs:

BC Reg Agent Logs:

BC Reg Controller Logs:

OrgBook API logs (from another run):

WadeBarnes commented 7 months ago

Deployed the py3.9-indy-1.16.0-0.10.5 image and ran into failures as well. Going to try rolling back to the previous aca-py version.

WadeBarnes commented 7 months ago

The credentials were posted successfully using the previous aca-py versions; v0.7.0 on the BC Registries side, and v0.7.1 on the OrgBook side.

WadeBarnes commented 7 months ago

Does the issuer_registration plugin need to be updated to support the new aca-py version(s)?

esune commented 7 months ago

Does the issuer_registration plugin need to be updated to support the new aca-py version(s)?

Logged https://github.com/bcgov/aries-vcr/issues/766 to address this and the upgrade to a newer agent version.