Closed ConradBoydElliottGustafson closed 2 years ago
Size: db.t3.micro (as opposed to 0 capacity units for famdb)
Results:
Errors: postgres@CA-L3H3ZPN2://mnt/c/Clients/FSA/git$ psql --host=rds-temp-postgres.cp9oqzf51oiq.ca-central-1.rds.amazonaws.com --port=5432 --username=postgres --password --dbname=rds-postgres Password:
psql: error: could not connect to server: Connection timed out Is the server running on host "rds-temp-postgres.cp9oqzf51oiq.ca-central-1.rds.amazonaws.com" (10.12.53.118) and accepting TCP/IP connections on port 5432?
While RDS instance isn't accessible externally, AWS RDS allows one to do manual or automatic snapshots and restore as desired from snapshots to new or existing database instance. I'm not sure if the snapshot can be exported as a file (to e.g. use locally), that would be worth testing.
@basilv : That is correct. Through my testing, those snapshots are Aurora RDS snapshots and only usable by RDS. As an alternative, you can save the snapshot to S3, but even if you download it from S3, it will be a Parquet format file, not a postgres-compatible backup file.
Sorry, thought you were talking above about managed RDS rather than Aurora.
Update: Conversations with the AWS Team confirmed our findings. Aurora snapshots are not to be used for restoring non-Aurora Postgres DBs.
Alternatives:
Investigating DMS:
Richard Pardo-Figueroa @rpardofigueroa 8:31 AM hello team: I am facing this warning when trying to create a Replication instance using AWS DMS (Database Migration Service):
DMS requires access permissions to manage your VPC resources. By clicking Create replication instance, you grant permission for DMS to create a role that has this access.
followed by this error:
User: arn:aws:sts::521834415778:assumed-role/BCGOV_WORKLOAD_developer_umafubc9/rpardofi@idir is not authorized to perform: iam:GetRole on resource: role dms-vpc-role because no identity-based policy allows the iam:GetRole action
Could someone provide me with some assistance on resolving this issue? Is it perhaps because DMS does not seem to be available as a AWS Service, approved by the Pathfinder team? or I need elevated privileges to allow the DMS service to create a role?
@rparfig is investigating why DMS can't see the FAM DB. consider allowing Security groups, inbound and outbound flow from the DB.
DMS would need to query the FAM DB so this means the FAM DB security group inbound flow needs to allow access from DMS, while DMS security group outbound flow needs to allow access to FAM. But usually outbound flows are unlimited, so it is likely just the inbound flow for FAM DB that needs to be configured.
Had a session with Richard and got things working.
Learnings from the AWS configuration - there's a set of initial security groups created as part of the initial environment. The database should be placed in the data_sg security group, and the app in the app_sg security group, which then allows network access via the inbound rule in data_sg allowing traffic from resources in the app_sg security group.
To get DMS to work, it needed to be placed in the app_sg security group.
The existing FAM aurora database has a dedicated security group created by terraform that likely isn't needed.
My access in AWS (as a developer role) did not allow me to even see any of the RDS database instances. I could see all the security groups, but was unable to see the details of the terraform-created security group (e.g. couldn't see the inbound rules, while I could see the rules of the initially created groups like data-sg). Richard with his admin role was able to see all of this.
Testing was successfully completed using DMS, copying the FAM database to a PostgreSQL DB also hosted in RDS AWS.
Challenge:
Action:
Given that we have proven that DMS can be used to copy the structures and data of an RDS database to a vanilla progress, we can assume that this migration method will work. However, we will try the following:
Alternative suggested by @ConradBoydElliottGustafson :
Actions:
Reached out to the Amazon team for assistance, explaining the issue with not being able to run the shell in the newly created EC2 instance. Awaiting response
In a new attempt to connect to the instance, I found that the Session Manager is now enabled. The instance was created almost 24 hours ago and it is strange to see that suddenly became enabled.
Started shell and installed Postgres executing the following commands successfully:
$ sudo amazon-linux-extras install epel $ sudo tee /etc/yum.repos.d/pgdg.repo<<EOF [pgdg13] name=PostgreSQL 13 for RHEL/CentOS 7 - x86_64 baseurl=http://download.postgresql.org/pub/repos/yum/13/redhat/rhel-7-x86_64 enabled=1 gpgcheck=0 EOF $ sudo yum install postgresql13 postgresql13-server $ sudo /usr/pgsql-13/bin/postgresql-13-setup initdb $ systemctl status postgresql-13 $ sudo su - postgres $ psql postgres=# create database famdb; CREATE DATABASE postgres=# create schema app_fam; CREATE SCHEMA
Firewall seems to allow all traffic from subnet, but facing this issue:
Test Endpoint failed: Application-Status: 1020912, Application-Message: Cannot connect to ODBC provider Network error has occurred, Application-Detailed-Message: RetCode: SQL_ERROR SqlState: 08001 NativeError: 101 Message: [unixODBC]could not connect to server: Connection refused Is the server running on host "ip-10-12-102-245.ca-central-1.compute.internal" (10.12.102.245) and accepting TCP/IP connections on port 5432?
Reached out to the OCIO folks as well as to the AWS team. No response from OCIO, waiting for AWS to show availability or book something for us, as Brent is quite booked up.
Started from scratch, after realizing that there is a setting that I was unintentionally ignoring, that allowed me to create an EC2 outside of the Application subnet.
DMS resides in the App group, so it should be able to connect to anything on the App subnet, but 5432 is not open in that subnet, so the first EC2 was expected to fail.
You can always add a new security group with a custom inbound rule to allow the traffic from DMS. But I agree that the second scenario should have worked.
Tried a different approach. Using the EC2 running on the APP Security Group, I tried to run pg_dump against the RDS Aurora Postgres DB and it worked:
-bash-4.2$ pg_dump -h fam-aurora-db-postgres.cluster-cp9oqzf51oiq.ca-central-1.rds.amazonaws.com -p 5432 -Fc -U sysadmin famdb > app_fam_db.dump
Password:
-bash-4.2$ ls -ltr
total 52
drwx------ 4 postgres postgres 51 Jul 27 22:14 13
-rw-r--r-- 1 postgres postgres 51075 Aug 9 00:06 app_fam_db.dump
-bash-4.2$ pg_dumpall -U sysadmin -h fam-aurora-db-postgres.cluster-cp9oqzf51oiq.ca-central-1.rds.amazonaws.com -p 5432 --clean --file=fam_and_alldbs.dump
Password:
pg_dumpall: error: query failed: ERROR: permission denied for relation pg_authid
pg_dumpall: error: query was: SELECT rolname FROM pg_authid WHERE rolname !~ '^pg_' ORDER BY 1
-bash-4.2$ psql psql (13.7) Type "help" for help.
postgres=# create user fam_proxy_api;
CREATE ROLE
postgres=# create database app_fam_restored;
CREATE DATABASE
postgres=# \q
-bash-4.2$ pg_restore -d app_fam_restored app_fam_db.dump
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 5; 2615 118795 SCHEMA app_fam sysadmin
pg_restore: error: could not execute query: ERROR: role "sysadmin" does not exist
Command was: ALTER SCHEMA app_fam OWNER TO sysadmin;
-bash-4.2$
-bash-4.2$ psql -d app_fam_restored
psql (13.7)
Type "help" for help.
app_fam_restored=# \d app_fam.fam_application;
Table "app_fam.fam_application"
Column | Type | Collation | Nullable | Default
-------------------------+--------------------------------+-----------+----------+----------------------------------
application_id | bigint | | not null | generated by default as identity
application_name | character varying(100) | | not null |
application_description | character varying(200) | | not null |
create_user | character varying(30) | | not null |
create_date | timestamp(6) without time zone | | not null | CURRENT_DATE
update_user | character varying(30) | | |
update_date | timestamp(6) without time zone | | | CURRENT_DATE
Indexes:
"fam_app_pk" PRIMARY KEY, btree (application_id)
"fam_app_name_uk" UNIQUE CONSTRAINT, btree (application_name)
Referenced by:
TABLE "app_fam.fam_application_group_xref" CONSTRAINT "reffam_application20" FOREIGN KEY (application_id) REFERENCES app_fam.fam_application(application_id)
TABLE "app_fam.fam_role" CONSTRAINT "reffam_application22" FOREIGN KEY (application_id) REFERENCES app_fam.fam_application(application_id)
TABLE "app_fam.fam_application_client" CONSTRAINT "reffam_application31" FOREIGN KEY (application_id) REFERENCES app_fam.fam_application(application_id)
app_fam_restored=# select * from app_fam.fam_application;
application_id | application_name | application_description | create_user | create_date | update_user | update_date
----------------+------------------+---------------------------+---------------+---------------------+---------------+---------------------
1 | fam | Forests Access Management | fam_proxy_api | 2022-07-20 00:00:00 | fam_proxy_api | 2022-07-20 00:00:00
2 | fom | Forest Operations Map | fam_proxy_api | 2022-07-20 00:00:00 | fam_proxy_api | 2022-07-20 00:00:00
(2 rows)
app_fam_restored=#
Meeting with the Pathfinder team scheduled for Monday Aug 15th@3PM, to follow up on configuration issues on our name plate or perhaps in the internal configuration of our AWS Tenancy and the security guardrails set in place.
@rparfig please can you include me in this meeting. In the meantime, are we able to stop the two EC2 instances if we are not using them until next week after discussions with the pathfinder team.
@fadekemiatitebi : I was careful on choosing the Amazon Free Tier for these EC2 instances, so we should not be charged for these services. However, I have shutdown both EC2s. I have also forwarded you the invite for the meeting with the Pathfinder team on Monday
Thanks @rparfig, It looks like there are some EC2 charges, although not significant. If you still need the instance, you can keep it running. Otherwise stopping it may not be a bad idea since we can always restart it.
Issue has been solved, with the assistance of the Pathfinder team, during the session held on Aug 15, 2022:
postgresql.conf should be changed to include: ssl = on listen_addresses = "*"
Restarting the PostgreSQL service, allowed for successful traffic from DMS to the EC2 instance.
Outstanding / pending for decision:
@rparfig - 'prescind' - thanks for the vocabulary expansion
We want to know if we are "locking in" to a particular flavor of postres or if we could move our data to something else (Azure, on-prem, etc.)
AC: Try to export from Aurora and import into a local PostgreSQL db Try to export from RDS PostgreSQL and import into a local db Report the results and discuss with PO if it impacts our architecture