GSA-TTS / FAC

GSA's Federal Audit Clearinghouse
Other
19 stars 5 forks source link

Loading data #3994

Open jadudm opened 3 months ago

jadudm commented 3 months ago

SOP

See https://github.com/GSA-TTS/FAC/issues/3865 for prior data loading SOP and notes for this work.

Steps

### Tasks
- [x] Load ~3K `singleauditchecklists` (and friends) via load utility
- [x] Load ~3K `dissemination_` via shell script
- [ ] Load ~300 `singleauditchecklists` (and friends) via load utility
- [x] Load ~300 `dissemination_` via shell script
- [x] Load `migrationstatus` (~280K rows) via shell script
- [ ] Load ~277K `singleauditchecklist` (and friends) via load utility
- [x] Load `historic_` tables

Next steps

  1. Bring the MV(s) into the API
  2. Document/publish what remains (~10^1 audits), possibly on static site
  3. Figure out how to expose the PDFs for what remains
### Tasks
- [x] https://github.com/GSA-TTS/FAC/issues/4029
- [ ] https://github.com/GSA-TTS/FAC/issues/4034
- [ ] https://github.com/GSA-TTS/FAC/issues/4043
gsa-suk commented 3 months ago

Data loading requirements for dissemination tables and migration status table are described here - https://github.com/GSA-TTS/FAC/issues/4008

jadudm commented 3 months ago

Ran backup.

Check table counts

export CONN=postgresql://username:secret@localhost:port/db_name
psql  "$CONN" -c "SELECT count(*) FROM public.audit_singleauditchecklist"
psql  "$CONN" -c "SELECT count(*) FROM public.audit_singleauditreportfile"
psql  "$CONN" -c "SELECT count(*) FROM public.audit_submissionevent"
psql  "$CONN" -c "SELECT count(*) FROM public.audit_excelfile"
psql  "$CONN" -c "SELECT count(*) FROM public.auth_user"
psql  "$CONN" -c "SELECT count(*) FROM public.audit_access"

count

46218 (1 row)

count

50116 (1 row)

count

919768 (1 row)

count

205211 (1 row)

count

56069 (1 row)

count

226471 (1 row)

jadudm commented 3 months ago

Run the utility

python sync_to_remote.py --clean sqlite/fhdp.sqlite config.json
jadudm commented 3 months ago

After load of SAC:

psql  "$CONN" -c "SELECT count(*) FROM public.audit_singleauditchecklist"
 count 
-------
 49025

49025 - 46218 = 2807

We expected 2798. However, we are submitting against a live database/live system, and additional SAC records would have appeared.

jadudm commented 3 months ago
psql  "$CONN" -c "SELECT count(*) FROM public.audit_singleauditchecklist"
psql  "$CONN" -c "SELECT count(*) FROM public.audit_singleauditreportfile"
psql  "$CONN" -c "SELECT count(*) FROM public.audit_submissionevent"
psql  "$CONN" -c "SELECT count(*) FROM public.audit_excelfile"
psql  "$CONN" -c "SELECT count(*) FROM public.auth_user"
psql  "$CONN" -c "SELECT count(*) FROM public.audit_access"

count

49063 (1 row)

count

53013 (1 row)

count

921070 (1 row)

count

205436 (1 row)

count

56132 (1 row)

count

235065 (1 row)

gsa-suk commented 2 months ago

Load script for dissemination tables - https://docs.google.com/document/d/1baSE3IjPfyan4g5qJ7iKWj6cauBsEK3T43MibXlutwU/edit

jadudm commented 2 months ago

These are the commands run in production for the load of the 3K dissemination_.

-- Login to production
-- Copy local files to production
cf t -s production
# SSH enabled at 14:22 EST

cat dissemination_additionalein.pgsql | cf ssh gsa-fac -c "cat > ./dissemination_additionalein.pgsql"
cat dissemination_additionaluei.pgsql | cf ssh gsa-fac -c "cat > ./dissemination_additionaluei.pgsql"
cat dissemination_captext.pgsql | cf ssh gsa-fac -c "cat > ./dissemination_captext.pgsql"
cat dissemination_federalaward.pgsql | cf ssh gsa-fac -c "cat > ./dissemination_federalaward.pgsql"
cat dissemination_finding.pgsql | cf ssh gsa-fac -c "cat > ./dissemination_finding.pgsql"
cat dissemination_findingtext.pgsql | cf ssh gsa-fac -c "cat > ./dissemination_findingtext.pgsql"
cat dissemination_general.pgsql | cf ssh gsa-fac -c "cat > ./dissemination_general.pgsql"
cat dissemination_invalidauditrecord.pgsql | cf ssh gsa-fac -c "cat > ./dissemination_invalidauditrecord.pgsql"
cat dissemination_issuedescriptionrecord.pgsql | cf ssh gsa-fac -c "cat > ./dissemination_issuedescriptionrecord.pgsql"
cat dissemination_migrationinspectionrecord.pgsql | cf ssh gsa-fac -c "cat > ./dissemination_migrationinspectionrecord.pgsql"
cat dissemination_note.pgsql | cf ssh gsa-fac -c "cat > ./dissemination_note.pgsql"
cat dissemination_onetimeaccess.pgsql | cf ssh gsa-fac -c "cat > ./dissemination_onetimeaccess.pgsql"
cat dissemination_passthrough.pgsql | cf ssh gsa-fac -c "cat > ./dissemination_passthrough.pgsql"
cat dissemination_secondaryauditor.pgsql | cf ssh gsa-fac -c "cat > ./dissemination_secondaryauditor.pgsql"

cf ssh gsa-fac
ls
/tmp/lifecycle/shell
source tools/setup_env.sh
setup_env
set +e

export PATH=/home/vcap/deps/0/apt/usr/lib/postgresql/15/bin:$PATH
export PATH=/home/vcap/app/bin:$PATH

export db_name=$(echo $VCAP_SERVICES | jq -r '.["aws-rds"][] | select(.name=="fac-db") | .credentials.name')
export username=$(echo $VCAP_SERVICES | jq -r '.["aws-rds"][] | select(.name=="fac-db") | .credentials.username')
export host=$(echo $VCAP_SERVICES | jq -r '.["aws-rds"][] | select(.name=="fac-db") | .credentials.host')
export port=$(echo $VCAP_SERVICES | jq -r '.["aws-rds"][] | select(.name=="fac-db") | .credentials.port')
export password=$(echo $VCAP_SERVICES | jq -r '.["aws-rds"][] | select(.name=="fac-db") | .credentials.password')
export PGPASSWORD=$password

export CONN=postgresql://$username:$PGPASSWORD@$host:$port/$db_name

cd ..
ls
ls -alh

# -rw-r--r-- 1  346K Jul  3 18:23 dissemination_additionalein.pgsql
# -rw-r--r-- 1   77K Jul  3 18:23 dissemination_additionaluei.pgsql
# -rw-r--r-- 1  4.7M Jul  3 18:23 dissemination_captext.pgsql
# -rw-r--r-- 1  161M Jul  3 18:23 dissemination_federalaward.pgsql
# -rw-r--r-- 1   28M Jul  3 18:23 dissemination_finding.pgsql
# -rw-r--r-- 1   12M Jul  3 18:23 dissemination_findingtext.pgsql
# -rw-r--r-- 1  5.5M Jul  3 18:23 dissemination_general.pgsql
# -rw-r--r-- 1   39M Jul  3 18:24 dissemination_invalidauditrecord.pgsql
# -rw-r--r-- 1   317 Jul  3 18:24 dissemination_issuedescriptionrecord.pgsql
# -rw-r--r-- 1   91M Jul  3 18:24 dissemination_migrationinspectionrecord.pgsql
# -rw-r--r-- 1  4.6M Jul  3 18:24 dissemination_note.pgsql
# -rw-r--r-- 1   308 Jul  3 18:24 dissemination_onetimeaccess.pgsql
# -rw-r--r-- 1   31M Jul  3 18:24 dissemination_passthrough.pgsql
# -rw-r--r-- 1   20K Jul  3 18:24 dissemination_secondaryauditor.pgsql

psql  $CONN -c "SELECT count(*) FROM public.dissemination_general WHERE report_id ilike '%CENSUS%'"
psql  $CONN --quiet -f dissemination_general.pgsql
psql  $CONN -c "SELECT count(*) FROM public.dissemination_general WHERE report_id ilike '%CENSUS%'"
psql  $CONN -c "ANALYZE public.dissemination_general" 

psql  $CONN -c "SELECT count(*) FROM public.dissemination_secondaryauditor WHERE report_id ilike '%CENSUS%'" 
psql  $CONN --quiet -f dissemination_secondaryauditor.pgsql
psql  $CONN -c "SELECT count(*) FROM public.dissemination_secondaryauditor WHERE report_id ilike '%CENSUS%'" 
psql  $CONN -c "ANALYZE public.dissemination_secondaryauditor"

psql  $CONN -c "SELECT count(*) FROM public.dissemination_additionaluei WHERE report_id ilike '%CENSUS%'"
psql  $CONN --quiet -f dissemination_additionaluei.pgsql
psql  $CONN -c "SELECT count(*) FROM public.dissemination_additionaluei WHERE report_id ilike '%CENSUS%'"
psql  $CONN -c "ANALYZE public.dissemination_additionaluei"

psql  $CONN -c "SELECT count(*) FROM public.dissemination_additionalein WHERE report_id ilike '%CENSUS%'"
psql  $CONN --quiet -f dissemination_additionalein.pgsql
psql  $CONN -c "SELECT count(*) FROM public.dissemination_additionalein WHERE report_id ilike '%CENSUS%'"
psql  $CONN -c "ANALYZE public.dissemination_additionalein"

psql  $CONN -c "SELECT count(*) FROM public.dissemination_captext WHERE report_id ilike '%CENSUS%'"
psql  $CONN --quiet -f dissemination_captext.pgsql
psql  $CONN -c "SELECT count(*) FROM public.dissemination_captext WHERE report_id ilike '%CENSUS%'"
psql  $CONN -c "ANALYZE public.dissemination_captext"

psql  $CONN -c "SELECT count(*) FROM public.dissemination_findingtext WHERE report_id ilike '%CENSUS%'"
psql  $CONN --quiet -f dissemination_findingtext.pgsql
psql  $CONN -c "SELECT count(*) FROM public.dissemination_findingtext WHERE report_id ilike '%CENSUS%'"
psql  $CONN -c "ANALYZE public.dissemination_findingtext"

psql  $CONN -c "SELECT count(*) FROM public.dissemination_finding WHERE report_id ilike '%CENSUS%'"
psql  $CONN --quiet -f dissemination_finding.pgsql
psql  $CONN -c "SELECT count(*) FROM public.dissemination_finding WHERE report_id ilike '%CENSUS%'"
psql  $CONN -c "ANALYZE public.dissemination_finding"

psql  $CONN -c "SELECT count(*) FROM public.dissemination_note WHERE report_id ilike '%CENSUS%'"
psql  $CONN --quiet -f dissemination_note.pgsql
psql  $CONN -c "SELECT count(*) FROM public.dissemination_note WHERE report_id ilike '%CENSUS%'"
psql  $CONN -c "ANALYZE public.dissemination_note"

psql  $CONN -c "SELECT count(*) FROM public.dissemination_passthrough WHERE report_id ilike '%CENSUS%'"
psql  $CONN --quiet -f dissemination_passthrough.pgsql
psql  $CONN -c "SELECT count(*) FROM public.dissemination_passthrough WHERE report_id ilike '%CENSUS%'"
psql  $CONN -c "ANALYZE public.dissemination_passthrough"

psql  $CONN -c "SELECT count(*) FROM public.dissemination_federalaward WHERE report_id ilike '%CENSUS%'"
psql  $CONN --quiet -f dissemination_federalaward.pgsql
psql  $CONN -c "SELECT count(*) FROM public.dissemination_federalaward WHERE report_id ilike '%CENSUS%'"
psql  $CONN -c "ANALYZE public.dissemination_federalaward"

psql  $CONN -c "SELECT count(*) FROM public.dissemination_invalidauditrecord WHERE report_id ilike '%CENSUS%'"
psql  $CONN --quiet -f dissemination_invalidauditrecord.pgsql
psql  $CONN -c "SELECT count(*) FROM public.dissemination_invalidauditrecord WHERE report_id ilike '%CENSUS%'"
psql  $CONN -c "ANALYZE public.dissemination_invalidauditrecord"

psql  $CONN -c "SELECT count(*) FROM public.dissemination_migrationinspectionrecord WHERE report_id ilike '%CENSUS%'"
psql  $CONN --quiet -f dissemination_migrationinspectionrecord.pgsql
psql  $CONN -c "SELECT count(*) FROM public.dissemination_migrationinspectionrecord WHERE report_id ilike '%CENSUS%'"
psql  $CONN -c "ANALYZE public.dissemination_migrationinspectionrecord"

# psql  $CONN -c "SELECT count(*) FROM public.dissemination_issuedescriptionrecord WHERE report_id ilike '%CENSUS%'"
# psql  $CONN --quiet -f dissemination_issuedescriptionrecord.pgsql
# psql  $CONN -c "SELECT count(*) FROM public.dissemination_issuedescriptionrecord WHERE report_id ilike '%CENSUS%'"
# psql  $CONN -c "ANALYZE public.dissemination_issuedescriptionrecord"

psql  $CONN -c "SELECT count(*) FROM public.dissemination_onetimeaccess WHERE report_id ilike '%CENSUS%'"
psql  $CONN --quiet -f dissemination_onetimeaccess.pgsql
psql  $CONN -c "SELECT count(*) FROM public.dissemination_onetimeaccess WHERE report_id ilike '%CENSUS%'"
psql  $CONN -c "ANALYZE public.dissemination_onetimeaccess"

ls dissemination_*
rm dissemination_*.pgsql
ls -alh

# SSH disabled at 15:15 EST
gsa-suk commented 2 weeks ago

09/09/24 - Load 318 dissemination data and migration status to Prod (Sudha, Hassan, Rochelle, Matt) https://github.com/GSA-TTS/FAC/issues/4266

gsa-suk commented 2 weeks ago

09/10/24 - Copy census tables to Prod S3 (Sudha, Matt, Rochelle) https://github.com/GSA-TTS/FAC/issues/4271

gsa-suk commented 2 weeks ago

09/12/24 - Load census historical tables to Prod - (Sudha, Matt, Rochelle)

https://github.com/GSA-TTS/FAC/issues/4279