serbinsh opened this issue 2 years ago
For Illinois I switched to running it at midnight, so that by the next morning the database is synced. This is something I want to look at in the future (time permitting). The question is: do you need to sync so frequently? It's probably worth checking how often things actually change.
Is it mostly a particular table or set of tables (e.g. runs)? We could consider cleaning out the unnecessary records and/or only syncing those tables infrequently? -- David
@dlebauer that is a good idea, we can skip the runs table, which will be the biggest set. That is a quick (famous last words) fix.
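One way to implement skipping the runs table would be to split the dump in two, excluding runs from the frequent dump and dumping it separately on a slower schedule. This is only a sketch using standard `pg_dump` flags; the database name, output file names, and the idea of running it in dry-run mode are assumptions, not the actual BETY sync setup.

```shell
#!/bin/sh
# Hypothetical sketch (dry-run): exclude the huge runs table from the
# frequent dump, and dump runs on its own slower schedule.
RUN="echo"   # set RUN="" to actually execute against a live bety database

# frequent dump: everything except runs
$RUN pg_dump -Fc --exclude-table=runs -f bety_no_runs.dump bety

# slow (e.g. daily) dump: runs only
$RUN pg_dump -Fc --table=runs -f bety_runs.dump bety
```

With `RUN="echo"` the script just prints the two commands, which makes it easy to sanity-check the table selection before pointing it at a live database.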
I'll also note that I see errors like this on some occasions:
---- BU
URL with bety dump : https://psql-pecan.bu.edu/sync/dump/bety.tar.gz
Remote start ID : 1000000001
Remote end ID : 1999999999
Local start ID : 2000000001
Local end ID : 2999999999
gzip: stdin: unexpected end of file
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
The rationale for syncing more frequently is that for collaborative projects across institutions (e.g. NASA CMS) there is a need to share updated posteriors, etc. Waiting 24 hours between syncs could be an issue.
I am game to try a more complicated sync where we sync what we can more often and only sync, say, the runs table once a day or so.
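A split cadence like that could be sketched as two crontab entries, one frequent and one daily. The script names, paths, and schedule below are placeholders, not the actual PEcAn sync scripts:

```
# Hypothetical crontab sketch: frequent sync of the small tables,
# daily sync of the big runs table.

# every 6 hours: sync everything except runs
0 */6 * * *  /data/scripts/sync_bety_no_runs.sh >> /var/log/bety_sync.log 2>&1

# once a day at midnight: sync the runs table
0 0 * * *    /data/scripts/sync_bety_runs.sh    >> /var/log/bety_sync.log 2>&1
```

Staggering the heavy runs sync to midnight also lines up with the "run it overnight" approach mentioned above for Illinois.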
I think so, or at least there is still an issue. Perhaps not the same as the missing file. See below for the sync log. It looks like the syncing still times out before it gets through all of the tables:
Syncing BETYdb
Mon 24 Jan 2022 03:00:01 AM EST
/data/home/sserbin
---- BU
URL with bety dump : https://psql-pecan.bu.edu/sync/dump/bety.tar.gz
Remote start ID : 1000000001
Remote end ID : 1999999999
Local start ID : 2000000001
Local end ID : 2999999999
Checking schema : MATCHED SCHEMA version 9d0f7330c9ef2f0572c0bbbfa463a59c
Started psql (pid=1535141)
Updated formats : 77
Updated machines : 5
Updated mimetypes : 3
Updated users : 51
Updated attributes : 4492
Updated benchmarks : 73
Updated citations : 153
Updated covariates : 26
Updated ensembles : 31042 (+1)
Updated inputs : 521824 (+398)
Updated likelihoods : 4254893
Updated managements : 2
Updated metrics : 13
Updated methods : 1
Updated models : 33
Updated modeltypes : 15
Updated pfts : 200
Updated posteriors : 22611 (+1)
Updated priors : 528
Updated reference_runs : 121
Updated runs : 3043167 (+100)
Updated sites : 19486
Updated species : 21022
Updated treatments : 10
Updated variables : 381
Updated workflows : 16636 (+2)
Updated sitegroups : 27
Updated dbfiles : 624300 (+572)
Updated traits : 1750
Updated benchmarks_benchmarks_reference_runs : 366
Updated benchmarks_ensembles : 84
Updated benchmarks_ensembles_scores : 3349
Updated benchmarks_metrics : 631
Updated citations_sites : 254
Updated citations_treatments : 7
Updated formats_variables : 183
Updated inputs_runs : 691
Updated managements_treatments : 1
Updated modeltypes_formats : 18
Updated pfts_priors : 2215
Updated pfts_species : 181463
Updated posteriors_ensembles : 691855 (+1)
Updated sitegroups_sites : 19731
Pretty sure I have raised this before, but when our DB has gone down for a period of time and I re-enable the sync, syncing with BU takes / can take a very long time (hours). Even during the course of normal operation, syncing sometimes doesn't finish before my next sync cron job starts.
Does anyone have any suggestions on how they are managing to keep their DBs synced while avoiding jobs that don't complete before the next one starts? Should I create a separate sync with BU that runs less frequently than, say, WI and UIUC?
Feedback requested.
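One common way to keep cron jobs from piling up when a run overruns its interval is to wrap the sync in `flock(1)`: a new invocation that can't take the lock just skips that cycle. This is a generic sketch, not part of the existing sync scripts; the lock file path and sync command are placeholders.

```shell
#!/bin/sh
# Hypothetical wrapper: run the sync under an exclusive lock. If the
# previous sync is still running (lock held), skip this run instead of
# letting cron start an overlapping job.
LOCKFILE=/tmp/bety_sync.lock

flock -n "$LOCKFILE" -c '
  echo "starting sync at $(date)"
  # /data/scripts/sync_bety.sh   # placeholder for the real sync command
' || echo "previous sync still running; skipping this cycle"
```

Calling this wrapper from cron instead of the sync script directly would mean a slow BU sync delays the next cycle rather than stacking on top of it.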