Closed sillitoe closed 7 years ago
Ah yes. I should have done that and documented it as part of the release process.
Let me have a look at this now. I'll aim to do a simple "hand-edit this file" version that you can then improve as you see fit.
Thanks, this was meant for my own docs (I should've self assigned).
Great if you're okay to look into it.
Progress update:
The relevant crontab entry is at line 31 of puppet/trunk/files/crontabs/update/cathupdate.
That's running update/trunk/database/cath_b.sql on the database. I think the relevant parts of that are:
CASE WHEN dh.domain_id IS NULL THEN 'v4_0_0' ELSE 'putative' END,
...
event_timestamp >= '2013-05-10'
Right. I've:
I think it's worth leaving this ticket open for a day to check the overnight cath_b dump is as expected.
Further improvements you could consider:
Hmmm... would be good to set version as ENV from crontab, but not if we stick to raw SQL.
Maybe pass the SQL through something that changes "VERSION" to "v4_1_0" before it all gets piped into psql
?
(sorry, didn't see your most recent comment when I posted that last post)
implement that via a releases table in the DB (a table we've discussed many times 😐)
this would be my preferred option.
Thanks for sorting - I'll close when the file looks okay tomorrow.
(sorry, didn't see your most recent comment when I posted that last post)
NP. Indeed, taking the version & date out of the SQL into the crontab would be a big step in the right direction and would probably be much quicker than adding new DB tables etc.
It seems you can do this sort of thing:
> cat /tmp/bob.sql
SELECT COUNT(*) FROM domain WHERE flow_stage_type = :dom_fst;
> psql -A -t -F ' ' cathdb_current --variable=dom_fst="'ASSIGNED'" --file /tmp/bob.sql
421066
> psql -A -t -F ' ' cathdb_current --variable=dom_fst="'HOLDING_PEN'" --file /tmp/bob.sql
22211
(where the -A -t -F ' '
is just to simplify the output to a single number)
Nice :sunglasses: - I think that could be all you need here.
This is looking sensible to me...
Numbers of lines in files from the night before last:
420420 cath_b.20161130.all
235310 cath_b.20161130.latest_release
185110 cath_b.20161130.putative
...and from last night:
421494 cath_b.20161201.all
308898 cath_b.20161201.latest_release
112596 cath_b.20161201.putative
To me, that looks like a sensible shift of numbers from putative to latest_release. Also, the version in the files seem sensible:
cath_b.20161130.all:101mA00 v4_0_0 1.10.490.10 0-153:A
cath_b.20161130.all:102lA00 v4_0_0 1.10.530.40 1-162:A
cath_b.20161130.all:102mA00 v4_0_0 1.10.490.10 0-153:A
cath_b.20161130.latest_release:101mA00 v4_0_0 1.10.490.10 0-153:A
cath_b.20161130.latest_release:102lA00 v4_0_0 1.10.530.40 1-162:A
cath_b.20161130.latest_release:102mA00 v4_0_0 1.10.490.10 0-153:A
cath_b.20161201.all:101mA00 v4_1_0 1.10.490.10 0-153:A
cath_b.20161201.all:102lA00 v4_1_0 1.10.530.40 1-162:A
cath_b.20161201.all:102mA00 v4_1_0 1.10.490.10 0-153:A
cath_b.20161201.latest_release:101mA00 v4_1_0 1.10.490.10 0-153:A
cath_b.20161201.latest_release:102lA00 v4_1_0 1.10.530.40 1-162:A
cath_b.20161201.latest_release:102mA00 v4_1_0 1.10.490.10 0-153:A
@sillitoe : :nose: right to you?
Looks great, many thanks
Ta.
Just to check: did you notice this previous comment? (Don't want to sound like a high-maintenance attention-seeker; just want to check that something potentially time-savingly useful hasn't fallen through the cracks.)
Good point, thanks (you high-maintenance attention seeker).
Will close when that's sorted.
@tonyelewis was there a reason you chose:
event_timestamp >= '2015-04-01'
...for the release date?
I've added a release
table that stores freeze_timestamp
and release_timestamp
- essentially latest
will be the most recent entry with a non-null release_timestamp
(so we can add a frozen release without it moving to latest).
I've adjusted the SQL and it seems to work, but there are difference due to our official "release date" of v4.1. I had July this year - I would happily be told it was earlier but I'm not sure about 3 months after we froze.
??
@tonyelewis was there a reason[...]?
@sillitoe Yes. But contrary to the hopes of Boyzone/The Osmonds, the reason was not love. Instead, it was that when I ran the SQL that I put in the a brief note I added to the "Release Protocol" Trac page (mentioned above), the numbers of domain assignments for the months in the relevant period looked like this:
[...]
2016-07-01 | 4687
2016-06-01 | 2310
2016-05-01 | 881
2016-04-01 | 741
2016-03-01 | 48623
2016-02-01 | 0
2016-01-01 | 3146
2015-12-01 | 27888
2015-11-01 | 4558
2015-10-01 | 1207
2015-09-01 | 3308
2015-08-01 | 10043
2015-07-01 | 1485
2015-06-01 | 92
2015-05-01 | 0
2015-04-01 | 0
2015-03-01 | 0
2015-02-01 | 0
2015-01-01 | 5700
2014-12-01 | 730
[...]
...so I went for somewhere in the middle of that patch of zeroes.
2015-01-01
wouldn't do because it would exclude all the v4_1_0 domains that were assigned whilst we let the frozen PDB flush through the system.
Anytime after 2015-06-01
wouldn't do because then it would include all domains were assigned to cathdb_current after we'd branched off the release.
Does that :nose: right to you?
Hmm okay, so we're saying apr is the freeze
date (rather than the release
date)?
I was going to use the release date to give the official "latest" release.
I suppose I can select "latest" on the release
date, but then use the freeze
date (Apr 2015) for the calculation of CATH-B...
FWIW, I think there are three different dates worth recording here. I agree with what you say about release
date. I also think it's worth recording the date of the PDB snapshot on which the CATH release is built (eg, here: 2015-01-01
).
Maybe something like (in chronological order):
pdb_snapshot_date
assignment_cutoff_date
release_date
?
any chance you could rerun that query and let me know the pdb_snapshot_date
and assignment_cutoff_date
of any of the releases before v4.1?
Yep. NP. I can do assignment_cutoff_date
s back to v3.0:
v4.1.0 2015-04-01
v4.0.0 2013-07-01
v3.5.0 2011-10-01
v3.4.0 2011-01-01
v3.3.0 2009-06-01
v3.2.0 2008-07-01
v3.1.0 2007-02-01
v3.0.0 2006-05-01
I'm not sure this data helps with pdb_snapshot_date
but then you've got those (going back to v3.0) on here: http://www.cathdb.info/wiki/doku/?id=release_notes
Full data for reference...
(note v3.2.0 isn't against a 0 but I checked that one at the week level - 2008-07-01
is OK)
2016-12-01 | 501
2016-11-01 | 3327
2016-10-01 | 3518
2016-09-01 | 12733
2016-08-01 | 0
2016-07-01 | 4687
2016-06-01 | 2310
2016-05-01 | 881
2016-04-01 | 741
2016-03-01 | 48623
2016-02-01 | 0
2016-01-01 | 3146
2015-12-01 | 27888
2015-11-01 | 4558
2015-10-01 | 1207
2015-09-01 | 3308
2015-08-01 | 10043
2015-07-01 | 1485
2015-06-01 | 92
2015-05-01 | 0
2015-04-01 | 0 v4.1.0
2015-03-01 | 0
2015-02-01 | 0
2015-01-01 | 5700
2014-12-01 | 730
2014-11-01 | 3153
2014-10-01 | 1753
2014-09-01 | 3120
2014-08-01 | 2369
2014-07-01 | 1375
2014-06-01 | 1355
2014-05-01 | 3473
2014-04-01 | 6926
2014-03-01 | 7242
2014-02-01 | 3642
2014-01-01 | 1328
2013-12-01 | 1876
2013-11-01 | 6121
2013-10-01 | 23470
2013-09-01 | 0
2013-08-01 | 0
2013-07-01 | 0 v4.0.0
2013-06-01 | 1
2013-05-01 | 846
2013-04-01 | 13009
2013-03-01 | 1147
2013-02-01 | 13981
2013-01-01 | 0
2012-12-01 | 0
2012-11-01 | 17266
2012-10-01 | 165
2012-09-01 | 251
2012-08-01 | 220
2012-07-01 | 15017
2012-06-01 | 67
2012-05-01 | 0
2012-04-01 | 22
2012-03-01 | 39
2012-02-01 | 142
2012-01-01 | 42
2011-12-01 | 79
2011-11-01 | 43
2011-10-01 | 0 v3.5.0
2011-09-01 | 0
2011-08-01 | 2205
2011-07-01 | 1770
2011-06-01 | 1025
2011-05-01 | 1514
2011-04-01 | 1605
2011-03-01 | 9372
2011-02-01 | 3162
2011-01-01 | 0 v3.4.0
2010-12-01 | 0
2010-11-01 | 0
2010-10-01 | 450
2010-09-01 | 1616
2010-08-01 | 2273
2010-07-01 | 3026
2010-06-01 | 1823
2010-05-01 | 2211
2010-04-01 | 1668
2010-03-01 | 2217
2010-02-01 | 2021
2010-01-01 | 1300
2009-12-01 | 2098
2009-11-01 | 688
2009-10-01 | 552
2009-09-01 | 1561
2009-08-01 | 833
2009-07-01 | 30
2009-06-01 | 0 v3.3.0
2009-05-01 | 249
2009-04-01 | 988
2009-03-01 | 3030
2009-02-01 | 360
2009-01-01 | 1396
2008-12-01 | 1955
2008-11-01 | 1176
2008-10-01 | 1022
2008-09-01 | 1013
2008-08-01 | 1548
2008-07-01 | 1728 v3.2.0
2008-06-01 | 2
2008-05-01 | 750
2008-04-01 | 1562
2008-03-01 | 1136
2008-02-01 | 1755
2008-01-01 | 575
2007-12-01 | 1585
2007-11-01 | 3011
2007-10-01 | 0
2007-09-01 | 244
2007-08-01 | 7680
2007-07-01 | 1771
2007-06-01 | 297
2007-05-01 | 12
2007-04-01 | 0
2007-03-01 | 0
2007-02-01 | 0 v3.1.0
2007-01-01 | 4236
2006-12-01 | 2487
2006-11-01 | 5934
2006-10-01 | 0
2006-09-01 | 0
2006-08-01 | 188
2006-07-01 | 0
2006-06-01 | 0
2006-05-01 | 0 v3.0.0
2006-04-01 | 16568
2006-03-01 | 0
2006-02-01 | 0
2006-01-01 | 0
Actually don't worry too much about this. Based on the v4,0 milestone
e.g.
The CATH Update system may flush/cluster itself before 10th May so we may be able to pull this schedule forward a little.
I'm calling the assignment_cutoff_date
of v4.0
2015-05-10
.
Any dates before that might be handy, but far from essential.
I'm calling the
assignment_cutoff_date
ofv4.0
2015-05-10
.
Do you mean 2013? It looks like there were a few more assignments after that, eg 47 in the week of 2013-05-27. Maybe due to running behind schedule? Or late fixes?
Anyway, I'd recommend the dates in my previous comment.
Hmmm, currently that's giving me the following table...
INSERT INTO release
( cath_version, pdb_snapshot_date, assignment_cutoff_date, release_date ) VALUES
( 'v2.6.0', NULL, NULL, '2005-03-16' ),
( 'v3.0.0', '2006-05-05', '2006-05-01', '2006-05-15' ),
( 'v3.1.0', '2007-01-14', '2007-02-01', '2007-01-14' ),
( 'v3.2.0', '2008-07-14', '2008-07-01', '2008-07-14' ),
( 'v3.3.0', '2009-07-07', '2009-06-01', '2009-07-07' ),
( 'v3.4.0', '2010-11-13', '2011-01-01', '2011-01-01' ),
( 'v3.5.0', '2011-09-20', '2011-10-01', '2011-10-14' ),
( 'v4.0.0', '2013-03-26', '2015-05-10', '2013-07-01' ),
( 'v4.1.0', '2015-01-01', '2015-04-01', '2016-07-01' )
;
I think some of the old "PDB release dates" may actually have got mixed up with web site release date. I'm not going to spend too much more time sorting old dates for the archive - more important that we use this moving forward.
The only entry I'll need to adjust is:
( 'v3.1.0', '2007-01-14', '2007-02-01', '2007-01-14' ),
to:
( 'v3.1.0', '2007-01-14', '2007-01-04', '2007-01-14' ),
??
I've updated the SQL to use this table and checked that the results match exactly with the SQL in svn source.
SQL could probably be neater, but it works.
http://trac.cathdb.info/projects/cath/changeset/18617
Closing the ticket - thanks @tonyelewis
Cracking.
Are the last two dates (in different years) for v4.0.0 out of order?
Anyway, great work.
Edit: correction v4.1.0 -> v4.0.0
No, just wrong - but already fixed. Thanks
On Fri, 2 Dec 2016, 15:48 Tony E Lewis, notifications@github.com wrote:
Cracking.
Are the last two dates (in different years) for v4.1.0 out of order?
Anyway, great work.
— You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub https://github.com/UCLOrengoGroup/cath-todo/issues/16#issuecomment-264485416, or mute the thread https://github.com/notifications/unsubscribe-auth/AAJVeld0kAfWzJ2SrZLkoNaTNiaVJQz6ks5rED2wgaJpZM4K__UC .
The latest release files mention "v4_0_0" (need to investigated why it's not "v4_1_0")
http://download.cathdb.info/cath/releases/daily-release/newest/