kltm opened 5 months ago
Noting that we have a week-long holding pen for `snapshot`s already built in for debugging, during the "Publish" step. If I switch over to having these auto-clean by bucket policy, they would give us a clean jump-off point for the manual publication we're already doing because of the Zenodo instability. This holding pen could be extended from a week to however long we want.
While this falls well short of a full after-the-fact "blessing" system, it is very much in line with current practice, and I believe that by changing a few lines of the current manual release SOP, we could bring up a successful snapshot.
@pgaudet What are the minimum indicators you need to know whether a `snapshot` is worthwhile? Would you be able to look at the stats and, if they look okay, let me know, so that I could put it out on the experimental AmiGO for you to take a closer look? How would letting you know work? Could I just sign you up for the success emails from all `snapshot` runs, and you get back to me when the timing feels right? If this kind of thing might work for you, I think I have a fairly quick way forward:
- Set the `snapshot` holding pen (i.e. `go-data-products-daily`) to auto-clean, so that only intended files are kept
- `release` … code … `snapshots` … passing
- `release` … amigo-exp deployment; create a manual SOP that aims at a specified daily bucket

7-day existence rule added; we should see results very soon.
The dailies now auto-clean. Going forward, we can use any of these, as long as it is less than a week old, as a clean base for creating a release.
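A minimal Python sketch of what the 7-day rule means when picking a base for a release. It assumes that the bucket policy expires any object more than 7 days after it was written and that snapshot outputs sit under per-day prefixes; the key layout, dates, and helper names are hypothetical, and in practice the expiry is enforced by the bucket policy itself (with the provider's own timing), not by code like this.

```python
from datetime import datetime, timedelta, timezone

# Assumed retention window for the daily holding pen (the "7-day existence rule").
RETENTION = timedelta(days=7)


def is_expired(written_at: datetime, now: datetime, retention: timedelta = RETENTION) -> bool:
    """True if an object written at `written_at` is older than the retention window."""
    return now - written_at > retention


def release_candidates(objects: dict[str, datetime], now: datetime) -> list[str]:
    """Keys still inside the holding pen, oldest first; a release could be cut from any of them."""
    live = sorted((written_at, key) for key, written_at in objects.items()
                  if not is_expired(written_at, now))
    return [key for _, key in live]


if __name__ == "__main__":
    # Hypothetical object keys and timestamps, for illustration only.
    now = datetime(2020, 1, 10, tzinfo=timezone.utc)
    objects = {
        "daily/2020-01-01/": datetime(2020, 1, 1, tzinfo=timezone.utc),
        "daily/2020-01-06/": datetime(2020, 1, 6, tzinfo=timezone.utc),
        "daily/2020-01-09/": datetime(2020, 1, 9, tzinfo=timezone.utc),
    }
    print(release_candidates(objects, now))  # ['daily/2020-01-06/', 'daily/2020-01-09/']
```

The point of the sketch is the window: anything older than a week is gone, so a release has to be cut from one of the remaining dailies.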
@kltm
> What are the minimum indicators you need before knowing if a snapshot is worthwhile? Would you be able to look at the stats and, if it looks okay, let me know and I could put it out on the experimental AmiGO so you could take a closer look? How would letting you know work? Could I just sign you up for all success snapshot run emails and you get back to me when the timing feels right?
The same procedure as we have now for the release seems appropriate:
Does that answer all the questions?
Thanks, Pascale
After talking with @pgaudet this morning: until we've run through this a couple of times to work out the kinks (or have a machine that gets us back to where we were), we'll:
Okay, after a little consideration, I think I may have some "easy" ways forward, although any one of them might take a day or so to put together. Essentially, the issue is a bad Docker/Jenkins interaction. I can now see a few ways to bypass this:
Actually, poking around in this, I think I'm going to try something else first:
Also, to clarify option "3": to make this work, the whole image would have to be dropped and stood back up. If we go that way, there will be some temporary repetition, and we may have to introduce template functions to get around the string limit we will almost immediately run into.
Looking at the failure messages, and now understanding that this is happening at the stage level (not the step level), I think I can change tack a little.
I've created a new pipeline, `snapshot-post-fail`; it has the following properties:
I believe this should allow me to "hijack" the `snapshot` run with the new pipeline, picking up where the failed (but data-wise sound) run terminated.
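The "pick up where it terminated" idea can be sketched in Python as below. Since the failure happens at the stage level, this models a pipeline as an ordered list of named stages and assumes that everything before the failed stage already produced sound output. The stage names and the `stages_to_resume` helper are hypothetical and do not describe the actual Jenkins configuration of `snapshot` or `snapshot-post-fail`.

```python
# Hypothetical stage names, for illustration only; not the real Jenkins stages.
SNAPSHOT_STAGES = ["build", "qc", "publish", "notify"]


def stages_to_resume(stages: list[str], failed_stage: str) -> list[str]:
    """Stages a follow-up pipeline would run: the failed stage and everything after it.

    Earlier stages are skipped on the assumption that their outputs are already sound.
    """
    if failed_stage not in stages:
        raise ValueError(f"unknown stage: {failed_stage!r}")
    return stages[stages.index(failed_stage):]


if __name__ == "__main__":
    print(stages_to_resume(SNAPSHOT_STAGES, "publish"))  # ['publish', 'notify']
```

Re-running the failed stage itself (rather than starting after it) is a deliberate choice in this sketch, since a stage-level failure may have left that stage's own work incomplete.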
Cheers to @dustine32 for helping me out with a code review. Issues that I'll fix before proceeding:
- … `snapshot-post-fail` …
@pgaudet I believe a `snapshot` has now gone through using the modified pipeline. Would you be able to briefly review it? If it seems solid, we can either 1) attempt the new "promotion" procedure, where we try to take a `snapshot` and make it a `release`, or 2) do the same thing we did here for `release`, giving us a very, very high probability of success.
Noting that I'm now working towards something between the two above.
Essentially, I will be taking the `release` pipeline, removing its first part, and replacing it with a "copy from snapshot" step. We can refine this model and its timing, but it is a huge improvement over what we have now (nothing).
(@dustine32 I'll be hunting after you in the next day or so for a review of that change and as a sanity check.)
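A minimal Python sketch of the planning half of a "copy from snapshot" step. It assumes that the chosen snapshot's files sit under a single prefix (for instance in the `go-data-products-daily` holding pen) and that a release keeps the same relative layout. The prefixes and the `plan_copy` helper are hypothetical, and the actual copying would be done with whatever tooling the `release` pipeline already uses.

```python
def plan_copy(keys: list[str], snapshot_prefix: str, release_prefix: str) -> list[tuple[str, str]]:
    """Pair each object under `snapshot_prefix` with its destination under `release_prefix`.

    Keys outside the snapshot prefix are ignored, so stray files are never promoted.
    """
    src = snapshot_prefix.rstrip("/") + "/"
    dst = release_prefix.rstrip("/") + "/"
    plan = []
    for key in sorted(keys):
        # Skip bare "directory" keys equal to the prefix itself.
        if key.startswith(src) and len(key) > len(src):
            # Keep the path relative to the snapshot root unchanged in the release.
            plan.append((key, dst + key[len(src):]))
    return plan


if __name__ == "__main__":
    # Hypothetical keys and prefixes, for illustration only.
    keys = [
        "daily/run-1/ontology/go.obo",
        "daily/run-1/annotations/example.gaf",
        "daily/run-2/ontology/go.obo",
    ]
    for src, dst in plan_copy(keys, "daily/run-1", "release/candidate"):
        print(src, "->", dst)
```

Keeping the plan as plain (source, destination) pairs makes it easy to review, or to sanity-check, before anything is written to the release location.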
Look at blessing `snapshot`s to `release`, to:
- … `snapshot` or `release` … not going well

No new libraries or technologies. The only "interesting" additions would likely be: