jbrockopp opened this issue 5 years ago (status: Open)
IMHO it's a really interesting feature, other opinions?
We might be able to simplify this by using context and WithTimeout, like this: http://ixday.github.io/post/golang-cancel-copy/
```diff
-numBytes, err := io.Copy(dst, object)
+err := Copy(ctx, dst, object)
```
I am not sure if the code in the referenced blog post is any good, but I like the approach of abstracting this into a separate function that resembles io.Copy but accepts and uses a context.
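For illustration, a minimal sketch of that kind of context-aware copy could look like the following. The names copyWithContext and readerFunc are made up for this example; this is the general pattern from the linked blog post, not code from the plugin:

```go
package cache

import (
	"context"
	"io"
)

// readerFunc adapts a plain function to io.Reader, so io.Copy calls it for
// every chunk and gives us a hook to check the context between reads.
type readerFunc func(p []byte) (int, error)

func (rf readerFunc) Read(p []byte) (int, error) { return rf(p) }

// copyWithContext behaves like io.Copy but returns early once ctx is
// cancelled or its deadline expires.
func copyWithContext(ctx context.Context, dst io.Writer, src io.Reader) error {
	_, err := io.Copy(dst, readerFunc(func(p []byte) (int, error) {
		select {
		case <-ctx.Done():
			return 0, ctx.Err()
		default:
			return src.Read(p)
		}
	}))
	return err
}
```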
@bradrydzewski I agree that would be a much simpler implementation, and I'm happy to try to submit something to that effect.
My only concern is: what if the plugin "hangs" on a different part of the function? We'd have to be very clear in the implementation that the plugin's timeout flag only controls how long to wait for the copy function.
To be specific, what if the minio GetObject function hangs inside the Get function? Sure, we could switch to the GetObjectWithContext function, but then we have to deal with the context being applied to both the Copy function (your pseudo-code above) and the GetObjectWithContext function. This could have unintended consequences: neither GetObjectWithContext nor Copy might exceed the timeout on its own, yet together they could.
Do you have thoughts on that?
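For what it's worth, sharing a single context.WithTimeout across both calls might look roughly like this. This is only a sketch: getWithTimeout is a made-up name, the minio-go v6 import path and GetObjectWithContext signature are assumptions, and copyWithContext is the helper sketched above:

```go
package cache

import (
	"context"
	"io"
	"time"

	minio "github.com/minio/minio-go"
)

// getWithTimeout fetches an object and streams it into dst under a single
// deadline that covers both the S3 request and the copy. If the two phases
// together take longer than timeout, the whole operation is cancelled.
func getWithTimeout(client *minio.Client, bucket, key string, dst io.Writer, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	object, err := client.GetObjectWithContext(ctx, bucket, key, minio.GetObjectOptions{})
	if err != nil {
		return err
	}
	defer object.Close()

	// copyWithContext is the context-aware copy sketched earlier.
	return copyWithContext(ctx, dst, object)
}
```

Whether that combined deadline is the desired semantics (versus a per-phase timeout) is exactly the design question raised above.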
I seem to be having a similar issue when the AWS IAM role available to Drone does not allow it to create a bucket. The logs indicate:
```
time="2019-06-25T10:34:54Z" level=info msg="No path specified. Creating default"
time="2019-06-25T10:34:54Z" level=info msg="No fallback_path specified. Creating default"
time="2019-06-25T10:34:54Z" level=info msg="No flush_path specified. Creating default"
time="2019-06-25T10:34:54Z" level=info msg="No filename specified. Creating default"
time="2019-06-25T10:34:54Z" level=info msg="Rebuilding cache at /omelois/drone-test/master/archive.tar"
time="2019-06-25T10:34:54Z" level=info msg="Rebuilding cache at [pipo] to /omelois/drone-test/master/archive.tar"
time="2019-06-25T10:34:54Z" level=info msg="Uploading to bucket omelois at drone-test/master/archive.tar"
```
```yaml
kind: pipeline
name: default

steps:
  - name: restore
    image: plugins/s3-cache
    settings:
      pull: true
      restore: true

  - name: build
    image: bash
    commands:
      - mkdir pipo
      - echo "hello" > pipo/hello.txt

  - name: rebuild
    image: plugins/s3-cache
    settings:
      pull: true
      rebuild: true
      mount:
        - pipo
    when:
      event: push
```
Our IAM role is enabled (other S3 actions are being performed from the same Kubernetes cluster that our Drone install runs in), and I think we are experiencing the same behavior that @Baccata describes.
time="2020-05-21T18:46:44Z" level=info msg="No path specified. Creating default"
--
2 | time="2020-05-21T18:46:44Z" level=info msg="No fallback_path specified. Creating default"
3 | time="2020-05-21T18:46:44Z" level=info msg="No flush_path specified. Creating default"
4 | time="2020-05-21T18:46:44Z" level=info msg="No filename specified. Creating default"
5 | time="2020-05-21T18:46:44Z" level=info msg="Rebuilding cache at /obfuscated-drone-s3-mvn-cache/~obfuscated/obfuscated-java/master/archive.tar"
6 | time="2020-05-21T18:46:44Z" level=info msg="Rebuilding cache at [.m2/repository] to /obfuscated-drone-s3-mvn-cache/~obfuscated/obfuscated-java/master/archive.tar"
7 | time="2020-05-21T18:46:44Z" level=info msg="Uploading to bucket obfuscated-drone-s3-mvn-cache at ~obfuscated/obfuscated-java/master/archive.tar"
Update: this was solved by setting the endpoint for our AWS region. I'm still struggling with how to make it "back up" /root/.m2/repository and restore it before the mvn build, though.
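In case it helps, here is a rough sketch of one way this could be wired up. This is an assumption rather than something confirmed in this thread: it points Maven's local repository into the workspace so the plugin can mount a relative path, and the endpoint value is a placeholder for your region's S3 endpoint:

```yaml
kind: pipeline
name: default

steps:
  - name: restore
    image: plugins/s3-cache
    settings:
      restore: true
      endpoint: https://s3.eu-west-1.amazonaws.com   # assumption: your region's endpoint

  - name: build
    image: maven:3-jdk-11
    commands:
      # Assumption: keep the local repository inside the workspace so the
      # cache plugin can mount the relative path .m2/repository.
      - mvn -Dmaven.repo.local=.m2/repository clean install

  - name: rebuild
    image: plugins/s3-cache
    settings:
      rebuild: true
      endpoint: https://s3.eu-west-1.amazonaws.com
      mount:
        - .m2/repository
    when:
      event: push
```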
Currently, if we experience issues with our S3 cache service provider, it can cause the plugin to "hang". This causes the build to be stuck in a "running" state until the build reaches the timeout (usually 1 hour).
A single occurrence of the issue is not a problem, but when it happens across the entire enterprise, it causes the number of pending builds in the queue to grow massively because our agent capacity is consumed by all the "stuck" cache builds.
The logs produced in the step are:
This leads me to believe it's getting hung on line #93 because we never see the log message from line #99.
https://github.com/drone-plugins/drone-s3-cache/blob/327bde5b538acb78a6a595e27bffb5a5ed76cfce/storage/s3/s3.go#L91-L101
To remediate this issue, we decided to create our own timeout feature that we embedded into the plugin and drone/drone-cache-lib.
Would this be a feature the Drone admins would be willing to accept into the plugin? I've attached the changesets below so you can see how we went about resolving this issue.
Repo Changes:
https://github.com/drone/drone-cache-lib/compare/master...jbrockopp:master
https://github.com/drone-plugins/drone-s3-cache/compare/master...jbrockopp:master
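For readers skimming the thread, the general shape of such a timeout is to run the cache operation concurrently and stop waiting after a deadline. The sketch below is only an illustration of that idea, not necessarily how the linked changesets implement it:

```go
package cache

import (
	"fmt"
	"time"
)

// withTimeout runs fn in a goroutine and stops waiting for it after d.
// Note that the goroutine itself is not killed on timeout, so fn should
// still honour cancellation (for example via a context) to avoid leaking work.
func withTimeout(d time.Duration, fn func() error) error {
	done := make(chan error, 1)
	go func() { done <- fn() }()

	select {
	case err := <-done:
		return err
	case <-time.After(d):
		return fmt.Errorf("operation timed out after %s", d)
	}
}
```

A caller would wrap the storage call, e.g. `withTimeout(timeout, func() error { return storage.Get(path, dst) })`, so a hung download fails the step instead of consuming an agent until the build timeout.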