aledsage opened this issue 11 years ago (status: open).
Suggest we do something like this:
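```shell
# Download the RPM, retrying the whole curl command up to 3 times.
fileName=MarkLogic--7.0-20130513.x86_64.rpm
url=http://www.marklogic.com/download/$fileName
triesRemaining=3
while [ $triesRemaining -gt 0 ]; do
  # --retry 4: curl retries transient errors itself
  # --continue-at -: resume from the size of any partial file left by an earlier attempt
  curl -L --retry 4 --continue-at - --user theusername:thepassword -o "$HOME/$fileName" "$url"
  result=$?
  if [ $result -eq 0 ]; then
    triesRemaining=0
  else
    triesRemaining=$(( triesRemaining - 1 ))
    echo "Error downloading $fileName (exit code $result; $triesRemaining attempts remaining)"
    sleep 10
  fi
done
# Fail the script if every attempt failed
[ $result -eq 0 ] || exit $result
```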
Note the `--retry 4` for transient errors; the `--continue-at -`, so that if the while loop tries again it picks up where it left off; and the while loop, which tries the entire command up to 3 times.
For a general Brooklyn solution, perhaps we want to retry (and do the `sleep 10`) only on specific error codes, such as 7 (couldn't connect to host).
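A minimal sketch of that idea as a generic bash function. The name `retry_on_codes`, its argument order, and which exit codes count as retryable are assumptions for illustration, not an existing Brooklyn helper. It may also help to know that curl's own `--retry` only retries errors curl classifies as transient (such as timeouts and some HTTP 5xx responses), so a connection failure with exit code 7 can still need an outer loop; newer curl versions add `--retry-connrefused` for the connection-refused case.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an existing Brooklyn function): run a command and
# retry it only when it exits with one of the listed codes. Any other non-zero
# exit code is treated as permanent and returned immediately.
# Usage: retry_on_codes "<space-separated codes>" <max-attempts> <sleep-seconds> <command> [args...]
retry_on_codes() {
  local codes=$1 attempts=$2 delay=$3
  shift 3
  local result
  while :; do
    result=0
    "$@" || result=$?
    [ "$result" -eq 0 ] && return 0
    attempts=$(( attempts - 1 ))
    # Only retry codes we consider transient...
    case " $codes " in
      *" $result "*) ;;
      *) echo "$1 failed with non-retryable exit code $result" >&2
         return "$result" ;;
    esac
    # ...and only while attempts remain.
    if [ "$attempts" -le 0 ]; then
      echo "$1 failed with exit code $result; no attempts remaining" >&2
      return "$result"
    fi
    echo "$1 failed with exit code $result; retrying in ${delay}s ($attempts attempts remaining)" >&2
    sleep "$delay"
  done
}
```

For the MarkLogic download this might be called as `retry_on_codes "6 7 28" 3 10 curl -L --retry 4 --continue-at - --user theusername:thepassword -o "$HOME/$fileName" "$url"`, treating 6 (couldn't resolve host), 7 (couldn't connect to host) and 28 (operation timed out) as transient; that particular code list is a judgment call. Returning at once on any other code avoids sleeping and retrying on failures that will never succeed.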
Something like that makes sense.
Alternatively, what do you think about doing it from Brooklyn's Java code? Using the new tasks framework, we could have a repeater task factory:
```java
String curl = "...";
repeater(ssh(curl))
    .until(Predicates.equalTo(0))
    .every(Duration.ONE_SECOND)
    .timeout(Duration.FIVE_MINUTES)
    .queue();
```
(where `queue()` does the submission logic)
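For comparison, a rough bash counterpart of the proposed `until(exit code 0).every(interval).timeout(limit)` chain might look like this. The function name `repeat_until_success` is hypothetical, and the deadline is only checked to one-second resolution using `date +%s`.

```shell
#!/usr/bin/env bash
# Hypothetical bash counterpart of the proposed
# repeater(...).until(exit code 0).every(interval).timeout(limit):
# re-run a command every <interval> seconds until it exits 0, giving up once
# another wait would pass the <timeout> (checked at one-second resolution).
# Usage: repeat_until_success <interval-seconds> <timeout-seconds> <command> [args...]
repeat_until_success() {
  local interval=$1 timeout=$2
  shift 2
  local deadline result
  deadline=$(( $(date +%s) + timeout ))
  while :; do
    result=0
    "$@" || result=$?
    [ "$result" -eq 0 ] && return 0
    # Give up if sleeping another interval would take us past the deadline.
    if [ $(( $(date +%s) + interval )) -gt "$deadline" ]; then
      echo "$1 still failing after ${timeout}s (last exit code $result)" >&2
      return "$result"
    fi
    sleep "$interval"
  done
}
```

Something like `repeat_until_success 1 300 curl ...` would mirror `Duration.ONE_SECOND` and `Duration.FIVE_MINUTES`. Checking the deadline before sleeping means the function never waits after the final failed attempt.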
In the MarkLogic case, the curl command is in the middle of a bigger script, so the Java approach doesn't really apply.
But where it's a single command, then yes, it makes sense. And I prefer writing loops/repeaters in Java rather than in bash...
It's important to have the script version as well. It would be nice to build up a library of script functions that we upload to a given machine and can call, so that we can do more in scripts. But that's another topic.
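On the machine side, such a library could be as simple as a file of shell functions that each generated script sources. In this sketch the library is written to a temporary file so the example runs locally; the file contents and function names (`log_info`, `retry`) are hypothetical, and in Brooklyn the file would instead be uploaded to the machine (for example over scp) before the scripts run.

```shell
#!/usr/bin/env bash
# Sketch of the "library of script functions" idea. The library contents are
# hypothetical; here they are written to a temp file instead of being uploaded.
lib=$(mktemp)
cat > "$lib" <<'EOF'
# --- contents of the hypothetical uploaded library ---
log_info() { echo "[INFO] $*"; }
# Run a command up to <attempts> times, sleeping <delay> seconds between failures.
retry() {
  local attempts=$1 delay=$2 result
  shift 2
  while :; do
    result=0
    "$@" || result=$?
    [ "$result" -eq 0 ] && return 0
    attempts=$(( attempts - 1 ))
    [ "$attempts" -le 0 ] && return "$result"
    sleep "$delay"
  done
}
EOF

# Each generated script sources the library once...
. "$lib"
rm -f "$lib"   # the functions stay defined after the file is removed
# ...and then calls its functions like ordinary commands.
log_info "library loaded"
```

Keeping the helpers in one sourced file means a fix to, say, `retry` only has to be made once rather than in every entity's install script.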
Leveraging @rerun would be cool (https://github.com/rerun/rerun/wiki/Tutorial). Also see Donnie Berkholz's blog post (http://dberkholz.com/2011/04/07/bash-shell-scripting-libraries/).
While installing the MarkLogic entity this afternoon, aws-ec2:us-east-1 has been misbehaving.
It's been giving the error:
```
curl: (7) couldn't connect to host
```
for a command that has previously worked reliably, that worked on a subset of the VMs being started concurrently, and that worked when I ssh'ed in to try the command manually. The command is:
We need to write our entities to be more resilient to this kind of transient error.