dd010101 / vyos-jenkins

Scripts for building custom VyOS stream (1.5 circinus) packages/images. Also legacy scripts for building frozen 1.3 equuleus/1.4 sagitta packages/images.

Implementing installs and build scripts #27

Closed GurliGebis closed 5 months ago

GurliGebis commented 5 months ago

This Pull Request is meant as an easy way to review and correct issues with the install scripts.

dd010101 commented 5 months ago

The 36074419fb0cfe22b4c59846aa2c9c8335ad4afc says "Cleanup of unused files" but these files are used by the readme.md (manual guide) and should stay. I would keep all those files since all of them still have some value. I see the current version as one option and this new one as a second option - I don't see one replacing the other. We can say most people will use the scripted method, but that doesn't mean some people don't want to easily understand how everything works - and that's what the current version is for.

I can understand that the current directory structure doesn't make sense, so we can rearrange it to make more sense. I would put all manual-way resources into ./manual/ and all resources used only by the scripted method into ./auto/, for example. The unused or extra example scripts into ./extra/. The only overlap I see is the jobs.json - it makes sense to adjust seed-jobs.sh to use the files from your install scripts, and shared resources can live in their own directory. Basically keep all files, just move them around to resolve the mess. You can also keep all files where they are and I will rearrange them after merge.

GurliGebis commented 5 months ago

I'll change it so they are moved instead. I removed them because I kept editing them instead of my own files (damn tab completion). I think we should recommend the installer way as the default, since the manual path is quite difficult (I consider myself pretty well versed in this kind of stuff, and my head started spinning a few times during all this (see my other tickets of me stumbling 😃)). I hope to get more time tomorrow to work on the error handling for the rest of the scripts. Looking at 1-4, does it look better now? (I have tested it locally on a clean VM, but I did that before as well, so still "works on my machine")

dd010101 commented 5 months ago

I think we should recommend the installer way as the default, since the manual path is quite difficult (I consider myself pretty well versed in this kind of stuff, and my head started spinning a few times during all this (see my other tickets of me stumbling 😃))

I plan to update the readme so the first thing you meet is the scripted method, and only if you want to, you can follow the manual method. I see the scripted method as preferred.

Well, with the instructions it seems easy to me - if you imagine working without them, then it's more difficult, right? The scripted method does save time and reduce mistakes/typos, so that's why I would call it preferred.

Looking at 1-4, does it look better now?

For sure 👍

Just a few more ideas...

The > /dev/null 2>&1 thing

The error handling is much better in the sense that the script will stop on error and won't say "done"/execute more steps, but there is still a lot of stderr suppression - this hides error messages. I would remove the 2>&1 everywhere. There are valid use-cases - like when there is an expected error message, we can filter it out via grep so nothing is printed if only the expected error occurs. This would improve the ability to debug future issues by a lot.

Like this:

iDontExist > /dev/null 2> >(grep -v "command not found" >&2)

Where iDontExist returns -bash: iDontExist: command not found but that's fine, so we filter the stderr via grep, remove the line with command not found and pipe the rest back to stderr. This way the stdout is suppressed, the expected stderr is suppressed but any other stderr will be visible for debugging purposes.

You can make a function for this (quotes are important):

function filterStdErr {
    eval "$1" > /dev/null 2> >(grep -v "$2" >&2)
}

filterStdErr "iDontExist" "command not found"

Although most commands should be fine with > /dev/null alone. If the expected error output is hard to filter with a simple grep, then it's justified to use full suppression (> /dev/null 2>&1) - but this should be applied in rare cases, not as the default.

Shared functions

I would move all functions into a single script and then you can use source ./auto/functions.sh to import them, so there are no repeated functions in each script - like the functions for Jenkins, for example.
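A minimal sketch of how that could look (the file layout and function name are just an example):

# ./auto/functions.sh - shared declarations only, no side effects
function PrintOkIndicator {
    echo -e "[ OK ] $1"
}

# ./auto/1-prereqs.sh - imports the shared functions
source ./auto/functions.sh
PrintOkIndicator "Shared functions are available here."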

Where am I? Where should I go?

Could we add current script name to the header?

echo "Currently executing $0..."

Could we update the ending?

echo "Part 1 ($0) of the installer is now done."
echo "Please run part two (2-jenkins.sh) to set up Jenkins."

Extra spaces?

echo -e "$OK_INDICATOR Installed $key.                 "

Those extra spaces at the end are intentional?

Quotes

Bash has a tendency to create all sorts of issues if we don't use quotes/escaping properly.

For example:

java -jar jenkins-cli.jar -s http://172.17.17.17:8080 -auth $USERNAME:$TOKEN

Should be:

java -jar jenkins-cli.jar -s http://172.17.17.17:8080 -auth "$USERNAME:$TOKEN"

Otherwise bash could interpret the variables in the wrong way. This applies to every variable - if the variable is a string, it should have double quotes to make sure it stays one string. Without proper quotes/escaping, bash may do unexpected things at best or bad things at worst, and it definitely makes debugging harder.
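For illustration, a hypothetical value containing a space shows the difference:

TOKEN="two words"
printf '%s\n' $TOKEN     # word splitting: printf receives two arguments, prints two lines
printf '%s\n' "$TOKEN"   # quoted: printf receives one argument, prints one line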

GurliGebis commented 5 months ago

Thank you - I'll look through your comments tomorrow (hopefully) 🙂

Regarding error suppressing - I'm thinking about storing the result of the commands in a variable instead, and if a command fails, I write "error" and print the value of the variable.

That way, the error message won't be printed inside the "list" updates.

Also, do you know how to load functions from other bash files? (I have never done that, so I haven't looked into that yet)

dd010101 commented 5 months ago

Regarding error suppressing - I'm thinking about storing the result of the commands in a variable instead, and if a command fails, I write "error" and print the value of the variable.

That would be interesting, but it requires wrapping all commands in a function - thus it's not as pretty as using bash's ability to stderr when needed. Also there is the possibility that a command produces stderr and exits with 0 - it didn't fail hard but did warn - and you can't simply detect this. I would use the default bash behavior, no need to invent something else.

That way, the error message won't be printed inside the "list" updates.

That's actually helpful in the sense that you have a reference to where it happened, and most of the time this will be the end anyway, since the script will terminate after a command error.

Also, do you know how to load functions from other bash files? (I have never done that, so I haven't looked into that yet)

I did add an example, see above - you just source another bash script where the functions live. Sourcing executes everything in the file, thus it makes sense to have a script with only shared declarations and not to import another script that does stuff.

GurliGebis commented 5 months ago

Okay. My idea was to move the cursor back one line, print that it has failed, and then print the command's error message after that (if possible).

GurliGebis commented 5 months ago

https://stackoverflow.com/a/962268

Moving stdout to null and stderr to stdout (which seems to be what I am doing now) should solve that, since any string returned from a command would then mean an error.

It is not the prettiest, but I still think it is the best way. Maybe something like: if the return value != 0 or the variable is not empty, then halt and print the content of the variable.

I will test it tomorrow and see if it works as expected 🙂 (I can create some dummy scripts to test both cases: where the return value is 0 but stderr has content, and the inverse of that)

dd010101 commented 5 months ago

That would work, but it seems like you are making it more complicated than it needs to be - you can just replace > /dev/null 2>&1 with > /dev/null, and that would essentially do the same as capturing stderr via stdout into a variable and then printing it conditionally. 🤔

dd010101 commented 5 months ago

Here is an example to illustrate what I mean:

The script:

#!/usr/bin/env bash
set -e

function goodCommand {
    echo "good stdout"
    return 0
}

function peskyCommand {
    echo "pesky stdout"
    >&2 echo "Warning! Something unexpected but I recovered thus continue!"
    return 0
}

function reallyPeskyCommand {
    echo "really pesky stdout"
    >&2 echo "Warning! Something unexpected but I recovered thus continue!"
    >&2 echo "Warning! Even more is unexpected!"
    return 0
}

function badCommand {
    echo "bad stdout"
    >&2 echo "FATAL ERROR! Unrecoverable!"
    return 111
}

function filterStderr {
    # Bit of magic to filter stderr and keep original exit code
    # Filtering of stderr isn't easy! That's why we need to invoke some magic.
    # 1) drop stdout
    # 2) redirect stderr to stdout
    # 3) filter via grep
    # 4) make grep exit cleanly if nothing found
    # 5) redirect stdout to stderr
    # 6) return exit code of eval not the resulting exit code from grep
    eval "$1" 2>&1 1>/dev/null | (grep -v "$2" || true) 1>&2
    return ${PIPESTATUS[0]}
}

echo "1) Most common use-case - the goodCommand with only stdout."
goodCommand > /dev/null
echo "Result is $?"
echo

echo "2) Command will warn via stderr but that's okay, we want to see it/keep in mind but we can continue."
peskyCommand > /dev/null
echo "Result is $?"
echo

echo "3) Command we know it will stderr but we don't want to confuse user since it's expected behaviour."
filterStderr "peskyCommand" "continue"
echo "Result is $?"
echo

echo "4) Another command with both expected and unexpected stderr."
filterStderr "reallyPeskyCommand" "continue"
echo "Result is $?"
echo

echo "5) Special case when command intentionally fails - we want suppress exit code and filter known stderr."
filterStderr "badCommand || true" "FATAL"
echo "Result is $?"
echo

echo "6) At last the badCommand that fails unexpectedly."
badCommand > /dev/null
echo "Result is $?"
echo

echo "7) You shall never see this, since previous command failed..."

The output:

1) Most common use-case - the goodCommand with only stdout.
Result is 0

2) Command will warn via stderr but that's okay, we want to see it/keep in mind but we can continue.
Warning! Something unexpected but I recovered thus continue!
Result is 0

3) Command we know it will stderr but we don't want to confuse user since it's expected behaviour.
Result is 0

4) Another command with both expected and unexpected stderr.
Warning! Even more is unexpected!
Result is 0

5) Special case when command intentionally fails - we want suppress exit code and filter known stderr.
Result is 0

6) At last the badCommand that fails unexpectedly.
FATAL ERROR! Unrecoverable!

The output filtered via > /dev/null:

Warning! Something unexpected but I recovered thus continue!
Warning! Even more is unexpected!
FATAL ERROR! Unrecoverable!

Apart from the filtering of stderr, we use the default bash behavior, and that's what you expect to happen. Your scripts are correct - the only issue is that you drop the whole stderr without reason. Most of the time we want to simply pass stderr through to be displayed (as stderr, not as stdout).

There are exceptions, highlighted in the example - like known stderr we want to hide: filter it if you can, and try to never drop the whole thing. There is also the case where a command intentionally fails - then chain it with || true to override the non-zero exit code, otherwise set -e would halt.

I don't think there is a reason to complicate this - the default bash behavior is good and expected.

GurliGebis commented 5 months ago

My idea is to print stderr to the screen when I want to, not at the exact moment it is written. (I want to update the "UI" first, and then print the error) 🙂 Things have come up, so it might not be before tomorrow that I can work on it.

dd010101 commented 5 months ago

My idea is to print stderr to the screen when I want to, not at the exact moment it is written.

You want some sort of error dialog/screen/section?

Making a UI isn't necessarily better, since that disconnects the error from where it happened and it also buffers the output - that's not ideal.

Like if there is a long-running process, you don't see a thing until the end - that's not good if you encounter issues like timeouts, where you want to know right away and not after the command eventually gives up after multiple retries.

There are also cases where you see stderr but that should not end the program (like some error-retry-success case). So exiting after any stderr isn't correct.

I see there are reasons why the default behavior works the way it does, and making it better is quite hard, since it's easy to break it in some way and end up with a result that's less functional than the default.

Thank you - I'll look through your comments tomorrow (hopefully)

I will test it tomorrow and see if it works as expected

Things have come up, so it might not be before tomorrow that I can work on it

I see you mentioning timing quite often - don't even think about it - you'll work on it "later", when you have time. It will happen when it happens; it doesn't matter if that's tomorrow or next week. Don't put a schedule on your free time!

GurliGebis commented 5 months ago

Not really - I'm talking about the [ !! ] Task X failed line. I want the error to be printed below that, and to be able to do that, I need to move the cursor up one line, change the text to the above, and then print the error on the screen after that.

dd010101 commented 5 months ago

Oh, I see there are now many ifs just for checking the exit code. That's a misunderstanding and complete overkill - that's not what I meant. It's a lot of additional work and it doesn't necessarily solve the issue completely.

By error handling I meant letting bash do its thing - you don't need any Task X failed at all, just run the commands and let fail what fails.

In this state the if for Task X failed won't even get executed, since bash will exit before that. What I was talking about is a very minor change, not a massive rewrite like this.

Originally there was no error handling, in the sense that bash would continue and say "all is done" even if the commands failed - that's a problem. But that's solved by set -e alone - you don't need anything else, just this one line.

Another issue was that the script failed for me and I didn't see why, because all stderr was eaten by the redirection to stdout and then to /dev/null. This is solved by properly handling stderr - see the examples above - this also is no major rewrite: basically just remove the redirection, then find and handle the couple of commands that fail intentionally.

Overall these are minor changes - they affect everything and thus everything needs to be validated, but you don't need to rewrite the whole thing at all.

I didn't notice this at first since I can't see what changed - there are no commits/previous versions to compare against. Sorry for that, and sorry if I wasn't explaining myself more clearly.

dd010101 commented 5 months ago

The idea is - you don't know what can fail - thus you can't write an if for it, because anything can fail at any point, including the if that checks whether something failed...

That's why you let bash handle it - and how does bash handle it?

If you use set -e then bash checks whether the exit code of any executed statement is non-zero and stops execution right there. No further statement or command is executed - that's good, because if something fails we have an unknown state and thus the only safe thing is to stop execution - this prevents the "all is done" echo and the execution of any following code (including the if for the exit code).
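A minimal illustration of that behavior:

#!/usr/bin/env bash
set -e
false                  # non-zero exit code - execution stops right here
echo "all is done"     # never reached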

There is also the concept of stdout and stderr for this purpose. We have two streams because we use one for the casual stuff and the other for the important stuff. Thus if you don't merge them together and drop them, you can let this mechanism do its thing and report unexpected errors by itself - you don't need to write additional code, you let stderr do its job.

You were not activating the correct features (set -e) and you were preventing other features (stderr) from working - that was the only issue. The error handling is already implemented in bash, it was just basically disabled.


That being said - if you really want to go this overly complicated route, where you have your own error handling because you like the looks, then I would avoid so many duplicated IF statements - that would make the code much more readable. Everything surrounding error handling should be handled by a single function that implements this logic, so there is one IF statement for the exit code, not countless of them.

There is also the issue that you can't use set -e, since that disables your own error handling - and I don't like running a script without set -e if you expect reliability, since then there is a much higher chance that the script fails silently: the script can still fail outside your own error handling, or the error handling itself can fail. This also feeds the argument that the more code you write, the more complicated it gets and the more likely it is to have errors - thus simple solutions are better even if they aren't as fancy.

GurliGebis commented 5 months ago

I have tried that - the thing is, I don't want every command to just spew out whatever error it has and then fail. While that makes it easier to debug, writing a user-friendly error message would be better.

That is why I'm adding error handling to everything - to print a message that tells the user where things went wrong.

Let's take Jenkins as an example - it dumps a Java stacktrace if it fails - that would mean nothing to 99% of users and would result in tickets. Capturing the error and then telling the user that their username or token is invalid is way better.

I'm working on refactoring stuff to use functions - the result seems to be much cleaner. I'll let you know when I'm done, so we can do a new clean test. 🙂

dd010101 commented 5 months ago

The errors are unexpected and should never happen, thus making them user-friendly is wasted effort. Printing formatted errors isn't necessarily better, since the brokenness of the output can give you extra information about what exactly happened and specifically when.

My experience is that most people ignore errors anyway and just file an issue instead. That's why I don't think the errors are useful for most users - they are mostly for us, to give people support, with exceptions of course. The errors are cryptic anyway - take this example - it doesn't matter how nicely you format it, most users will not understand the cryptic message anyhow.

Capturing the error and then telling the user that their username or token is invalid is way better.

That's a bad example. You need to test the credentials purposefully, where you have a translation to a meaningful message for a specific stack trace. That has nothing to do with other commands. Do you think you can translate all possible outputs of all possible commands? That's not possible - you will print totally cryptic messages and long stack traces that most people can't understand, just formatted in a different way...

Generally it's a good idea to have some kind of user-friendly UI - but it doesn't make sense with bash, since you are spending massive effort on something that will never be used and that makes the debugging experience worse when an unexpected situation happens. If you want to format the output, you need to buffer it, and that's just bad - there are instances where stderr tells you something is wrong long before the command finally exits, and that's the moment when you CTRL+C - and there is no way to solve this, since you need to buffer stderr if you want to move it somewhere else, and this can really break the experience. There is no way to do this properly, and thus if you make one aspect more user-friendly, you will by design break other aspects.

GurliGebis commented 5 months ago

But how would you go about explaining to the user what they should do to fix the issue? If you just exit with the stderr message from the command, you would leave the user with an error message and no way to know what they can do to rectify the issue.

dd010101 commented 5 months ago

I can't give users useful messages since there is an infinite number of possible errors. Thus the best thing I can give them is to just spit out whatever the underlying command gives me. This may not help them at all, but that's the best I can do. Your solution can't change anything about this - you would take the same contents and just move them somewhere else; you can't give meaningful translations with suggested solutions for all possible errors.

The Jenkins credentials example is misleading, since you should have that regardless of which error handling you use - that's a known and common error, one that you can and should translate - but how do you want to translate all possible errors of all commands? Simply not possible...

How would you solve this issue? The fact is that the underlying commands most of the time don't give meaningful messages with solutions, and I can't translate them all, or even most, or even a significant number of them.

That's why I don't see the purpose of making this translation for everything - most of it has no translation and you spit out the same thing anyway - it requires additional effort and it introduces buffering for everything. That's the reason you don't want to do it by default - you only want to do it for the few commands you know.

GurliGebis commented 5 months ago

Okay.

I have pushed my latest changes - could you make a pull request with just one change for this, so I can see exactly what you mean? (just change it in one place)

I think I'm misunderstanding you - so it would be easier if you can show me 🙂 I have, btw, refactored as much as possible into helper functions, to avoid as much redundancy as possible.

dd010101 commented 5 months ago

I don't have a specific implementation in mind - I just see a major issue that needs solving. I can give you ideas, and if you like them I can show you an implementation - but if I don't know which ideas you want to pursue, then I don't know the implementation. The first step is to understand the issue and the goals, and I feel we are still in this phase.

Let's start with error handling - hopefully now you will understand:

  wget -O /usr/share/keyrings/jenkins-keyring.asc https://pkg.jenkins.io/debian-stable/jenkins.io-2023.key > /dev/null 2>&1

  if [ $? -ne 0 ]; then
    ClearPreviousLine
    PrintErrorIndicator "Failed to download Jenkins sources.list.d files."
  fi

This will only say Failed to download Jenkins sources.list.d files. and that's not enough, because it doesn't say why it failed. This was the original issue - the new version doesn't solve it, it just adds sugar around it.

This may not seem like a big deal, but it will create support hell: if you don't tell people why something fails, they will come and ask you - and how can you help them in this case? It only says something failed - nobody knows what exactly or why. What would you say in response to such situations?

I can only think of one response - "please rewrite our scripts - so we can see why" - is that more user-friendly from your point of view? I would say that spilling out stderr and breaking the formatting is much more user-friendly than needing to seek assistance for every error, only to be told to rewrite the scripts to spill the stderr...

Noise

This structure repeats a lot:

  if [ $? -ne 0 ]; then
    ClearPreviousLine
    PrintErrorIndicator "Failed to download Jenkins sources.list.d files."
  fi

I would just delete it, but if you want to keep it, please use a function to handle this whole block, not only some elements within it:

At least convert this:

  wget -O /usr/share/keyrings/jenkins-keyring.asc https://pkg.jenkins.io/debian-stable/jenkins.io-2023.key > /dev/null 2>&1

  if [ $? -ne 0 ]; then
    ClearPreviousLine
    PrintErrorIndicator "Failed to download Jenkins sources.list.d files."
  fi

Into this:

run "wget -O /usr/share/keyrings/jenkins-keyring.asc https://pkg.jenkins.io/debian-stable/jenkins.io-2023.key > /dev/null" \
  "Failed to download Jenkins sources.list.d files."

And this:

  echo "deb [signed-by=/usr/share/keyrings/jenkins-keyring.asc]" https://pkg.jenkins.io/debian-stable binary/ | tee /etc/apt/sources.list.d/jenkins.list > /dev/null

  if [ $? -eq 0 ]; then
    ClearPreviousLine
    PrintOkIndicator "Jenkins sources.list.d files has been set up."
  else
    ClearPreviousLine
    PrintErrorIndicator "Failed to set up Jenkins sources.list.d files."
  fi

Into this:

run "echo \"deb [signed-by=/usr/share/keyrings/jenkins-keyring.asc]\" https://pkg.jenkins.io/debian-stable binary/ | tee /etc/apt/sources.list.d/jenkins.list > /dev/null" \
  "Failed to set up Jenkins sources.list.d files." \
  "Jenkins sources.list.d files has been set up."

Maybe even into this:

function fillJenkinsSourcesList {
   echo "deb [signed-by=/usr/share/keyrings/jenkins-keyring.asc]" https://pkg.jenkins.io/debian-stable binary/ | tee /etc/apt/sources.list.d/jenkins.list > /dev/null
}
run fillJenkinsSourcesList  \
  "Failed to set up Jenkins sources.list.d files." \
  "Jenkins sources.list.d files has been set up."

This approach would allow you to just copy-paste the commands without the need to escape quotes; it would also be much cleaner for multi-line commands than trying to include the same thing as a string argument - or combine both: the first for one-liners, the second for multi-line. Either way, there is no reason to duplicate this block countless times:

  if [ $? -eq 0 ]; then
    ClearPreviousLine
    PrintOkIndicator "Jenkins sources.list.d files has been set up."
  else
    ClearPreviousLine
    PrintErrorIndicator "Failed to set up Jenkins sources.list.d files."
  fi

Checking the exit code of echo/tee seems like insanity to me... You include error handling for code that is exceedingly unlikely to fail, because you enforce running as root - how likely is it to see echo/tee fail when running as root?

This leads to another idea - let's say you really like to segment the code into steps and report step done/ok or step failed. You can have that in a way that doesn't check every single echo - you can form larger, more generic groups of commands so you don't need to check each command:

  # Download keyring
  wget -O /usr/share/keyrings/jenkins-keyring.asc https://pkg.jenkins.io/debian-stable/jenkins.io-2023.key > /dev/null 2>&1

  if [ $? -ne 0 ]; then
    ClearPreviousLine
    PrintErrorIndicator "Failed to download Jenkins sources.list.d files."
  fi

  # Create Jenkins sources.list.d file
  echo "deb [signed-by=/usr/share/keyrings/jenkins-keyring.asc]" https://pkg.jenkins.io/debian-stable binary/ | tee /etc/apt/sources.list.d/jenkins.list > /dev/null

  if [ $? -eq 0 ]; then
    ClearPreviousLine
    PrintOkIndicator "Jenkins sources.list.d files has been set up."
  else
    ClearPreviousLine
    PrintErrorIndicator "Failed to set up Jenkins sources.list.d files."
  fi

Into:

function setupJenkinsApt {
   wget -O /usr/share/keyrings/jenkins-keyring.asc https://pkg.jenkins.io/debian-stable/jenkins.io-2023.key
   echo "deb [signed-by=/usr/share/keyrings/jenkins-keyring.asc] https://pkg.jenkins.io/debian-stable binary/" > /etc/apt/sources.list.d/jenkins.list
}
run setupJenkinsApt \
  "Failed to set up Jenkins APT" \
  "Jenkins APT has been set up."

Isn't that much nicer? I don't know what format you like, but I can give you ideas for how the run function can work. The basic idea is to take the first argument - a string or a function name - execute it, check the exit code, and use the second or third argument (if supplied) to echo the ok/fail message. If you want, I can also provide an implementation.

BTW: Please don't use misleading names 😄 PrintErrorIndicator should be PrintErrorIndicatorAndTerminate or something - also if you exit, please exit 0 or exit 1 so we know whether the script failed or not. And please echo error messages to stderr: >&2 echo "FATAL ERROR! Unrecoverable!".

BTW2: It would be amazing if you could temporarily make individual commits and only squash them when it's done - as it is, I don't see the development/changes, only the final result.

dd010101 commented 5 months ago

This works quite well:

#!/usr/bin/env bash

# Define colors.
RED='\033[0;31m'
GREEN='\033[0;32m'
LIGHTBLUE='\033[0;94m'
NOCOLOR='\033[0m'

# Define color indicators.
EMPTY_INDICATOR="[    ]"
OK_INDICATOR="[ ${GREEN}OK${NOCOLOR} ]"
ERROR_INDICATOR="[ ${RED}!!${NOCOLOR} ]"

#
# Rules:
# You can pass one-line commands:
#
#   run "wget ... && rm ..." \
#      "Start message" \
#      "Error message" \
#      "Success message"
#   run "wget ... | grep ..." \
#      "" \
#      "Just error message"
#
# There can be any combination of start/error/success/none message
#
# You can pass any bash, including function call:
#
#   run "myFunction"
#
# You shall never drop the stdout or stderr (> /dev/null),
# don't use redirection 2>&1 either. The stdout is dropped
# by default, stderr shall be shown.
#
# If you want to silent specific stderr lines, then filter it,
# don't drop the whole thing.
#
# The passed bash/function is executed with 'set -e', thus
# the command chain will terminate on first failed command.
# That's why you can safely include multiple command calls.
#
function run {
    command="$1"
    infoMessage="$2"
    errorMessage="$3"
    successMessage="$4"

    if [ "$infoMessage" != "" ]; then
        echo -e "$EMPTY_INDICATOR $infoMessage"
    fi

    output=$( (set -e; eval "$command") 2>&1 1>/dev/null | tee /dev/fd/2; exit ${PIPESTATUS[0]} )
    exitCode=$?

    if [ $exitCode -eq 0 ]; then
        if [ "$successMessage" != "" ]; then
            if [ "$output" == "" ]; then
                tput cuu1; tput el
            fi
            echo -e "$OK_INDICATOR $successMessage"
        fi
    else
        if [ "$output" == "" ]; then
            tput cuu1; tput el
        fi
        if [ "$successMessage" != "" ]; then
            >&2 echo -e "$ERROR_INDICATOR $errorMessage"
        else
            >&2 echo -e "$ERROR_INDICATOR Unexpected failure, exit code: $exitCode"
        fi
        exit $exitCode
    fi
}

echo
run "sleep 1; echo test" \
    "1) test of success" \
    "Fail message" \
    "Success message"

# wrap in sub-shell so test 2) doesn't terminate yet
echo
(run "sleep 1; what?" \
    "2) test of error" \
    "Fail message" \
    "Success message")
echo "Expected exit code: $? == 127"

echo
run ">&2 echo You shall see two lines here; sleep 1; >&2 echo The second line" \
    "3) test of stderr with clean exit" \
    "Fail message" \
    "Success message"

# test 4) shall terminate
echo
function testBlock {
    >&2 echo "This block will fail at second line and you shall not see third line"
    sleep 1
    who?
    >&2 echo "You shall never see this"
}
run "testBlock" \
    "4) test of error in block" \
    "Fail message" \
    "Success message"

echo "You shall never see this either"

The output:

[ OK ] Success message

[    ] 2) test of error
test.sh: line 53: what?: command not found
[ !! ] Fail message
Expected exit code: 127 == 127

[    ] 3) test of stderr with clean exit
You shall see two lines here
The second line
[ OK ] Success message

[    ] 4) test of error in block
This block will fail at second line and you shall not see third line
test.sh: line 100: who?: command not found
[ !! ] Fail message

Both the stderr and the exit code are handled properly. You don't clear the previous line if there is previous stderr - that's crucial information to keep - so we clear only when there is no previous output; not as fancy but much more functional. This solves all the issues above: 1) stderr propagation, so we see why something failed immediately as it happens, 2) reduced noise from repeating conditions, 3) correct propagation of error messages to stderr and of the exit code, 4) automatic termination after the first failed statement.

With this we can reduce this:

  PrintEmptyIndicator "Setting up Jenkins sources.list.d files..."

  # Download keyring
  wget -O /usr/share/keyrings/jenkins-keyring.asc https://pkg.jenkins.io/debian-stable/jenkins.io-2023.key > /dev/null 2>&1

  if [ $? -ne 0 ]; then
    ClearPreviousLine
    PrintErrorIndicator "Failed to download Jenkins sources.list.d files."
  fi

  # Create Jenkins sources.list.d file
  echo "deb [signed-by=/usr/share/keyrings/jenkins-keyring.asc]" https://pkg.jenkins.io/debian-stable binary/ | tee /etc/apt/sources.list.d/jenkins.list > /dev/null

  if [ $? -eq 0 ]; then
    ClearPreviousLine
    PrintOkIndicator "Jenkins sources.list.d files has been set up."
  else
    ClearPreviousLine
    PrintErrorIndicator "Failed to set up Jenkins sources.list.d files."

Into this:

function setupJenkinsApt {
  # Download key
  wget -O /usr/share/keyrings/jenkins-keyring.asc https://pkg.jenkins.io/debian-stable/jenkins.io-2023.key
  # Create Jenkins sources.list.d file
  echo "deb [signed-by=/usr/share/keyrings/jenkins-keyring.asc] https://pkg.jenkins.io/debian-stable binary/" > /etc/apt/sources.list.d/jenkins.list
}
run setupJenkinsApt \
  "Setting up Jenkins APT..." \
  "Failed to set up Jenkins APT." \
  "Jenkins APT has been set up."

This should satisfy both my ideas and yours. You can have fancy error handling and steps as you do now, but you should reduce the segmentation into blocks - we don't need a separate call for each echo, we can merge them into logical blocks so there is not as much noise. We also have stderr, which lets us see why something fails - not only whether it fails. This also correctly passes the stderr/exit code along and automatically terminates on the first command failure.


Conditional clearing of the previous line - only when there wasn't any previous output from the command - without buffering wasn't so easy! But now it clears on error/success only if the previous line was the start message. I did need to invoke some serious juggling magic, but it seems to work quite well, including the exit code propagation.

output=$( (set -e; eval "$command") 2>&1 1>/dev/null | tee /dev/fd/2; exit ${PIPESTATUS[0]} )

First we drop stdout. Then we redirect stderr into stdout so we can tee. Then we eval with set -e. Then we tee; tee sees the stderr that goes through stdout. Tee sends one copy to stdout, which gets captured into the $output variable. Tee sends the second copy to the actual stderr (second file descriptor), which goes to the terminal. Then we need to return the first exit code of the pipe (the eval's), otherwise we would get the exit code of tee.

This sequence basically calls eval and sends stderr into both the variable and the terminal without buffering, and it also preserves the exit code. Not as easy to do as it sounds. That's what you need to do if you want your own error handling with fancy steps. You can't just store stderr in a variable and print it later - that would break the stderr flow - so this is the best compromise of your idea that doesn't break the bash flow.
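The same line once more, annotated (just a restatement of the explanation above):

# (set -e; eval "$command")  - run the command chain, stop at the first failure
# 2>&1 1>/dev/null           - send stderr down the pipe, drop the original stdout
# tee /dev/fd/2              - copy the stream to real stderr (live) and to stdout (captured)
# exit ${PIPESTATUS[0]}      - propagate the eval's exit code, not tee's
output=$( (set -e; eval "$command") 2>&1 1>/dev/null | tee /dev/fd/2; exit ${PIPESTATUS[0]} )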


Do you see some problem with this approach? Do you think the output formatting is acceptable?

GurliGebis commented 5 months ago

I think I got the idea now 😊 I think we can keep it really simple - I'll see if I can get some time to work on it today. Thanks

GurliGebis commented 5 months ago

I just tried it - with this code:

function SetupJenkinsSourceListD {
  wget -O /usr/share/keyrings/jenkins-keyring.asc https://pkg.jenkins.io/debian-stable/jenkins.io-2023.key > /dev/null
  echo "deb [signed-by=/usr/share/keyrings/jenkins-keyring.asc]" https://pkg.jenkins.io/debian-stable binary/ | tee /etc/apt/sources.list.d/jenkins.list > /dev/null
}

# Set up sources.list.d files for Jenkins
if [ -f /etc/apt/sources.list.d/jenkins.list ]; then
  PrintOkIndicator "Jenkins sources.list.d has already been set up."
else
  Run "SetupJenkinsSourceListD" \
    "Setting up Jenkins sources.list.d files..." \
    "Failed to download or set up Jenkins sources.list.d files." \
    "Jenkins sources.list.d files has been set up."
fi

I get the following output:

[    ] Setting up Jenkins sources.list.d files...
--2024-06-21 10:58:00--  https://pkg.jenkins.io/debian-stable/jenkins.io-2023.key
Resolving pkg.jenkins.io (pkg.jenkins.io)... 151.101.130.133, 151.101.66.133, 151.101.2.133, ...
Connecting to pkg.jenkins.io (pkg.jenkins.io)|151.101.130.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3175 (3.1K) [application/pgp-keys]
Saving to: ‘/usr/share/keyrings/jenkins-keyring.asc’

     0K ...                                                   100% 20.5M=0s

2024-06-21 10:58:00 (20.5 MB/s) - ‘/usr/share/keyrings/jenkins-keyring.asc’ saved [3175/3175]

[ OK ] Jenkins sources.list.d files has been set up.

So for some reason the output of wget is printed to the screen - any ideas? (it should be totally silent, since no error is happening and I'm piping it to /dev/null)

dd010101 commented 5 months ago

Use wget -q for quiet. Don't use > /dev/null inside the function - the run function handles stdout for you. What you see isn't stdout. Wget is being smart and abuses stderr to show you progress while you redirect stdout as the result. In such cases you need to tell the program itself to behave and use stderr only for errors.

function SetupJenkinsSourceListD {
  wget -q -O /usr/share/keyrings/jenkins-keyring.asc https://pkg.jenkins.io/debian-stable/jenkins.io-2023.key
  echo "deb [signed-by=/usr/share/keyrings/jenkins-keyring.asc] https://pkg.jenkins.io/debian-stable binary/" > /etc/apt/sources.list.d/jenkins.list
}

# Set up sources.list.d files for Jenkins
if [ -f /etc/apt/sources.list.d/jenkins.list ]; then
  PrintOkIndicator "Jenkins sources.list.d has already been set up."
else
  Run "SetupJenkinsSourceListD" \
    "Setting up Jenkins sources.list.d files..." \
    "Failed to download or set up Jenkins sources.list.d files." \
    "Jenkins sources.list.d files has been set up."
fi

GurliGebis commented 5 months ago

That makes sense - I'll keep working on it. Hopefully nothing else acts up like this.

GurliGebis commented 5 months ago

There is a problem - if you use -q with wget and it fails, it just shuts up.

dd010101 commented 5 months ago

Yeah, wget is silly - wget -nv doesn't work either. There is no way to make wget behave, thus wget is not suitable for use in scripts.

Let's use curl then!

curl -sS --fail-with-body -o myfile.txt https://host/path

The first s disables output (--silent), the second S enables error reporting (--show-error). That's what we need - only errors.

GurliGebis commented 5 months ago

yep, cURL it is - I just have to rearrange a few things 😊

dd010101 commented 5 months ago

Also include --fail-with-body - then curl correctly fails when a non-200 response is received. Much, much better than what wget can do.
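For example (hypothetical URL, checked interactively outside set -e), an HTTP error then shows up as a non-zero exit code:

curl -sS --fail-with-body -o /dev/null https://example.com/does-not-exist
echo $?   # 22 - curl's exit code for HTTP responses >= 400 when --fail/--fail-with-body is used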

GurliGebis commented 5 months ago

Done 😊 The call to useradd has to be muted, since it complains about the UID: useradd warning: jenkins's uid 1006 is greater than SYS_UID_MAX 999. But there really is no way that it can fail, from what I can see.

dd010101 commented 5 months ago

Then please remove the --system option instead. Always try to correctly solve unwanted output - muting should be the last option, for commands that can't be made to behave and have no alternative - and even then, filter the output, don't throw it all away. In this case the command is indeed unlikely to fail, but this should be a policy you always follow, since your expectation of what can fail won't always be on the spot.

GurliGebis commented 5 months ago

Indeed, that sounds like a better solution (the user shouldn't be a system user anyway)

dd010101 commented 5 months ago

I did copy the --system option from the package definition. There it's used mainly for the purpose of selecting a lower UID/GID, but we need IDs in the user range, thus the --system option doesn't make sense.

System users will be created with no aging information in /etc/shadow, and their numeric identifiers are chosen in the SYS_UID_MIN-SYS_UID_MAX range, defined in /etc/login.defs, instead of UID_MIN-UID_MAX (and their GID counterparts for the creation of groups).

Note that useradd will not create a home directory for such a user, regardless of the default setting in /etc/login.defs (CREATE_HOME). You have to specify the -m option if you want a home directory for a system account to be created.

It does other things too, but Debian by default doesn't set EXPIRE, so expiration doesn't apply. The home directory also doesn't matter in this case.

GurliGebis commented 5 months ago

Yep - so it looks like we are doing it correctly now. I think I'll soon be done with 1-prereqs.sh (working in another branch) - I'll ping you once I'm done. Can you take a look at it and check whether it looks right (before I go through the rest of them, only to redo them again later)? 😊

GurliGebis commented 5 months ago

I think the first one is done now - please have a look here: https://github.com/GurliGebis/vyos-jenkins/blob/cleanup/1-prereqs.sh

GurliGebis commented 5 months ago

It seems like the error message isn't being printed - I just tried making curl fail:

[ OK ] Updated apt sources.
[ OK ] cURL is already installed.
[    ] Setting up Jenkins sources.list.d files...
curl: (6) Could not resolve host: pkg.jen1kins.io

It should have changed [ ] Setting up Jenkins sources.list.d files... to [ !! ] Failed to download or set up Jenkins sources.list.d files.

dd010101 commented 5 months ago

It won't change [ ] Setting up Jenkins sources.list.d files... to [ !! ] Failed to download or set up Jenkins sources.list.d files., it should just print another line with [ !! ] Failed to download or set up Jenkins sources.list.d files..

It doesn't because you put an extra set -e in helper-logic! https://github.com/GurliGebis/vyos-jenkins/blob/cleanup/helper-logic#L32 Remove it and it will work - the run function already includes set -e for the code that gets executed, so if you put all logic through the run function, errors will be handled correctly.

#################################################
# Unofficial VyOS package mirror installer v1.0 #
#################################################

[ OK ] Updated apt sources.
[ OK ] cURL is already installed.
[    ] Setting up Jenkins sources.list.d files...
curl: (6) Could not resolve host: pkg.jenkinsERROR.io
[ !! ] Failed to download or set up Jenkins sources.list.d files.

GurliGebis commented 5 months ago

Ahh, I see - I'll remove that - thanks. Apart from that, does what I have changed make sense?

dd010101 commented 5 months ago

Yes, the issue is solved - the stderr is now handled properly. Thus I don't see any issues that would prevent usage. 👍

GurliGebis commented 5 months ago

Perfect, I'll go through the rest when I get time.

GurliGebis commented 5 months ago

I will move the readme into the manual folder and create a draft for one related to the installer, mentioning the manual option if people want it 🙂

dd010101 commented 5 months ago

I want to leave it as is for now and add another section at the beginning describing the scripted method.

There is also overlap - like why you would want to do this, what OS you should use, why you want a dedicated VM, how to debug if something goes wrong, and so on.

GurliGebis commented 5 months ago

@dd010101 I'm unable to automatically build the equuleus version of the vyos-build-container job. This is because it asks for a value for the ELTS_MIRROR variable.

Can we remove it, please? (Since the installer will be setting up the apt mirror, it will always be there on 172.17.17.17.) If not, we will need to figure out a way to pass that info to Jenkins when calling the build API URL.

dd010101 commented 5 months ago

The parameter is there for a reason - if someone has a different setup, this job would be useless if the value was hardcoded. There could be some fallback in bash instead of the parameter default, and then we could use the bash default when the Jenkins parameter is empty - but that's not as user-friendly as seeing the default value directly in Jenkins, where you can use it as inspiration for what is expected from you.
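A sketch of such a bash fallback (assuming the build script reads ELTS_MIRROR from its environment):

# use the Jenkins parameter when it's set and non-empty, otherwise fall back to the default
ELTS_MIRROR="${ELTS_MIRROR:-http://172.17.17.17:3142/deb.freexian.com/extended-lts}"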


This is how it looks in Jenkins. The variable has a default value and it is automatically filled in when you try to build, so in Jenkins there is no need to fill it in - you just confirm the default.

(screenshot: jenkins2)


So I would guess the API should look like this:

/job/PACKAGE/job/BRANCH/buildWithParameters

This should use the default? If not then fill it:

/job/PACKAGE/job/BRANCH/buildWithParameters?ELTS_MIRROR=http%3A%2F%2F172.17.17.17%3A3142%2Fdeb.freexian.com%2Fextended-lts

GurliGebis commented 5 months ago

I figured out how to fix it :)

I updated the json to have a parameters array - I iterate over it for each branch and then use it to trigger the builds via the Jenkins CLI.

The PoC code works, so now I just need to integrate it, which shouldn't take long 😊

GurliGebis commented 5 months ago

I'm close to being done with the changes - I'll ping you once I'm done, and merge them into this branch

GurliGebis commented 5 months ago

@dd010101 I have pushed the changes to the cleanup branch - care to take a look?

dd010101 commented 5 months ago

Looks fine in the code, but it failed when I ran step 5.

Step 2

Visual issue - with multiple IP addresses it would be better to place each IP address on its own line and add http:// and the port/path to each IP:

Please open a web browser, and go to http://172.17.0.1
172.10.10.15:8080.

...

Once you reach the Welcome page, please go to this url: http://172.17.0.1
172.10.10.15:8080/user/root/configure

One IP is from the LAN interface, the other is from the docker interface - that's a common case, and it's possible someone will have even more.

Step 5

Unexpected failure:

#################################################
# Unofficial VyOS package mirror installer v1.0 #
#################################################

[ OK ] Connected to Jenkins successfully.

Provisioning jobs in Jenkins...
[ OK ] vyos-build-container

Waiting for jobs to be provisioned in Jenkins...

Total number of jobs: 3

[  Completed  ] Package: vyos-build-container - Branch: equuleus
[  Completed  ] Package: vyos-build-container - Branch: sagitta
[  Completed  ] Package: vyos-build-container - Branch: current

Building jobs in Jenkins...
Total number of jobs: 3
Concurrent jobs: 24 (Can be overridden by setting the CONCURRENT_JOBS_COUNT environment variable)

jq: error (at <stdin>:1): Cannot index string with string "name"
jq: error (at <stdin>:1): Cannot index string with string "value"

ERROR: A key must be set.
java -jar jenkins-cli.jar build JOB [-c] [-f] [-p] [-r N] [-s] [-v] [-w]
Starts a build, and optionally waits for a completion.
Aside from general scripting use, this command can be
used to invoke another job from within a build of one job.
With the -s option, this command changes the exit code based on
the outcome of the build (exit code 0 indicates a success)
and interrupting the command will interrupt the job.
With the -f option, this command changes the exit code based on
the outcome of the build (exit code 0 indicates a success)
however, unlike -s, interrupting the command will not interrupt
the job (exit code 125 indicates the command was interrupted).
With the -c option, a build will only run if there has been
an SCM change.
 JOB : Name of the job to build
 -c  : Check for SCM changes before starting the build, and if there's no
       change, exit without doing a build (default: false)
 -f  : Follow the build progress. Like -s only interrupts are not passed
       through to the build. (default: false)
 -p  : Specify the build parameters in the key=value format. (default: {})
 -s  : Wait until the completion/abortion of the command. Interrupts are passed
       through to the build. (default: false)
 -v  : Prints out the console output of the build. Use with -s (default: false)
 -w  : Wait until the start of the command (default: false)
[ Not Started ] Package: vyos-build-container - Branch: equuleus
[   Running   ] Package: vyos-build-container - Branch: sagitta
[   Running   ] Package: vyos-build-container - Branch: current

The equuleus build command for vyos-build-container failed.

The failing command:

java -jar jenkins-cli.jar -s http://172.17.17.17:8080 -auth AUTH build 'vyos-build-container/equuleus' -p =

Something is wrong with the parameters.

I guess you are expecting this:

    "branchParameters": {
      "equuleus": [
          {"name": ""ELTS_MIRROR", "value": "http://172.17.17.17:3142/deb.freexian.com/extended-lts"}
      ]

But the definition is:

    "branchParameters": {
      "equuleus": {
        "ELTS_MIRROR": "http://172.17.17.17:3142/deb.freexian.com/extended-lts"
      }

General ideas

It would be good to add the shebang #!/usr/bin/env bash to helper-logic so syntax highlighting works.

dd010101 commented 5 months ago

This fixes the parameter parsing:

      # Construct the command line with the parameters.
      if [[ $JSON != "null" ]]; then
        while read variable
        do
          echo $variable
          name=$(echo "$variable" | jq -r .key)
          value=$(echo "$variable" | jq -r .value)

          COMMAND="$COMMAND -p $name=$value"
        done < <(echo "$JSON" | jq -c 'to_entries | .[]')
      fi

Instead of the original:

      # Construct the command line with the parameters.
      if [[ $JSON != "null" ]]; then
        while read variable
        do
          name=$(echo "$variable" | jq -r .name)
          value=$(echo "$variable" | jq -r .value)

          COMMAND="$COMMAND -p $name=$value"
        done < <(echo "$JSON" | jq -c '.[]')
      fi

The done line:

done < <(echo "$JSON" | jq -c 'to_entries | .[]')

The name line:

name=$(echo "$variable" | jq -r .key)
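To illustrate what to_entries does - it turns the object form of the definition into key/value pairs the loop can read (shown here with the actual equuleus parameter):

echo '{"ELTS_MIRROR": "http://172.17.17.17:3142/deb.freexian.com/extended-lts"}' \
  | jq -c 'to_entries | .[]'
# {"key":"ELTS_MIRROR","value":"http://172.17.17.17:3142/deb.freexian.com/extended-lts"}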

dd010101 commented 5 months ago

Step 7

One or more packages failed to build.
Please check inside Jenkins to see what went wrong, and run a new build of the failed package.

It doesn't say what failed - is it possible to print a list of the failed packages? In this case the failure was way back in the scroll buffer and not easily visible in Jenkins either, since one job failed early and many successful ones followed.

The sagitta kernel failed because of a temporary git fetch glitch - it would be nice to retry once on failures like these. I have seen this multiple times, where a simple retry fixes the issue. Easy to solve for me, but perhaps not for everyone.
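A sketch of what a single retry could look like (the wrapper name and the command are just an example):

function runWithRetry {
    # try the command once; on failure, wait a bit and try exactly one more time
    # the || protects the first attempt from set -e, a second failure still terminates
    eval "$1" || { sleep 10; eval "$1"; }
}

runWithRetry "git fetch origin"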

ISO

Both equuleus and sagitta successful.

The Building the ISO... step is very long, without any progress - it's easy to think something is wrong, that it's stuck... There is no way to show real progress, but we could just show the output? It's better to see lots of output than nothing at all for a long time.

Conclusion

Good! With a small fix it works!

Step 2 - multiple IP addresses could be improved (for details see above). Step 5 - the parameters need to be fixed (fix posted above). Step 7 - it could list the failed packages and maybe even retry the build once more, if it's not too complicated (for details see above). ISO - it could show the build output so we at least see something; as it is, we see nothing for a long time (for details see above). General - helper-logic needs #!/usr/bin/env bash so editors recognize the syntax.

Good job! You put in a lot of work and it's much improved now!