Implementing installs and build scripts

GurliGebis commented 5 months ago

This Pull Request is meant as an easy way to review and correct issues with the install scripts.

GurliGebis commented 5 months ago

I don't understand why the docker interface is captured for you - it isn't for me.

NETWORK_INTERFACE=`ls /sys/class/net/ -1 |grep eth |grep -v veth`

Since it greps for "eth", the docker interface shouldn't be included in the list. What interfaces do you have in /sys/class/net?

GurliGebis commented 5 months ago

Step 5 - done - it seems like a rebase failed (I had done the json changes locally on my test vm, but they weren't committed to git for some reason)

GurliGebis commented 5 months ago

Step 7 - I'll see what I can do. It should be easy enough to keep a list of failed packages and then dump it at the end - I'll look into it 🙂

ISO - Printing the entire build process is something I opted not to do, since it just spews way too much garbage on the screen. But I got another idea: I'll update the text to say that it will take a while, and tell people that they can run the docker attach ID command in another console window to follow along.

I might get time to fix the above things tomorrow - We're getting close 😀

dd010101 commented 5 months ago

I don't understand why the docker interface is captured for you - it isn't for me.
NETWORK_INTERFACE=`ls /sys/class/net/ -1 |grep eth |grep -v veth`
Since it greps for "eth", the docker interface shouldn't be included in the list. What interfaces do you have in /sys/class/net?

I think that's missing the point. I would suggest not trying to filter the interfaces, since people can have all sorts of interfaces, including no ethX at all. You should list all IP addresses if you want to provide functional links, and let the user pick the one they use. That's why we use dummy0: you simply don't know what networking setup the user has, and even on the same OS with the same version this differs a lot.

NETWORK_INTERFACE is empty, since you filter everything out. I don't have ethX, I have ens0; you can have end0, or usb0, or wlan0, and so on... This is with stock Debian Bookworm without any changes. Don't try to solve this, it can't be solved reliably. That's why the next command returns all IPs including the docker one: ifconfig is effectively running without any interface, because the argument is an empty string.

I would suggest adding support for multiple IPs, excluding only docker/loopback and allowing any other interface.

Step 5 - done - it seems like a rebase failed (I had done the json changes locally on my test vm, but they weren't committed to git for some reason)

I do think the original JSON format with key-value is prettier though!

dd010101 commented 5 months ago

ISO - Printing the entire build process is something I opted not to do, since it just spews way too much garbage on the screen. But I got another idea: I'll update the text to say that it will take a while, and tell people that they can run the docker attach ID command in another console window to follow along.

What about including an option - show progress [y/n] - instead? The attach requires another tty/shell 😞.

This made me think about how you handle a failing docker build: you only say that it failed, but not why, and that's for sure a problem. I see the build ISO script is overall in the old style without proper error handling, where you mute stderr and so on. This should be addressed in the same way as in the other scripts.

I might get time to fix the above things tomorrow - We're getting close 😀

It's close indeed! 🚀

dd010101 commented 5 months ago

What about including an option - show progress [y/n] - instead?

What if you ran the docker build command in the background? Then you can launch the build and give the user the option to show the output retroactively.

#!/bin/bash
set -e

# stop the background command on ctrl+c
# and cleanup temporary file and tail on exit
stty -echoctl
trap stop INT TERM
trap cleanup EXIT

function stop {
    kill $pid || true

    wait $pid
    exitCode=$?

    cleanup
    exit $exitCode
}

function cleanup {
    stty echo
    if [ "$buffer" != "" ]; then
        rm -f $buffer 2> /dev/null || true
    fi
    if [ "$tailPid" != "" ]; then
        kill $tailPid || true
    fi
}

buffer=$(mktemp -p /tmp --suffix=-background-buffer)

# dummy stdout generator
function dockerBuild {
    counter=0
    while (( $counter < 10 ))
    do
        echo $(date +%s)
        sleep 1
        counter=$(expr $counter + 1)
    done
}

dockerBuild > $buffer &
pid=$!

echo "Show output? Press y..."
while ps -p $pid > /dev/null
do
    if [ "$tailPid" == "" ]; then
        read -s -n 1 -t 1 input || true
        if [ "$input" == "y" ]; then
            tail -f -n +1 $buffer &
            tailPid=$!
        fi
    else
        sleep 1
    fi
done

wait $pid
exit $?

This example shows the principle. If you run it, it shows just the message "Show output? Press y...". Press y and it shows the stdout (timestamps) of the background process. It correctly manages the background process, exit codes and signals (ctrl+c/SIGTERM). I think that's more user-friendly than forcing the user to spawn another console with some command. What do you think?

GurliGebis commented 5 months ago

That makes sense - I'll see if I can integrate it :)

I have changed the json back again, and replaced the function with your fixed version - having the parameter name as the key is cleaner.

I changed the interface logic, so it filters out the interfaces we know we shouldn't be looking at (lo, dummy, docker, veth*), and then prints the URL for each of the remaining interfaces that has an IP address.
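
The filtering described above could look roughly like the sketch below. Only the list of excluded prefixes (lo, dummy, docker, veth*) comes from the description; the function names, the port argument and the URL layout are assumptions for illustration and are not taken from the actual install scripts.

```bash
#!/bin/bash

# Hypothetical helper: read interface names on stdin and drop the ones
# that should never be offered to the user.
filterInterfaces() {
    grep -v -E '^(lo|dummy[0-9]*|docker[0-9]*|veth.*)$' || true
}

# Hypothetical helper: print one URL per remaining interface that has an
# IPv4 address. The port ($1) and the URL layout are placeholders.
printUrls() {
    local port=$1 iface address
    while read -r iface; do
        # "ip -o -4 addr show" prints one line per address,
        # field 4 is ADDRESS/PREFIX
        ip -o -4 addr show dev "$iface" 2> /dev/null | awk '{ print $4 }' | cut -d/ -f1 |
        while read -r address; do
            echo "http://$address:$port/ ($iface)"
        done
    done
}

# Usage: ls -1 /sys/class/net | filterInterfaces | printUrls 8080
```

Interfaces without an address simply produce no line, so nothing has to be guessed about naming schemes such as ens0, end0 or wlan0.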

Next step is to save the list of failed jobs, so I can print them once it is done building. Then I'll look into integrating the code above 🙂

dd010101 commented 5 months ago

Great! 👍

That makes sense - I'll see if I can integrate it :)

That should be easy:

#!/bin/bash

function runWithLazyStdout {
    set -e
    command=$1

    # stop the background command on ctrl+c
    # and cleanup temporary file and tail on exit
    stty -echoctl
    trap stop INT TERM
    trap cleanup EXIT

    function stop {
        kill $pid || true

        wait $pid
        exitCode=$?

        cleanup
        exit $exitCode
    }

    function cleanup {
        stty echo
        if [ "$buffer" != "" ]; then
            rm -f $buffer 2> /dev/null || true
        fi
        if [ "$tailPid" != "" ]; then
            kill $tailPid || true
        fi
    }

    buffer=$(mktemp -p /tmp --suffix=-background-buffer)

    $command > $buffer &
    pid=$!

    echo "Show output? Press y..."
    while ps -p $pid > /dev/null
    do
        if [ "$tailPid" == "" ]; then
            read -s -n 1 -t 1 input || true
            if [ "$input" == "y" ]; then
                tail -f -n +1 $buffer &
                tailPid=$!
            fi
        else
            sleep 1
        fi
    done

    wait $pid
    exit $?
}

# dummy stdout generator
function dockerBuild {
    counter=0
    while (( $counter < 10 ))
    do
        echo $(date +%s)
        sleep 1
        counter=$(expr $counter + 1)
    done
}

# the parentheses are important: they run the function in its own subshell
# this way the set -e and the traps only apply to runWithLazyStdout
# and will not interfere with our parent script
(
    runWithLazyStdout dockerBuild
)
# business as usual
echo "exit code $?"

You can wrap the whole thing as a function in a subshell, so no other changes are needed. Without the subshell, as a pure function, it would be tricky, since the traps would then interfere with the following code in the main script.

GurliGebis commented 5 months ago

......
[  Completed  ] Package: wide-dhcpv6 - Branch: equuleus
[  Completed  ] Package: wide-dhcpv6 - Branch: sagitta

List of failed jobs:
[   Failed!   ] Package: dropbear - Branch: equuleus

One or more packages failed to build.
Please check inside Jenkins to see what went wrong, and run a new build of the failed package.
Once this is done, please run part eight to set up NGINX.

That was easy 🙂
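
For reference, collecting the failures while the status lines are printed and dumping them at the end can be sketched as follows. The status line format is copied from the output above; the function names and the way the result is passed in are assumptions, not the code that was actually committed.

```bash
#!/bin/bash

failedJobs=()

# Hypothetical helper: print the status line and remember failures.
# $1 = package, $2 = branch, $3 = result of the build (0 means success)
recordJob() {
    if [ "$3" -eq 0 ]; then
        echo "[  Completed  ] Package: $1 - Branch: $2"
    else
        echo "[   Failed!   ] Package: $1 - Branch: $2"
        failedJobs+=("Package: $1 - Branch: $2")
    fi
}

# Print the collected failures; returns 1 if there were any.
printFailedJobs() {
    local job
    if [ "${#failedJobs[@]}" -eq 0 ]; then
        return 0
    fi
    echo
    echo "List of failed jobs:"
    for job in "${failedJobs[@]}"; do
        echo "[   Failed!   ] $job"
    done
    return 1
}
```

recordJob has to be called in the current shell (not inside a pipeline or command substitution), otherwise the array is modified in a subshell and the list stays empty.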

GurliGebis commented 5 months ago

I just tested with the runWithLazyStdout function - it doesn't seem to work.

Here is the output: https://gist.github.com/GurliGebis/b7346c96b91c06b01f0bd0a2d266de4f

Here is the entire build-iso.sh script: https://gist.github.com/GurliGebis/491c674273ee5f879e9ccbbe5f1e9762

Any ideas?

GurliGebis commented 5 months ago

Okay, I forgot to remove -d from the build iso part.

But now it just prints the output from the docker build process, without asking first.

dd010101 commented 5 months ago

You are not calling the function at all as far as I see....

I see this:

(
  DockerBuild $EMAIL $RELEASE_NAME
)

I would expect this:

(
  RunWithLazyStdout "DockerBuild $EMAIL $RELEASE_NAME"
)

GurliGebis commented 5 months ago

Indeed - now it looks like it is working - I'll do some more testing and then ping you.

I also added this function, to make sure the version is using the latest tag for the correct branch: https://gist.github.com/GurliGebis/491c674273ee5f879e9ccbbe5f1e9762#file-build-iso-sh-L51

GurliGebis commented 5 months ago

@dd010101 It looks like it is working 🙂

One small "problem":

####################################
# Unofficial VyOS ISO builder v1.0 #
####################################

Please enter which branch you want to build (equuleus or sagitta): sagitta
Please enter your email address: REDACTED

Removing old vyos-build directory...
Cloning the VyOS build repository...
Checking out the sagitta branch...
Downloading apt signing key...
Building the ISO...
Show output? Press y...
useradd warning: vyos_bld's uid 0 outside of the UID_MIN 1000 and UID_MAX 60000 range.

It looks like warnings are printed directly to the screen, even if I haven't pressed y. Can we fix that somehow?

dd010101 commented 5 months ago

That's a feature. You should never blindly drop stderr. If you want to suppress stderr, filter only the specific lines you have in mind, so that if something unexpected happens you will see it as a warning and won't be left in the dark.

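function filterStderr {
    # redirections are processed left to right, so fd 3 has to be pointed at
    # the original stdout (3>&1) before stdout is sent to stderr (1>&2)
    ( set -e; eval "$1" 2>&1 1>&3 | (grep -v -E "$2" || true); exit ${PIPESTATUS[0]}; ) 3>&1 1>&2
    return $?
}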

A little file descriptor redirection. The stderr goes into stdout so we can grep it, and the stdout goes into a third descriptor as a temporary place to store it. The grep filters the stderr (masked as stdout), then we send what is left (the masked stderr) back to stderr and restore stdout from the third descriptor.

filterStderr "docker build ..." "(useradd warning|some other line)"

The first argument is the command. The second argument is a regex that filters out whatever you don't want to see.

This function is not perfect, since it drops the filtered messages outright. I would like to redirect the filtered messages into stdout instead, but you can use this for now. I don't yet know how to split the stream with grep so that both the matched and the unmatched lines are kept, just in different file descriptors...
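
One possible way to get that split is to swap grep for awk, which can write to two destinations at once. This is only a sketch under assumptions: splitStderr is a hypothetical name, it relies on awk's "/dev/stderr" output target (supported by gawk and mawk), and because the regex is passed with -v, backslashes in it go through awk's escape processing.

```bash
#!/bin/bash

# Hypothetical variant of filterStderr: stderr lines matching the regex are
# demoted to stdout instead of being dropped, everything else stays on stderr.
# $1 = command, $2 = extended regex
splitStderr() {
    local status
    {
        # the command's stderr enters the pipe, its stdout goes to fd 3
        # (the original stdout)
        eval "$1" 2>&1 1>&3 3>&- |
            awk -v re="$2" '$0 ~ re { print; next } { print > "/dev/stderr" }'
        status=${PIPESTATUS[0]}
    } 3>&1
    return "$status"
}

# Usage: splitStderr "docker build ..." "(useradd warning|some other line)"
```

The relative order of lines coming from the two streams is not guaranteed, since they now travel through different processes.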

GurliGebis commented 5 months ago

The docker run should be suppressed completely and just dumped into the temp file, shouldn't it?

The problem is that output is being printed without y being pressed.

I think messages like the one above are irrelevant, since they cannot be changed by the user and are just warnings.

dd010101 commented 5 months ago

No, docker build shouldn't spam you with stdout if you don't want it - that's what the optional output is all about. But the docker build can fail at any point, so you want to see when it's failing even if you don't show the output. The output is there for when you want to see progress, not for seeing whether it's failing - everybody, in all conditions, should see when it's failing.

Yes, error messages that happen by (bad) design and that you know are expected are irrelevant; that's why you should filter them out, so you see only the relevant messages. If everything is fine, you don't see anything; otherwise you see a warning that helps you solve the issue at hand and gives you the idea that perhaps you should look at the full output.

If you use both runWithLazyStdout and filterStderr, then run runWithLazyStdout inside filterStderr, like so:

filterStderr "( runWithLazyStdout \"docker build ...\" )" "(useradd warning|some other line)"

It doesn't work the other way around! It's getting quite complex with all these nested file descriptor redirections/subshells and things 😄.

GurliGebis commented 5 months ago

Perfect, I'll give it a test right away 🙂

GurliGebis commented 5 months ago

It works perfectly 😀

I have pushed all the changes from today to the cleanup branch - can you try again?

dd010101 commented 5 months ago

It works! After an hour and a half I got both ISOs, based on a basic Debian install.

Maybe the ISO script should check whether all previous scripts have run?

GurliGebis commented 5 months ago

I think we could check whether the repo is available on the http port, and whether all packages in Jenkins have built successfully. If that is the case, we should be okay to build an ISO, right?

GurliGebis commented 5 months ago

Regarding the readme file, should I move it into the manual folder, or do you want to just update it with the info for this? (I think keeping both in the same file will make it cluttered)

dd010101 commented 5 months ago

I think we could check whether the repo is available on the http port, and whether all packages in Jenkins have built successfully. If that is the case, we should be okay to build an ISO, right?

That's perhaps not needed; you can just check whether the previous script has run, as you do in the other scripts.

Regarding the readme file, should I move it into the manual folder, or do you want to just update it with the info for this? (I think keeping both in the same file will make it cluttered)

I want to keep it together for now, while we are in the testing stage; later perhaps we can separate it. I'm not sure how good Google is at indexing nested files. It looks like it mainly indexes the front page, so that's a good reason to keep it as one long noodle. We can make a clean separation where one starts and the other ends, but there are shared parts too, and the scripted method is short - "JUST RUN IT!" - so I don't think it will make things worse compared to the current state.

GurliGebis commented 5 months ago

I only check whether the first scripts have run. Once we get to the package provision and build scripts, we don't set markers any more. That is because if they fail, people have to go into Jenkins and fix them manually; in that case no marker would be set, so checking for them would prevent people from continuing after fixing the issues.

Regarding the readme's - I have seen other github repos with multiple .md files, where the readme section of the repo has multiple tabs - no idea how to enable it though - but that might be an idea, if we have multiple documentations.

I'll look into cleaning up this branch, and then I'll promote this PR to a real PR. Once it is merged, will you handle updating the readme?

dd010101 commented 5 months ago

I only check if the first scripts have run.

The aim is to ensure that every script has run, no matter whether it failed or not. This avoids people skipping steps by mistake. If a step failed we can't do much about that; we can just print the stderr. Verifying the result of each step isn't required for this, and it's too complicated to do anyway.

For example, I skipped the last step (8 - nginx) by mistake, and build-iso failed, of course, because the repository URL served the default nginx page instead of the repository. This isn't easy to recognize if you don't know what you are looking at. So even a check as basic as "did the script run at all?" is useful against human error.

We don't need to check whether the scripts did all their work. It's expected that they do, and if not, they will communicate by themselves that there is an error to be addressed. So checking the result isn't essential: the user should already know something is wrong, and if the next step fails it's easy to see why.

I have seen other github repos with multiple .md files, where the readme section of the repo has multiple tabs

I don't think that's possible for arbitrary content - there are predefined categories that are applied automatically, like README, LICENSE, CODE_OF_CONDUCT, CONTRIBUTING, SECURITY, ... You can't define these yourself. GitHub automatically creates tabs for the predefined files if you have the right files in the right place, so if you want a custom tab, like a second readme, there is no mechanism to define it.

Once it is merged, will you handle updating the readme?

Yes, I can. I already have a draft with streamlined shared information and the separation of automated vs manual. I also plan to rearrange the files in the repository after the merge is complete.

GurliGebis commented 5 months ago

I have updated the scripts to all set markers, and to all require the previous marker.

I have rebased it into the install-scripts branch - can you take a final look at it, before I convert this PR into one that can be merged? (If it is okay, I think we should merge it, and then you can update the readme file 😀)

dd010101 commented 5 months ago

I think you can do it already, since it's like 99.999% complete. I'm running the test now and it will take time, but if something needs to be adjusted it will be minor, so it can be done later. I did test it earlier today and the only thing that didn't work was the sagitta ISO, since VyOS broke it, because https://github.com/vyos/vyatta-op/pull/94 isn't merged yet.

I would prefer to exclude the "Move manual files into separate folder" commit. I will arrange the structure myself anyway, and it's easier to correct the links as I move the files.

GurliGebis commented 5 months ago

Okay, I'll remove that commit, and push the branch again, and mark the PR as ready for review 🙂

GurliGebis commented 5 months ago

@dd010101 I have removed the commit, and marked it as ready for review

dd010101 commented 5 months ago

Good job! 👍

GurliGebis commented 5 months ago

@dd010101 let me know if you want me to have a read-through of the readme changes, once they are ready.

dd010101 commented 5 months ago

The readme is already updated. I plan more changes to the directory structure and some misc changes.

GurliGebis commented 5 months ago

Great 👍 I was thinking - since they have branched out to have circinus already - would it make sense to update the setup to be able to build that as well?

dd010101 commented 5 months ago

I don't see the point until there is an RC or some sign of a release. A branch in development can be built by the official method, so there is no point in using a third-party solution. Circinus is far from release, so...

GurliGebis commented 5 months ago

You got a point 🙂

dd010101 commented 5 months ago

Also, there are changes coming - VyOS has plans to sunset the Jenkins build. I doubt they will go and rewrite the scripts for all LTS branches, but circinus is expected to have these changes. They are also switching from the current branch setup to stream branches. Thus I expect circinus will have a different way of building packages. We shall see!

dd010101 commented 5 months ago

I moved the files around - now it should be clear what is for what and overall more tidy, what do you think? Also the seed-jobs.sh uses the jsons from ./jobs/ so we don't have two definitions of jobs.

GurliGebis commented 5 months ago

Looks good 🙂 Do you have any idea on how they plan on building them going forward?

dd010101 commented 5 months ago

I did re-test and everything is working - well, apart from the sagitta ISO, since it's still broken because of the waiting pull-request.

Great job you did!

I have just one last thought: the markers will prevent the user from running the next script if the previous script fails. Maybe it's better to set the marker at the beginning, so it records that the script did run regardless of how it ended? Otherwise the user would need to rerun the failed script just to get the "check mark", even if they already fixed the reported issue.

You wrote something about this, but then you changed your mind?

Do you have any idea on how they plan on building them going forward?

There is no information about it yet. I have the feeling that VyOS is migrating towards IaaS rather than a DIY server solution. For example, they moved from their own server to S3 for hosting the APT repository. So perhaps they will use something like GitHub Actions for CI/CD? Who knows... Jenkins is very much old-fashioned CI/CD, and CI/CD services like GitHub Actions are fancier and all the rage now, so it would make sense...

In any case - we gathered so much knowledge with this project that whatever they switch to, it will be easy for us to use it or work around it, depending on what it will be! 😄

I don't think that's something we will see soon though, and perhaps equuleus will never see the new system. I would think sagitta will, since it's fresh and they will support it far into the future, like 2028 or something. Circinus is a sure thing.

GurliGebis commented 4 months ago

The problem with that is, that people run script X, script X fails, people don't notice, and run script X+1 and nothing works. Maybe we should change it to having two markers, one for "X started" and one for "X succeeded".

Then, if you run script X+1, but only the "X started" exists, we warn the user, and ask them to press Y to verify that everything from X is set up correctly.

Would that be an idea?
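
A minimal sketch of that two-marker idea, assuming markers are plain files in a directory. The MARKER_DIR location, the file naming and all function names are hypothetical and do not describe how the existing scripts store their markers.

```bash
#!/bin/bash

# Hypothetical location of the marker files
MARKER_DIR=${MARKER_DIR:-/tmp/install-markers}

# Call at the beginning and at the successful end of step X.
markStarted()   { mkdir -p "$MARKER_DIR" && touch "$MARKER_DIR/$1.started"; }
markSucceeded() { touch "$MARKER_DIR/$1.succeeded"; }

# Returns 0 if the step succeeded, 1 if it only started, 2 if it never ran.
previousStepState() {
    if [ -e "$MARKER_DIR/$1.succeeded" ]; then
        return 0
    elif [ -e "$MARKER_DIR/$1.started" ]; then
        return 1
    fi
    return 2
}

# Gate for the beginning of step X+1; $1 is the name of step X.
requirePreviousStep() {
    local state=0 answer
    previousStepState "$1" || state=$?
    case $state in
        0)
            return 0
            ;;
        1)
            echo "Step $1 started but did not report success." >&2
            read -r -n 1 -p "Press y if everything from step $1 is set up correctly: " answer
            echo
            [ "$answer" = "y" ]
            ;;
        *)
            echo "Step $1 has not been run yet." >&2
            return 1
            ;;
    esac
}
```

A step that was skipped entirely is still refused, while a step that failed and was fixed by hand can be confirmed without rerunning it.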

Regarding going forward, I have seen a lot of commits related to GitHub Actions, so I wouldn't be surprised if they moved to that. (And why wouldn't they - it removes infrastructure they have to maintain, and since these are public repos, they have unlimited minutes)

dd010101 commented 4 months ago

The problem with that is, that people run script X, script X fails, people don't notice, and run script X+1 and nothing works.

That's why I don't like the clearing of the screen: you then have no ability to look back. I ran into this many times, where I wanted to look back and it wasn't possible.

Then, if you run script X+1, but only the "X started" exists, we warn the user, and ask them to press Y to verify that everything from X is set up correctly.

That would be much better indeed! I think even a warning message alone that says the previous script failed would be enough.

Although what would be even better is to also have the ability to see the output from previous steps at any point. What do you think about getting rid of the clear? I think that would not be as pretty, but it would be a major functional improvement.

Regarding going forward, I have seen a lot of commits related to Github Actions...

There are some signs, like https://github.com/vyos/vyos-1x/blob/current/.github/workflows/package-smoketest.yml and also https://github.com/vyos/vyos-1x/blob/sagitta/.github/workflows/build-package.yml

Judging by the description, this isn't about building packages but about automatic testing after a merge. All the work in the workflows seems to be related to automating the repos/pull-requests.

GurliGebis commented 4 months ago

Should we save the output of the scripts to some file? (That way, if people have issues, they can attach the file)

Yep, but I think it points to them wanting to use GitHub Actions more - time will tell.

dd010101 commented 4 months ago

Then you lose the red color; that's why just removing the clear seems like the better option to me.

Having a log file is an option if you want to keep the clear.

GurliGebis commented 4 months ago

A log file is better I think - that way, if step 6 breaks for instance, we can ask people for the output of step 1, even though they might have closed the window where they ran step 1.

dd010101 commented 4 months ago

Well, you could have both, but I think it's expected that you lose output if you close your terminal. So far I have run all scripts in sequence in one terminal, but I needed to look back for various reasons, not just because of errors. I think the log is an extra hassle for the user to navigate; it's not easy to see at a glance what was previously echoed without opening a text editor.

The log file would be very tricky to do, because you need to capture all the output without breaking stderr propagation. You would need to wrap all scripts in another script that somehow uses tee to copy the output to two locations, and that's not easy. You think, well, just ... 2>&1 | tee some, but this breaks stderr (it is downgraded to stdout). Even then, tee requires wrapping the whole script: you would need ./1-prereqs.sh | tee ..., or you could wrap the whole content of 1-prereqs.sh, execute it inside a subshell and use tee without an additional script, but this breaks stderr too... Maybe this would work? No, it doesn't work correctly either; this approach is also a no-go for multiple reasons.
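
To make the difficulty concrete, here is a sketch of a wrapper that uses two tee processes so that stderr stays on stderr. It is not a recommendation: runLogged is a hypothetical name, the relative order of stdout and stderr lines in the log is not guaranteed, and a program that checks whether it writes to a terminal may drop its colors because it now writes into a pipe.

```bash
#!/bin/bash

# Hypothetical wrapper: run a command, append its stdout and stderr to a log
# file and keep each stream on its original descriptor.
# $1 = log file, remaining arguments = command
runLogged() {
    local log=$1
    shift
    {
        # the command's stderr enters the inner pipe and is copied back to
        # stderr, its stdout travels through fd 3 into the outer pipe
        "$@" 2>&1 1>&3 3>&- | tee -a "$log" >&2 3>&-
        exit "${PIPESTATUS[0]}"
    } 3>&1 | tee -a "$log"
    return "${PIPESTATUS[0]}"
}

# Usage: runLogged /tmp/1-prereqs.log ./1-prereqs.sh
```

The exit code of the wrapped command survives through PIPESTATUS, but each script still needs to be started through the wrapper, which is the extra layer mentioned above.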

It would be so much easier just to allow continuation of buffer by removing clearing.

GurliGebis commented 4 months ago

I agree - let's just comment out the clear screen part.

dd010101 commented 4 months ago

Done https://github.com/dd010101/vyos-jenkins/commit/d478e0ddefdd5175a3c10c4b9fa3c0ff3f13392f 👍

I researched the capturing of output more, and indeed it's complicated. I found only solutions with tradeoffs - there is no universally compatible way to do this. Oh the shell - some things are way more complicated than you would expect... Also, what got me last time was the task of "do something after the script exits" - not easy or straightforward either...

GurliGebis commented 4 months ago

I think we should try and keep it simple (like that ship didn't sail LONG ago 😀)

dd010101 / vyos-jenkins

Implementing installs and build scripts #27