awesomebytes / gentoo_prefix_ci

A Gentoo Prefix continuous integration repo. Find precompiled Gentoo Prefix in the releases section
BSD 2-Clause "Simplified" License

Gentoo Prefix on fedora is never building #2

Open awesomebytes opened 5 years ago

awesomebytes commented 5 years ago

Hello @haubi, given you had some interest in it, maybe you can take a look at what is happening on the fedora build?

You can check the last build there:

https://dev.azure.com/12719821/12719821/_build/results?buildId=462

It should be pretty obvious... The current error is:

2019-01-19T13:02:34.2192269Z   Curl error (28): Timeout was reached for https://mirrors.fedoraproject.org/metalink?repo=updates-released-f28&arch=x86_64 [Connection timed out after 30002 milliseconds]
2019-01-19T13:02:34.6829797Z The command '/bin/sh -c dnf install @development-tools -y' returned a non-zero code: 1

Which is just the second command of the Dockerfile.fedora:

https://github.com/awesomebytes/gentoo_prefix_ci/blob/master/initial_bootstrap/Dockerfile.fedora#L4

This did work in a previous build (even tho the bootstrap failed):

https://dev.azure.com/12719821/12719821/_build/results?buildId=456

haubi commented 5 years ago

AFAICS, this is gone now?

awesomebytes commented 5 years ago

True, apparently now we are stuck further forward:

2019-01-23T15:19:07.5256031Z >>> Downloading 'https://ftpmirror.gnu.org/help2man/help2man-1.47.4.tar.xz'
2019-01-23T15:19:07.5322233Z https://ftpmirror.gnu.org/help2man/help2man-1.47.4.tar.xz: HTTPS support not compiled in.
2019-01-23T15:19:07.5333000Z !!! Couldn't download 'help2man-1.47.4.tar.xz'. Aborting.
2019-01-23T15:19:07.5349995Z  * Fetch failed for 'sys-apps/help2man-1.47.4', Log file:
2019-01-23T15:19:07.5350656Z  *  '/tmp/gentoo/var/tmp/portage/sys-apps/help2man-1.47.4/temp/build.log'
2019-01-23T15:19:07.5388894Z 
2019-01-23T15:19:07.5389739Z >>> Failed to emerge sys-apps/help2man-1.47.4, Log file:

haubi commented 5 years ago

So this is the reason we probably should have both a LATEST and a non-LATEST job: The reason here probably is that the tree snapshot is out of date, and this version was dropped from gentoo mirrors already.

haubi commented 5 years ago

Ah, nope - this package is outdated in prefix-overlay...

haubi commented 5 years ago

And now we're at https://bugs.gentoo.org/674784

awesomebytes commented 4 years ago

Hey @haubi, I just took a quick look at this again. I commented out the line in bootstrap-prefix.sh where it tries to use wget first (https://gitweb.gentoo.org/repo/proj/prefix.git/tree/scripts/bootstrap-prefix.sh#n46 ), so now it tries curl first (the modified copy is initial_bootstrap/bootstrap-prefix-fedora.sh), and with that the bootstrapping seems to keep going.

Has been running for an hour: https://dev.azure.com/12719821/12719821/_build/results?buildId=757&view=logs&j=0568c32a-8b53-54d2-38fa-276fb934921d

I don't know if this is a viable workaround for bug 674784, but it's moving forward.
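
The "curl before wget" idea can be sketched roughly as follows. This is only an illustration, not the code in bootstrap-prefix.sh: `curl_has_https`, `pick_fetcher` and `fetch` are hypothetical helper names, and the HTTPS check assumes that `curl --version` prints a `Protocols:` line.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given `curl --version` text lists https.
curl_has_https() {
    # $1: full output of `curl --version`
    printf '%s\n' "$1" | grep '^Protocols:' | grep -qw 'https'
}

# Pick a download tool, trying curl before wget (the order used in the workaround).
pick_fetcher() {
    if command -v curl >/dev/null 2>&1 &&
       curl_has_https "$(curl --version 2>/dev/null)"; then
        echo curl
    elif command -v wget >/dev/null 2>&1; then
        echo wget
    else
        echo none
    fi
}

# Download URL $1 to file $2 with whichever tool was picked.
fetch() {
    case "$(pick_fetcher)" in
        curl) curl -fL -o "$2" "$1" ;;
        wget) wget -O "$2" "$1" ;;
        *)    echo "no usable download tool found" >&2; return 1 ;;
    esac
}
```

Usage would look like `fetch https://ftpmirror.gnu.org/help2man/help2man-1.47.4.tar.xz help2man-1.47.4.tar.xz`. Note that the sketch does not verify that wget itself was built with HTTPS support, which is the failure seen in the log above.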

awesomebytes commented 4 years ago

@haubi It did advance quite a bit (went on for 2h30). But then it failed on emerging Python 3.6.8 on stage 3.

You can find the full log here: https://dev.azure.com/12719821/e566c963-8f77-4f01-b7bc-ae2d91b1334f/_apis/build/builds/757/logs/13

haubi commented 4 years ago

There is a known problem with Python on 64-bit: its libcrypt detection checks whether /usr/lib/libcrypt.so exists, while it should actually check whether /usr/lib64/libcrypt.so exists. This is especially difficult as we use 'lib' for the libdir in Prefix even on 64-bit.

For the time being, the workaround is to additionally install "libstdc++-devel.i686" even for 64bit prefix.

Note that I do install a limited set of packages, not the whole development-tools: https://dev.azure.com/gentoo-prefix/_git/ci-builds?path=%2Fdocker%2FDockerfile.fedora28&version=GBmaster
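
To see which of the two cases a given host is in, a small check like the one below may help. It is a diagnostic sketch only: `report_libcrypt` is a hypothetical helper, and it does not reproduce Python's actual detection logic, it just reports whether the two paths mentioned above exist.

```shell
#!/bin/sh
# Print, for each candidate libdir under root $1 (default: the running
# system), whether libcrypt.so exists there.
report_libcrypt() {
    root=${1:-}
    for dir in /usr/lib /usr/lib64; do
        if [ -e "$root$dir/libcrypt.so" ]; then
            echo "$dir/libcrypt.so: present"
        else
            echo "$dir/libcrypt.so: missing"
        fi
    done
}

report_libcrypt   # inspect the running system
```

On a 64-bit host where only the lib64 path is reported as present, the problem described above would be expected to show up.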

awesomebytes commented 4 years ago

Oh, I never found your CI repo. I'll take a closer look at the tricks you've used, looks cool!

I could also drop the Prefix on Fedora bootstrap from this repo, as I'm not using it for anything and you are already running it. @haubi what do you think?

Also, where is the Git repository located? Is it hosted directly on Azure? (https://dev.azure.com/gentoo-prefix/_git/ci-builds)

haubi commented 4 years ago

Well, I created it after being inspired by yours, just with a nicer URL and the Cygwin jobs. IMO, it does make sense to run both variants: one with the complete "development tools" like yours and one with a reduced set of packages like mine. I wouldn't mind adding another set of jobs using "development tools", but that would increase the total build time, as I already run into the limit of 10 parallel jobs, so I'm happy to leave them in your repo unless you need to get rid of them.

However, given that the RAP builds may exceed the 6 hours limit whenever they proceed that far, I'm now providing an Ubuntu 18 Azure VM, paid from my monthly 45,- EUR MSDN credit, for ~3 days a week. The same goes for the Cygwin jobs, for which I have another Azure VM.

awesomebytes commented 4 years ago

I'm currently on the move and on mobile, so sorry for not answering the whole message. As for the problem of running out of the 6h limit, maybe we should divide the task into 3 jobs: stage 1, stage 2 and stage 3. Each one can run for 6h.

I was already thinking of doing that as sometimes it takes longer than 6h for me too.

I don't think the bootstrap prefix is prepared for it. Maybe it would be beneficial to have it supported.

haubi commented 4 years ago

Well, besides 'noninteractive', bootstrap-prefix.sh does support 'stage1', 'stage2' and 'stage3' arguments as well - but I'm unsure about their current usability within automatic jobs. But maybe we can identify and avoid some needless but lengthy task during the RAP bootstrap, to keep it below the 6 hours limit?
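
As a rough sketch of what one job per stage could look like, the snippet below only prints the command each job would run (a dry run). The `stage_command` helper is hypothetical, `/tmp/gentoo` is the prefix path seen in the log earlier in this thread, and whether the stage arguments behave well in unattended jobs is exactly the open question raised above.

```shell
#!/bin/sh
# Print the command a CI job for stage $2 would run against prefix $1.
stage_command() {
    prefix=$1
    stage=$2
    case "$stage" in
        stage1|stage2|stage3)
            echo "./bootstrap-prefix.sh $prefix $stage" ;;
        *)
            echo "unknown stage: $stage" >&2
            return 1 ;;
    esac
}

# Dry run: one line per job, in the order the jobs would be chained.
for s in stage1 stage2 stage3; do
    stage_command /tmp/gentoo "$s"
done
```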

awesomebytes commented 4 years ago

Oh, I didn't know (or didn't remember) that it supported such arguments.

I'd go back to my previous strategy then. Create one job that does stage 1, whose last step is to upload the Docker image. (Currently that goes to DockerHub, probably GitHub images now if I start messing with this. I also have to take a look at Azure Artifacts; I didn't use it when I initially developed this for some reason. I don't remember why, it probably didn't give me the same flexibility as DockerHub.) Then the next job (dependent on the success of the stage 1 job) pulls the previously uploaded image and starts from it to bootstrap stage 2. Then repeat the process for stage 3.

This allows easy debugging of the different stages, as there is a Docker image to pull for manual tests.
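
A minimal dry-run sketch of that chaining, under stated assumptions: the image repository name and the per-stage `Dockerfile.stageN` files are placeholders invented for illustration, and `stage_job` only prints the docker commands instead of running them.

```shell
#!/bin/sh
# Placeholder image repository; images are tagged <repo>:stage<N>.
IMAGE_REPO=example/gentoo_prefix_ci

# Print the docker commands for one stage job.
# $1: stage number (1, 2 or 3)
stage_job() {
    n=$1
    if [ "$n" -gt 1 ]; then
        # Later stages start from the image the previous job pushed.
        echo "docker pull $IMAGE_REPO:stage$((n - 1))"
    fi
    # Dockerfile.stage<N> is a hypothetical file that would run the bootstrap
    # for this stage on top of the previous image.
    echo "docker build -f Dockerfile.stage$n -t $IMAGE_REPO:stage$n ."
    echo "docker push $IMAGE_REPO:stage$n"
}

stage_job 2
```

For manual debugging, pulling `$IMAGE_REPO:stage1` and starting a shell in it would give the state right after stage 1.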

haubi commented 4 years ago

IIRC, uploading artifacts per build job is something new Azure didn't support yet when you started off, and I still haven't figured out how to utilize one job's artifacts in a subsequent one...

awesomebytes commented 4 years ago

Maybe I just didn't figure it out either! Haha

The main downside I thought I would find by using DockerHub was speed of uploading/downloading images. But it seems this happens in very close data centers cause it's almost instantaneous.

awesomebytes commented 4 years ago

I've checked Azure Artifacts. I mostly see downsides:

Upsides are:

awesomebytes commented 4 years ago

> Well, besides 'noninteractive', bootstrap-prefix.sh does support 'stage1', 'stage2' and 'stage3' arguments as well - but I'm unsure about their current usability within automatic jobs.

I played around with the stage-based bootstrap. It doesn't prepare all the environment variables the way 'noninteractive' does. So I've added a third argument called STOP_AFTER_STAGE (the call becomes bootstrap-prefix.sh PREFIX_PATH INTERACTIVE_MODE_OR_STAGE_FUNCTION STOP_AFTER_STAGE). It is checked after stage 1 and stage 2 have bootstrapped correctly to see whether we should stop there, while still following the 'noninteractive' code path. I'm currently testing it in a branch where I separated the bootstrap into stages for CI.

I wonder if it would be acceptable to merge such a change into the bootstrap-prefix.sh script, maybe in a prettier form, such as another set of keywords like noninteractive_stage1 and noninteractive_stage2.
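
The control flow of such a STOP_AFTER_STAGE check might look roughly like this. It is a simplified stand-in, not the code from the branch: `should_stop_after` and `run_stages` are hypothetical names, and the accepted values (`stage1`, `stage2`) are an assumption.

```shell
#!/bin/sh
# Decide whether to stop once a stage has finished.
# $1: stage that just completed (e.g. stage1)
# $2: value of the optional third argument, STOP_AFTER_STAGE (may be empty)
should_stop_after() {
    [ -n "$2" ] && [ "$1" = "$2" ]
}

# Simplified stand-in for the noninteractive code path: run the stages in
# order and stop early when STOP_AFTER_STAGE names the stage just finished.
run_stages() {
    stop_after=${1:-}
    for stage in stage1 stage2 stage3; do
        echo "bootstrapping $stage"   # the real script would do the work here
        if should_stop_after "$stage" "$stop_after"; then
            echo "stopping after $stage"
            return 0
        fi
    done
}

run_stages stage1
```

With an empty or missing argument the loop runs through all three stages, which keeps the existing 'noninteractive' behaviour unchanged.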

awesomebytes commented 4 years ago

My current bootstrapping (in the branch called separate_stages) does:

  1. Prepares a Docker image with basic tools before starting to bootstrap. (6min30s)
  2. Bootstraps Stage 1. (10min30s)
  3. Bootstraps Stage 2. (30min)
  4. Bootstraps Stage 3. (2h27min)
  5. Finishes bootstrap with emerge -e system. (2h40)

The times for steps 4 and 5 are estimates, as I haven't had a full CI run yet. I also wasn't aware of how long step 5 takes.
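
Adding up the step times from the list above shows how close a single job would come to the 6 hours limit. The durations are taken from that list; the variable names are just for illustration.

```shell
#!/bin/sh
# Step durations from the list above, in seconds.
prepare=$((6 * 60 + 30))        # 6min30s
stage1=$((10 * 60 + 30))        # 10min30s
stage2=$((30 * 60))             # 30min
stage3=$((2 * 3600 + 27 * 60))  # 2h27min
finish=$((2 * 3600 + 40 * 60))  # 2h40

total=$((prepare + stage1 + stage2 + stage3 + finish))
limit=$((6 * 3600))

echo "total:    $((total / 3600))h$((total % 3600 / 60))min"   # total:    5h54min
echo "headroom: $(( (limit - total) / 60 ))min"                # headroom: 6min
```

With only about 6 minutes of headroom in a single job, splitting the work into separate jobs gives each stage its own 6 hours budget.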

haubi commented 4 years ago

Thanks for the nice collection on Azure Artifacts - I may consider learning github storage for Cygwin bootstrap results instead, to allow for them to become useful.

After emerging some more packages, before setting the stage3-finished marker the last steps in stage3 are:

  emerge --sync
  emerge -u system
  emerge --depclean

It may make sense to have a docker image that has stopped right before, because everything before should be more stable. Maybe even have them run, but stop right before the finished marker to get them rerun.

And of course, patches are welcome!

awesomebytes commented 4 years ago

> Thanks for the nice collection on Azure Artifacts - I may consider learning github storage for Cygwin bootstrap results instead, to allow for them to become useful.

I use GitHub releases to make them useful. Storage per release is unlimited, but you need to divide your artifacts into pieces of less than 2GB, as you can see in my other project: https://github.com/awesomebytes/ros_overlay_on_gentoo_prefix/releases Specifically in these lines: https://github.com/awesomebytes/ros_overlay_on_gentoo_prefix/blob/master/azure-pipelines.yml#L272-L283

I automated splitting the result in 1GB files and uploading them as a release.
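
The splitting step can be sketched with the standard `split` tool. This is not a copy of the pipeline lines linked above: `split_artifact` and `join_artifact` are hypothetical helpers, and the `.part_` suffix is an arbitrary naming choice.

```shell
#!/bin/sh
# Split file $1 into pieces of at most $2 bytes, named $1.part_aa, $1.part_ab, ...
# $2 may use the suffixes understood by split, e.g. 1024m for 1GB pieces.
split_artifact() {
    split -b "$2" "$1" "$1.part_"
}

# Reassemble the pieces of $1 into $2; the shell expands the glob in sorted
# order, which matches the order split wrote them in.
join_artifact() {
    cat "$1".part_* > "$2"
}
```

For example, `split_artifact gentoo_prefix.tar.gz 1024m` (file name made up here) would produce pieces that each stay under the 2GB per-file limit, and `join_artifact gentoo_prefix.tar.gz restored.tar.gz` puts them back together after download.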

> After emerging some more packages, before setting the stage3-finished marker the last steps in stage3 are:
>
>   emerge --sync
>   emerge -u system
>   emerge --depclean
>
> It may make sense to have a docker image that has stopped right before, because everything before should be more stable. Maybe even have them run, but stop right before the finished marker to get them rerun.

Sounds good!

> And of course, patches are welcome!

I'm not very versed in bash, so I don't know what would be the preferred way of doing this. My current approach works, but there are different philosophies on how it would be cleaner without changing much in the current script.

bootstrap-prefix.sh looks like this right now: https://github.com/awesomebytes/gentoo_prefix_ci/blob/adf330f26c7e1f98557e7843b7b2180e9fbfc3ac/bootstrap_stage/bootstrap-prefix.sh
You can see the changes by searching for STOP_AFTER_STAGE.