Homebrew / homebrew-portable-ruby

🚗 Versions of Ruby that can be installed and run from anywhere on the filesystem.
BSD 2-Clause "Simplified" License
133 stars 43 forks source link

portable_formula: use glibc@2.13 on Linux #156

Closed danielnachun closed 2 years ago

danielnachun commented 2 years ago

This PR should allow us to build the portable formulae on Linux runners with newer glibc. We won't need to rebuilt the bottles since they are currently build against glibc 2.13 in the Wheezy container - the goal here is to make sure they work.

danielnachun commented 2 years ago

I'm a little unsure how to add the glibc@2.13 dependency here as on_linux is not defined in this class.

Bo98 commented 2 years ago

It'll need to be in self.inherited of PortableFormula, like the keg_only line.

danielnachun commented 2 years ago

Okay that got us past the first hurdle but now it's failing with this:

glibc@2.13: gawk 3.0 (or later) is required to build glibc.

I guess it's trying to build glibc@2.13 from source? Is there a way to use the bottle or do we have to rebuild it here for the test?

Bo98 commented 2 years ago

You'll need to special case install glibc@2.13 without the --build-bottle here: https://github.com/Homebrew/homebrew-portable-ruby/blob/a213023f278e0950612d95b34e3700a5b227a52a/cmd/portable-package.rb#L37-L40

danielnachun commented 2 years ago

I realized we needed to test this with Ubuntu 16.04, so I changed the Dockerfile to use that instead. Now I'm stuck with

fatal: not in a git directory

Presumably I'm missing a step in the Dockerfile, though I'm not exactly sure what to change.

Bo98 commented 2 years ago

I realized we needed to test this with Ubuntu 16.04, so I changed the Dockerfile to use that instead. Now I'm stuck with

fatal: not in a git directory

Presumably I'm missing a step in the Dockerfile, though I'm not exactly sure what to change.

I'll take a look tomorrow but maybe try pass --debug for more information.

danielnachun commented 2 years ago

I added --debug to the output but it's still not clear to me what the cause of the problem is.

Bo98 commented 2 years ago

That error is really weird. The .git directory exists but git says we're not in a git directory? I'll see if I can try reproduce locally.

danielnachun commented 2 years ago

I can't reproduce it locally by just using docker run -it homebrew/ubuntu16.04:master, so I'm assuming there's something about the Dockerfile that's causing this problem.

iMichka commented 2 years ago

Maybe it has to do with the way we mount the core repo. What does ${GITHUB_REPOSITORY,,} do? Is that a typo? https://github.com/Homebrew/homebrew-portable-ruby/blob/master/.github/workflows/build.yml#L45

danielnachun commented 2 years ago

Maybe it has to do with the way we mount the core repo. What does ${GITHUB_REPOSITORY,,} do? Is that a typo? https://github.com/Homebrew/homebrew-portable-ruby/blob/master/.github/workflows/build.yml#L45

That does look very odd, but I pushed a commit to change it and it didn't seem to help. I'm still not sure how to try to reproduce this locally. I think I have to run the Dockerfile locally with docker but I'm still figuring that out.

Bo98 commented 2 years ago

What does ${GITHUB_REPOSITORY,,} do? Is that a typo?

No that's intentional - it converts it to lowercase.

$ GITHUB_REPOSITORY=Homebrew/homebrew-portable-ruby; echo ${GITHUB_REPOSITORY,,}
homebrew/homebrew-portable-ruby

Really we can ditch the Dockerfile. I can push the CI changes needed to do that, though it would be nice to know how we're managing to get past the several git repo checks in brew as that seems like a bug. Permissions?

Bo98 commented 2 years ago

Our "Test Portable Ruby" step apparently has never worked properly on macOS since it's impossible to specify an external Ruby to be used. It also didn't work on Linux because HOMEBREW_RUBY_PATH is ignored, but I've fixed that to use PATH now.

Bo98 commented 2 years ago

Ok overhauled the Docker side of CI so everything should be working there now.

Current status here is OpenSSL fails to compile:

/home/linuxbrew/.linuxbrew/opt/glibc@2.13/lib/libpthread.so: file not recognized: File format not recognized
collect2: error: ld returned 1 exit status

libpthread.so is a "GNU ld script" text file. Not sure why it's not liking it.

Bo98 commented 2 years ago

Looks like GNU ld scripts do not like the @ symbol.

Bo98 commented 2 years ago

This is tripping up a lot of configure checks and is breaking things in non-obvious ways:

In file included from /home/linuxbrew/.linuxbrew/opt/glibc@2.13/include/stdio.h:28:0,
                 from conftest.c:1:
/home/linuxbrew/.linuxbrew/opt/glibc@2.13/include/features.h:323:0: warning: "__STDC_IEC_559__" redefined
 #define __STDC_IEC_559__  1
 ^
In file included from <command-line>:0:0:
/usr/include/stdc-predef.h:38:0: note: this is the location of the previous definition
 #  define __STDC_IEC_559__  1
 ^
In file included from /home/linuxbrew/.linuxbrew/opt/glibc@2.13/include/stdio.h:28:0,
                 from conftest.c:1:
/home/linuxbrew/.linuxbrew/opt/glibc@2.13/include/features.h:324:0: warning: "__STDC_IEC_559_COMPLEX__" redefined
 #define __STDC_IEC_559_COMPLEX__ 1
 ^
In file included from <command-line>:0:0:
/usr/include/stdc-predef.h:46:0: note: this is the location of the previous definition
 #  define __STDC_IEC_559_COMPLEX__ 1
 ^
In file included from /home/linuxbrew/.linuxbrew/opt/glibc@2.13/include/stdio.h:28:0,
                 from conftest.c:1:
/home/linuxbrew/.linuxbrew/opt/glibc@2.13/include/features.h:327:0: warning: "__STDC_ISO_10646__" redefined
 #define __STDC_ISO_10646__  200009L
 ^
In file included from <command-line>:0:0:
/usr/include/stdc-predef.h:55:0: note: this is the location of the previous definition
 #define __STDC_ISO_10646__  201505L
 ^

Might need to check binutils too.

Bo98 commented 2 years ago

I think the way to do this properly would be to try to avoid using the system headers wherever possible (excluding compiler stuff).

Which would mean something like:

-nostdinc -isystem$(gcc-5 --print-file-name=include) -isystem$(gcc-5 --print-file-name=include-fixed) -isystem/home/linuxbrew/.linuxbrew/opt/linux-headers@4.4/include -isystem/home/linuxbrew/.linuxbrew/opt/glibc@2.13/include -L/home/linuxbrew/.linuxbrew/opt/glibc@2.13/lib -Wl,-rpath-link=/home/linuxbrew/.linuxbrew/opt/glibc@2.13/lib

as a minimum. Though this sounds like it should be in the superenv layer rather than here.

danielnachun commented 2 years ago

I tried in binutils to use -nostdinc and then add back the GCC headers before and the superenv filtered out the GCC header paths. So there may be work needed there to make sure we can add those paths as needed.

danielnachun commented 2 years ago

I did some local testing and I can confirm that setting the CFLAGS and CPPFLAGS as @Bo98 described above are sufficient for the portable-ruby build to work, though I had to tweak a few things. I also went back to binutils and found that using a subset of those flags fixed the build there without having to pass any CFLAGS to define macros as I had done before, confirming that the root cause of the original build failures in both cases is mixing of system headers with brewed headers. Here are my main conclusions so far: 1) We need to add a conditional to https://github.com/Homebrew/brew/blob/master/Library/Homebrew/shims/super/cc#L258-L282 so when we pass -nostdinc, all include paths are kept so that we don't filter out the paths we have to add for the GCC headers which are in /usr/lib/gcc/x86_64-linux-gnu/5/include and /usr/lib/gcc/x86_64-linux-gnu/5/include-fixed. Currently these are stripped out by the superenv and will cause the build to fail, so I had to hack the filtering to not remove any flags for the build to work. 2) When glibc@2.13 is a dependency for a formula on Linux, the superenv should automatically add -nostdinc -isystem$(#{cc} --print-file-name=include) -isystem$(#{cc} --print-file-name=include-fixed) as include flags. We need to do this in the superenv rather than passing these arguments in as CFLAGS because otherwise the CFLAGS get copied into rbconfig.rb and this breaks building native gems with the system glibc, which we want to be able to do after installing portable-ruby. 3) We need to add linux-headers as a build dependency of any formula that also has glibc@2.13 as a build dependency. I've already implemented this for portable-formula.rb and will push a commit for this, and we'll also have to change this for binutils and zlib. 4) Specifically in portable-ruby, we have to add LIBS="-lpthread" on Linux because some of the conftest binaries secretly have a run time dependency on libpthread. If we don't explicitly declare the dependency then the linker finds system libpthread instead of the one from glibc@2.13 and the conftest binary is broken at run time because libpthread expects to find newer symbols from the other libc libraries, whose maximum symbol versions are set to 2.13. Confusingly this does not cause configure to fail but leads to a bizarre build failure much later due to an undefined macro (yes this was as much of a pain to figure out as it sounds).

I will work on a brew PR to address points 1 and 2 as they are fairly straight forward, and will push a commit to this PR to address points 3 and 4, though it will still fail until the issues in brew are fixed.

danielnachun commented 2 years ago

I have opened https://github.com/Homebrew/brew/pull/13568 which adds -nostdinc when glibc@2.13 is a dependency, address point 2 I made above. I crossed out point 1 because it is irrelevant if we are . I have also pushed an update to this that address points 3 and 4, though the build still fails because https://github.com/Homebrew/brew/pull/13568 has to be merged first.

Bo98 commented 2 years ago

4) Specifically in portable-ruby, we have to add LIBS="-lpthread" on Linux because some of the conftest binaries secretly have a run time dependency on libpthread. If we don't explicitly declare the dependency then the linker finds system libpthread instead of the one from glibc@2.13 and the conftest binary is broken at run time because libpthread expects to find newer symbols from the other libc libraries, whose maximum symbol versions are set to 2.13. Confusingly this does not cause configure to fail but leads to a bizarre build failure much later due to an undefined macro (yes this was as much of a pain to figure out as it sounds).

It's picking up pthread as a dependency to librt. The issue you are seeing is because -L does not apply to indirect dependencies and glibc@2.13 libraries do not specify RPATHs. So we are just defaulting back to the global system search path.

It's why I suggested -Wl,-rpath-link=/home/linuxbrew/.linuxbrew/opt/glibc@2.13/lib.

danielnachun commented 2 years ago

It's picking up pthread as a dependency to librt. The issue you are seeing is because -L does not apply to indirect dependencies and glibc@2.13 libraries do not specify RPATHs. So we are just defaulting back to the global system search path.

It's why I suggested -Wl,-rpath-link=/home/linuxbrew/.linuxbrew/opt/glibc@2.13/lib.

In that case should we add the RPATH with the superenv like what we're doing for the include dirs? Or is it preferable to add it just for portable-ruby? I'm slightly unsure of the potential ramifications of adding this universally whenever we build with glibc@2.13.

Bo98 commented 2 years ago

What -rpath-link does is changes where to look for libraries at link time. Given we want to avoid system glibc even being in consideration, I think the flag makes sense to add whenever we use glibc@2.13.

-rpath-link is not be confused -rpath. Nothing in the output binary changes with -rpath-link - it only affects where to find libraries at link time.

danielnachun commented 2 years ago

It's picking up pthread as a dependency to librt. The issue you are seeing is because -L does not apply to indirect dependencies and glibc@2.13 libraries do not specify RPATHs. So we are just defaulting back to the global system search path. It's why I suggested -Wl,-rpath-link=/home/linuxbrew/.linuxbrew/opt/glibc@2.13/lib.

In that case should we add the RPATH with the superenv like what we're doing for the include dirs? Or is it preferable to add it just for portable-ruby? I'm slightly unsure of the potential ramifications of adding this universally whenever we build with glibc@2.13.

Thanks for explaining that, I think it's safe to add this as a linker flag. I will update https://github.com/Homebrew/brew/pull/13568 to automatically add this to the ldflags.

danielnachun commented 2 years ago

Now that's a new error:

In file included from closure.c:1:
./fiddle.h:42:10: fatal error: ffi.h: No such file or directory
   42 | #include <ffi.h>
      |          ^~~~~~~

Does that mean we need to have a portable libffi? Or alternatively that we need to disable libffi if it's optional?

Bo98 commented 2 years ago

Ruby ships with libffi 3.2.1 which is used if there is none available on the system. Looks like something has broke which makes it never used (finding something on the system maybe?).

Will need full logs to see.

Bo98 commented 2 years ago

Oh looks like you're not using the container anymore. I'll push a fix for that (though I feel something's not right for it to behave the way it is anyway).

Bo98 commented 2 years ago

Seems pretty close. Looks like I'll need to tweak the workflow a little more to fix the current file link issue.

When looking back through the logs, I do see this in make install:

In file included from /home/linuxbrew/.linuxbrew/opt/glibc@2.13/include/stdio.h:28:0,
                 from ./include/ruby/defines.h:123,
                 from ./include/ruby/ruby.h:29,
                 from ./version.c:12:
/home/linuxbrew/.linuxbrew/opt/glibc@2.13/include/features.h:323:0: warning: "__STDC_IEC_559__" redefined
 #define __STDC_IEC_559__  1
 ^
In file included from <command-line>:0:0:
/usr/include/stdc-predef.h:38:0: note: this is the location of the previous definition
 #  define __STDC_IEC_559__  1
 ^

Which is slightly concerning since -nostdinc is meant to prevent this. It's possible it's harmless in that context, but we should still look into why that's happening.

Edit: might be because we don't unconditionally add that flag - I've left a comment on the brew PR.

danielnachun commented 2 years ago

Sorry for messing up the container usage, there was a merge conflict and I didn’t fix it correctly.

I’ll locally test a fix for the nostdinc issue in brew and open a PR if it works.

danielnachun commented 2 years ago

The Linux build finally worked! I'm not sure what's up the ARM build now - it's failing to upload the build artifact.

Bo98 commented 2 years ago

Probably a random networking issue and will work on a retry.

danielnachun commented 2 years ago

How do we restart CI here? I don't see a button like I've used with homebrew-core and brew.

Also the 10.11 build failed in the "Test Portable Ruby" step but didn't really output anything about why.

MikeMcQuaid commented 2 years ago

How do we restart CI here? I don't see a button like I've used with homebrew-core and brew.

I've rerun them. Presumably you don't have sufficient permissions.

Bo98 commented 2 years ago

Also the 10.11 build failed in the "Test Portable Ruby" step but didn't really output anything about why.

In the last push I fixed this step so it would actually test something - i.e. it now actually checks if it is using the just-built Ruby.

For months/years it's never actually tested that properly. For months/years it's actually been using the existing Portable Ruby that brew installs.

I've got it working on Linux but it seems to still be using it's own Portable Ruby on macOS for some reason. I think because the removal of the existing Portable Ruby in brew cleanup doesn't take HOMEBREW_USE_RUBY_FROM_PATH into consideration so either we change that or we remove the existing Portable Ruby install more directly in the workflow rather than relying on brew cleanup, which I'll probably do.

Bo98 commented 2 years ago

Looks like HOMEBREW_USE_RUBY_FROM_PATH doesn't actually work on macOS. I can probably just make the step Linux only for now.

danielnachun commented 2 years ago

I made the "Test Portable Ruby" step Linux-only and 10.11 now works, but now ARM isn't. I messed up pulling the other commits added here and overwrote some by accident and that may be why ARM isn't working, though I thought I managed to restore everything.

Bo98 commented 2 years ago

Sorry for the delay - I took the week off.

I think the proper fix is still on the brew side. Skipping tests is not really acceptable in the long term, even if it has always been an issue. I'll take a look later today.

danielnachun commented 2 years ago

Should we rerun this with testing of Portable Ruby enabled on macOS when https://github.com/Homebrew/brew/pull/13644 is merged?

MikeMcQuaid commented 2 years ago

Should we rerun this with testing of Portable Ruby enabled on macOS when Homebrew/brew#13644 is merged?

Did this now @danielnachun 👍🏻

Bo98 commented 2 years ago

10.11 works now. 11-arm64 however refuses to use the Ruby on the path:

Homebrew Ruby: 2.6.3 => /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/bin/ruby
Bo98 commented 2 years ago

Looks like HOMEBREW_USE_RUBY_FROM_PATH doesn't actually prioritise from the user PATH as I thought - it prioritises /usr/bin over anything, and it just happens to work on 10.11 because /usr/bin/ruby there is so ancient it can't run the version check script.

github-actions[bot] commented 2 years ago

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

MikeMcQuaid commented 2 years ago

@danielnachun Is this needed for 3.6.0/the glibc changes?

Bo98 commented 2 years ago

This is only needed to deprecate the Debian 7 Docker image. No plans to ship a new version of portable-ruby - existing bottles should be fine.

This is ready to go, just need to fix HOMEBREW_USE_RUBY_FROM_PATH over in brew first.

MikeMcQuaid commented 2 years ago

This is only needed to deprecate the Debian 7 Docker image. No plans to ship a new version of portable-ruby - existing bottles should be fine.

Ok. Sounds like this may also be worth doing as part of 3.6.0.

danielnachun commented 2 years ago

It would be great to get this in with 3.6.0. I've tried to look into this and haven't figure anything out yet.

MikeMcQuaid commented 2 years ago

Hurrah, thanks for getting this over the line, both!