danielparks / puppet-rustup

Puppet module to manage Rust with rustup
https://forge.puppet.com/modules/dp/rustup/readme
Other
0 stars 0 forks source link

Rustup can be invoked by this module as the wrong architecture on MacOS #80

Closed zbentley closed 8 months ago

zbentley commented 8 months ago

Context/Setup

To reproduce

> bolt apply --target localhost --run-as root --execute 'rustup { "$USER": toolchains => ["nightly-aarch64-apple-darwin"], targets => ["aarch64-apple-darwin"] }'

Expected behavior:

cargo and rustc report file types matching the host architecture. For an ARM mac that would look like

> file $(which cargo)
/Users/$USER/.cargo/bin/cargo: Mach-O 64-bit executable arm64

> file $(which rustc)
/Users/$USER/.cargo/bin/rustc: Mach-O 64-bit executable arm64

Observed behavior

> file $(which cargo)
file $(which cargo)
/Users/$USER/.cargo/bin/cargo: Mach-O 64-bit executable x86_64

> file $(which rustc)
/Users/$USER/.cargo/bin/rustc: Mach-O 64-bit executable x86_64

Why this matters

So what happens if cargo is a non-native architecture?

That last one can cause problems. I encountered one such issue where the cargo flamegraph plugin supplied by flamegraph-rs runs its inner dtrace process as x86, causing dtrace to fail when it traced arm64 processes. The bug report for that issue is here, and I blogged about the discovery process of this issue here.

Given that rustup/cargo installations can persist untouched for long periods of time, and given that the main compiler suite still works, I think it's likely that incorrect-architecture installations of cargo/rustc by this module persist without users' knowledge in many cases, silently causing slowdowns and occasionally causing louder issues like the above.

Suggested resolution

When this module bootstraps cargo with the rustup resource, or when it installs a per-toolchain cargo via the rustup::toolchain resource, it should allow user specification of the architecture to those resources.

The new architecture => or toolchain_architecture => parameter that users can specify should, if unspecified, default to the invoking machine's native architecture via the "architecture" fact.

When this module invokes subprograms internally, they should be wrapped in the arch command to force the architecture as requested by the user.

This behavior need only be present on MacOS, as other platforms do not do multiarch/binfmt translation in the same way (and the issue is much more common on MacOS than on free operating systems due to the frequently limited availability of source builds, as is the case with puppet-bolt in this case: only an x86_64 binary installable is provided, and build steps for creating a correct executable are not readily available).

danielparks commented 8 months ago

It’s been a while since I worked on this, so I could have the details wrong. Also, I don’t have an ARM Mac to test this on, which makes it harder to track down the issue myself.

I think the issue is the rustup wrappers, e.g. ~/.cargo/bin/cargo, are being installed using the “host” architecture, i.e. x84 because it’s running in Rosetta. The toolchains should still be installed for the architecture you specified, though the symptoms you report suggest that might not be happening — I would assume that if something running under Rosetta ran an ARM binary, then the ARM binary ran a universal binary, that it would choose the ARM version.

What does file ~/.rustup/toolchains/nightly-aarch64-apple-darwin/bin/* give you?

danielparks commented 8 months ago

Probably the best way to solve the rustup architecture problem is to allow choosing a specific rustup-init (see Other ways to install rustup). The downside is that it doesn’t make sense to use that on the main path — reimplementing the installation shell script that figures out the architecture string would be too fragile.

zbentley commented 8 months ago

Thanks for the replies!

if something running under Rosetta ran an ARM binary, then the ARM binary ran a universal binary, that it would choose the ARM version.

I'd expect that too! But it does not seem to be occurring; it seems like if anything up the chain has been invoked as x86, even without an explicit arch override to do so, then future universal binaries will be launched as x86 even if there are intermediate ARM binaries launched (because they're non-universal and only support ARM). I really don't like that behavior from Rosetta/Apple, it seems quite counterintuitive, but ... what can you do 🤷

What does file ~/.rustup/toolchains/nightly-aarch64-apple-darwin/bin/* give you?

When rustup-init is called by this module via x86 bolt, all binaries in that folder are x86.

When rustup-init is run by me (on a clean/non-initted system) from an ARM Bash/Zsh shell binary, all binaries in that folder are ARM.

Probably the best way to solve the rustup architecture problem is to allow choosing a specific rustup-init

I don't think that'll change anything, sadly. I think that, so long as Bolt's x86-only (which it is for now), regardless of how rustup-init is run, unless whatever runs it wraps it in arch -arch $native_arch rustup-init ..., this issue will reoccur.

I don’t have an ARM Mac to test this on, which makes it harder to track down the issue myself.

I'll take a stab at a PR (my Ruby is rusty though) in the coming weeks and you can decide if the arch -arch approach seems feasible or not.

danielparks commented 8 months ago

What does file ~/.rustup/toolchains/nightly-aarch64-apple-darwin/bin/* give you?

When rustup-init is called by this module via x86 bolt, all binaries in that folder are x86.

That sounds like a bug in this module or maybe in rustup itself (unlikely, I think) — the toolchain should always be in the architecture in its name. For example, on my Intel Mac:

❯ rustup toolchain install --no-self-update --force-non-host nightly-aarch64-apple-darwin
  . . .
❯ cd ~/.rustup/toolchains
❯ file nightly-*-apple-darwin/bin/cargo       
nightly-aarch64-apple-darwin/bin/cargo: Mach-O 64-bit executable arm64
nightly-x86_64-apple-darwin/bin/cargo:  Mach-O 64-bit executable x86_64

Similarly, if I install the stable-x86_64-pc-windows-gnu toolchain, then the corresponding toolchain binaries are Windows executables.

Probably the best way to solve the rustup architecture problem is to allow choosing a specific rustup-init

I don't think that'll change anything, sadly. I think that, so long as Bolt's x86-only (which it is for now), regardless of how rustup-init is run, unless whatever runs it wraps it in arch -arch $native_arch rustup-init ..., this issue will reoccur.

I figure that the architecture of rustup-init is what determines the architecture of ~/.cargo/bin/cargo. That binary should just determine which toolchain is desired, then exec the correct cargo binary in ~/.rustup/toolchains/. Since ~/.rustup/toolchains/nightly-aarch64-apple-darwin/bin/cargo is x86_64 for you, I don't think that’s the (immediate) problem.

It still might be a problem later once the toolchains binaries get fixed.


All that said, it does seem to be working on my machine (not really a surprise, I suppose). Can you run it with the environment variable RUSTUP_TRACE=1 and the -v flag? That will show exactly what commands it runs.

For example, on my local machine:

❯ RUSTUP_TRACE=1 puppet apply -ve 'rustup { daniel: toolchains => ["nightly-aarch64-apple-darwin"], targets => [] }' 
Info: Loading facts
Info: Loading facts
Notice: Compiled catalog for marlow.local in environment production in 0.19 seconds
Info: Using environment 'production'
Info: Applying configuration version '1707364703'
Info: rustup_internal: as daniel: /Users/daniel/.cargo/bin/rustup show
Info: rustup_internal: as daniel: /Users/daniel/.cargo/bin/rustup toolchain list
Notice: /Stage[main]/Main/Rustup[daniel]/Rustup_internal[daniel]/toolchains: toolchains changed [
  {
    'ensure' => 'present',
    'name' => 'stable-x86_64-apple-darwin'
  },
  {
    'ensure' => 'present',
    'name' => 'nightly-x86_64-apple-darwin'
  },
  {
    'ensure' => 'present',
    'name' => '1.64-x86_64-apple-darwin'
  },
  . . .
  {
    'ensure' => 'present',
    'name' => '1.69.0-x86_64-apple-darwin'
  }] to [
  {
    'ensure' => 'present',
    'name' => 'nightly-aarch64-apple-darwin',
    'profile' => 'default'
  }]
Info: rustup_internal: as daniel: /Users/daniel/.cargo/bin/rustup target list --installed --toolchain stable-x86_64-apple-darwin
Info: rustup_internal: as daniel: /Users/daniel/.cargo/bin/rustup target list --installed --toolchain nightly-x86_64-apple-darwin
Info: rustup_internal: as daniel: /Users/daniel/.cargo/bin/rustup target list --installed --toolchain 1.64-x86_64-apple-darwin
. . .
Info: rustup_internal: as daniel: /Users/daniel/.cargo/bin/rustup target list --installed --toolchain 1.69.0-x86_64-apple-darwin
Info: rustup_internal: as daniel: /Users/daniel/.cargo/bin/rustup toolchain install --no-self-update --force-non-host --profile default nightly-aarch64-apple-darwin
Info: rustup_internal: as daniel: /Users/daniel/.cargo/bin/rustup toolchain list
Notice: Applied catalog in 13.45 seconds
❯ file ~/.rustup/toolchains/nightly-*-apple-darwin/bin/cargo
/Users/daniel/.rustup/toolchains/nightly-aarch64-apple-darwin/bin/cargo: Mach-O 64-bit executable arm64
/Users/daniel/.rustup/toolchains/nightly-x86_64-apple-darwin/bin/cargo:  Mach-O 64-bit executable x86_64
zbentley commented 8 months ago

My output looks exactly like yours when I run puppet apply, but the issue is the cargo binary being used. The ones in my toolchain directories do match the toolchain's arch. However, the cargo in ~/.cargo/bin/cargo matches the arch of the program that installed it, and that's the cargo on my PATH (is that bad? Should I change that?).

If I install it via puppet apply, it's ARM. If I do it via bolt, it's x86, because bolt ix x86. Unfortunately, bolt seems to be swallowing debug output related to RUSTUP_TRACE; here's my command and output:

> RUSTUP_TRACE=1 bolt apply --verbose --log-level debug --target localhost --run-as root --execute 'rustup { "zac": toolchains => ["nightly-aarch64-apple-darwin"], targets => ["aarch64-apple-darwin"] }'
<snip, lots of facts>
Finished: task apply_helpers::custom_facts with 0 failures in 12.62 sec
Finished: install puppet and gather facts with 0 failures in 12.89 sec
Finished: install puppet and gather facts with 0 failures in 12.89 sec
Starting: apply catalog on localhost
Starting: apply catalog on localhost
Applying manifest block on ["localhost"]
Running task 'apply_helpers::apply_catalog' on localhost
Started on localhost...
[{"target":"localhost","action":"apply","object":null,"status":"success","value":{"report":{"host":"atropos.local","time":"2024-02-08T10:57:32.795239000-05:00","configuration_version":1707407840,"transaction_uuid":null,"report_format":12,"puppet_version":"7.26.0","status":"changed","transaction_completed":true,"noop":false,"noop_pending":false,"environment":null,"logs":[{"level":"warning","message":"Private key for 'atropos.local' does not exist","source":"Puppet","tags":["warning"],"time":"2024-02-08T10:57:32.846257000-05:00","file":null,"line":null},{"level":"warning","message":"Client certificate for 'atropos.local' does not exist","source":"Puppet","tags":["warning"],"time":"2024-02-08T10:57:32.846321000-05:00","file":null,"line":null},{"level":"notice","message":"created","source":"/Stage[main]/Main/Rustup[zac]/Rustup_internal[zac]/ensure","tags":["notice","rustup_internal","zac","rustup","class"],"time":"2024-02-08T10:57:34.107290000-05:00","file":"/Users/zac/Desktop/Projects/Personal/zbox/.modules/rustFinished on localhost:
up/manifests/init.pp","line":81},{"level":"notice","message":"Applied catalog in 13.23 seconds","source":"Puppet","tags":["notice"],"time":"2024-02-08T10:57:46.068321000-05:00","file":null,"line":null}],"metrics":{"resources":{"name":"resources","label":"Resources","values":[["total","Total",1],["skipped","Skipped",0],["failed","Failed",0],["failed_to_restart","Failed to restart",0],["restarted","Restarted",0],["changed","Changed",1],["out_of_sync","Out of sync",1],["scheduled","Scheduled",0],["corrective_change","Corrective change",0]]},"time":{"name":"time","label":"Time","values":[["rustup_internal","Rustup internal",13.223539],["config_retrieval","Config retrieval",0],["transaction_evaluation","Transaction evaluation",13.226376999984495],["catalog_application","Catalog application",13.231105000013486],["total","Total",13.273145]]},"changes":{"name":"changes","label":"Changes","values":[["total","Total",1]]},"events":{"name":"events","label":"Events","values":[["total","Tot  Notice: /Stage[main]/Main/Rustup[zac]/Rustup_internal[zac]/ensure: created
al",1],["failure","Failure",0],["success","Success",1]]}},"resource_statuses":{"Rustup_internal[zac]":{"title":"zac","file":"/Users/zac/Desktop/Projects/Personal/zbox/.modules/rustup/manifests/init.pp","line":81,"resource":"Rustup_internal[zac]","resource_type":"Rustup_internal","provider_used":"default","containment_path":["Stage[main]","Main","Rustup[zac]","Rustup_internal[zac]"],"evaluation_time":13.223539,"tags":["rustup_internal","zac","rustup","class"],"time":"2024-02-08T10:57:32.839099000-05:00","failed":false,"failed_to_restart":false,"changed":true,"out_of_sync":true,"skipped":false,"change_count":1,"out_of_sync_count":1,"events":[{"audited":false,"property":"ensure","previous_value":"absent","desired_value":"present","historical_value":null,"message":"created","name":"rustup_internal_created","status":"success","time":"2024-02-08T10:57:32.839219000-05:00","redacted":null,"corrective_change":false}],"corrective_change":false}},"corrective_change":false,"catalog_uuid":"1d5863e1-6e89-4578-80ff-3f9caa22af36","cached_catalog_status":"not_used"},"_output":"changed: 1, failed: 0, unchanged: 0 skipped: 0, noop: 0","_sensitive":"Sensitive [value redacted]"}}]
  changed: 1, failed: 0, unchanged: 0 skipped: 0, noop: 0
Finished: apply catalog with 0 failures in 26.74 sec
Finished: apply catalog with 0 failures in 26.74 sec
Successful on 1 target: localhost
Ran on 1 target in 39.67 sec

This is easier to reproduce without Puppet entirely. If I brew install rustup, I get an ARM rustup-init binary:

∴ file $(which rustup-init)
/opt/homebrew/bin/rustup-init: Mach-O 64-bit executable arm6

If I run rustup-init from an ARM shell (default answers), once it's done I have an ARM ~/.cargo/bin/cargo:

zac@atropos ~ ∴ which cargo
/Users/zac/.cargo/bin/cargo
zac@atropos ~ ∴ file $(which cargo)
/Users/zac/.cargo/bin/cargo: Mach-O 64-bit executable arm64

But if I run it via this module in an x86 bolt, things malfunction.

zbentley commented 8 months ago

The linked PR does seem to work, and always installs the requisite base/bootstrap toolchain as the native architecture.

danielparks commented 8 months ago

Great, thanks! Merged, with only a bit of struggling with CI.

I imagine that fixes the problem with the wrappers being the wrong arch, but I would not have expected this to fix the issue where ~/.rustup/toolchains/nightly-aarch64-apple-darwin/bin/* are x86_64 instead of ARM. Can you confirm that?

zbentley commented 8 months ago

I imagine that fixes the problem with the wrappers being the wrong arch, but I would not have expected this to fix the issue where ~/.rustup/toolchains/nightly-aarch64-apple-darwin/bin/* are x86_64 instead of ARM. Can you confirm that?

Sorry, I could have been clearer in the earlier comment: the issue was only ever with ~/.cargo/bin/cargo; the architectures in ~/.rustup/toolchains were always the correct ones for their respective toolchains. If that should change, I can apply a similar fix to their bootstrap code, just let me know!

danielparks commented 8 months ago

Ah, okay. That makes a lot more sense.

danielparks commented 8 months ago

I’ve just cut a release that includes this. Do you think anything else needs to be done before closing this?

Thanks again for report and the PR!

zbentley commented 8 months ago

Just tested the new release, and my test succeeded on my laptop, but failed on a blank MacOS VM.

That's entirely my fault. My facter environment downcases the OS family for .... frankly silly reasons (which routinely cause me issues) that I should undo. The code in the PR I submitted is a no-op on all Macs except mine.

The fix is simple and will be linked right after this comment; upcase the "d" in "Darwin".

So sorry, I should have tested on a fresh install before I sent that in and made you spend the time.

Please don't feel pressured to cut a second release any time soon; this is a fairly rare issue and I'm happy to pin to master until whenever's convenient for you.

danielparks commented 8 months ago

Ha ha, no problem. I should have caught that in review.

zbentley commented 8 months ago

Tested on main just now and this is confirmed fix. Sorry for all the back-n-forth, and thanks for authoring and maintaining this module!

danielparks commented 8 months ago

Ok, I cut a new release with an additional fix for macOS (in rustup::global).