Hard-code base AUR git URI in aurpublish script

eli-schwartz / aurpublish

PKGBUILD management framework for the Arch User Repository

GNU General Public License v2.0

246 stars 18 forks source link

Hard-code base AUR git URI in aurpublish script #23

Closed stevenbenner closed 8 months ago

stevenbenner commented 2 years ago

This patch changes the AUR URI used in git commands from "aur:" to "ssh://aur@aur.archlinux.org". This change removes the requirement that the user have an alias for "aur:" in their ssh config in order for aurpublish to be able connect to the AUR git server.

Testing:

make PREFIX=/usr - make succeeds
aurpublish [package] - verified that pushing a new package works
aurpublish -p [package] - verified that pulling a new package works
aurpublish -p [package] - (repeat command) verified that pulling an existing package works

Related issue: This pull request would resolve issue #18.

Comments:

Hard-coding the URI in the script feels like a good practice to follow, since it makes it explicit what host/protocol the script intends to connect to.
Removing the need for an alias makes this script compatible with the configuration recommended in the AUR submission guidelines topic in the Arch Wiki.
And I personally prefer not to use ssh aliases, because it feels like an indirection.

alerque commented 2 years ago

Hard-coding the URI in the script feels like a good practice to follow, since it makes it explicit what host/protocol the script intends to connect to.

It also makes it harder for the user to connect via other hosts / protocols. The software that runs the AUR is open source and there are instances besides the most popular ones. I also use this for testing, and I'm sure others do as well, not to mention people who may be hosting instances for other distros.

This PR is a step backwards. Perhaps you should consider the ability in addition to the current aur: alias method such as using an environment variable — but at it stands I think hard coding this address actually makes things worse not better.

stevenbenner commented 2 years ago

@alerque, thank you for your feedback. I had not considered your use-case of a self-hosted user repository.

I think that the default value "ssh://aur@aur.archlinux.org" still makes sense, for the reasons I outlined. However, allowing that default to be overridden is a good idea to support scenarios like yours.

I'm wondering what the safest way to introduce a default value would be.

A couple questions for you:

Is there a reason you are suggesting an environment variable instead of an option (e.g. aurpublish [--uri])?
How do you currently switch between your private repo and the AUR when using aurpublish? Do you edit the ssh config as part of your workflow, or something else?

stevenbenner commented 2 years ago

Update: I have added two new changes to preserve compatibility with workflows for self-hosted repositories:

Add -u/--url option for specifying git server

This allows users to override the default git server with a CLI option. For example:

aurpublish --url ssh://me@example.com ansible-core-git
Add check for non-default 'aur' alias in ssh config

And, to protect users who may currently be using the custom ssh hostname trick, this change adds a check for that scenario and warns the user that something unexpected may be happening. This should prevent accidental operations being performed against the official AUR.

I hope these changes address @alerque's concerns and simplify aurpublish for anyone working with the official AUR, without degrading usability for people with more complicated requirements.

alerque commented 2 years ago

First of all, this is using Git remotes—just one default one. One reason not to mess with adding more for different access methods / hosts is dealing with subtrees is already pretty complicated, dealing with subtrees with multiple remotes is a bit of a mind bender.

eli-schwartz commented 9 months ago

Hey all, I'd like to ask why is this not using git remotes for the URL indirection?

I.e. git remote add aur ssh://aur@aur.archlinux.org. Underneath, there's a lot of support for flexible networking: things like proxies, non-ssh protocols & auth mechanisms. Is all of that to be gradually reimplemented here from scratch?

You may of course override arbitrary settings for your .ssh/config for the aur.archlinux.org host, instead of for the aur host. Granted, this does not support overriding User when set via the command line.

I suppose if you need that/other things, you can then configure it with --url, but that also suggests that perhaps people do want to be able to override it via a git config --get --local setting, so... food for thought.

Note as well, the official AUR doesn't support everything that arbitrary git servers do (e.g. it doesn't support non-ssh on its end, and on your end I suppose that would be a proxy).

stevenbenner commented 9 months ago

Update: I have made the following changes, mostly based on code review feedback:

Change default URL to aur@aur.archlinux.org:

Eli's preference.
Allow default AUR URL to be set in git aurpublish.remoteUrl option

Users can configure the default AUR URL in git by setting an option with git config. For example:

git config --global aurpublish.remoteUrl ssh://me@example.com
Remove "aur" alias from suggested ssh host config

Just a bit of cleanup. Since this alias will no longer serve any purpose if this pull request is landed.

Please let me know if anyone has more feedback or suggestions.

eli-schwartz commented 8 months ago

I would prefer to have this squashed into atomic changesets, at least:

commit 4 is a fixup for commit 1, not two distinct changes
commit 5 is thematically the same changeset as commit 2, but "more thorough"

arguably commits 3 and 6 are part of commit 1

stevenbenner commented 8 months ago

Squashing this down into two atomic commits where one is the hard-code change and the other is the addition of a --url option is unfortunately a non-trivial endeavor. The check for a non-default "aur" alias depends on the code for the --url option. So I would need to re-implement that check in a different way, only to immediately discard it in the next commit.

Without developing a throwaway change, the best I can do while preserving multiple topical commits is 3:

pick 0f484ad Hard-code base AUR git URI in aurpublish script
squash 768e04e Change default git server to use scp-like syntax
squash c0c7e01 Remove "aur" alias from suggested ssh host config
pick 1add2f3 Add -u/--url option for specifying git server
squash d48d189 Allow default AUR URL to be set in git `aurpublish.remoteUrl` option
pick 1d39995 Add check for non-default 'aur' alias in ssh config

Or alternatively, I can squash the whole chain into one commit if you would prefer fully atomic.

I went ahead and squashed it down into those 3 commits and updated the commit descriptions to better reflect the changes. Please let me know if you would like me to squash it down into a single commit instead.

ulidtko commented 8 months ago

@stevenbenner @eli-schwartz FWIW, with shell code, I highly recommend plugging shellcheck into your editor. Notice shellcheck — not spellcheck. It's a quality linter, what pylint "should've been" but for shell scripts.

My affiliation: happy user & single-patch contributor (thus far).

It changed my view on the numerous bash pitfalls, including = VS == and [ VS [[.

= has its own issues, which bash actually fixes with ==, but at the cost of going beyond posix. == is not an alias — it's a bash extension ("bashism") that treats several corner-cases differently than =, arguably in less surprising ways.

Anyhow, the linter allows to skip past all that historic minutiae, issues highlights on problematic code which I promptly fix and move on. I rely on the tool to double-check my code, rather than trying to get muscle memory perfect. Just try it, can't recommend enough.

alerque commented 8 months ago

Second that! Also shellharden to automatically fix many of the issues that shellcheck complains about. It ends up doing a few pedantic things that a human can verify are unnecessary in specific cases (e.g. double quoting expansions that are hard coded to a string that doesn't need quoting, but all variables get the same treatment that they theoretically could need quoting). It's output is HIGHLY reliable and even when I personally would have written something a little differently, standardizing my scripting on it's output choices has caught numerous potential bugs before they happened.

ulidtko commented 8 months ago

On aurpublish.in of this PR, shellcheck detects 3 interesting issue types — I'll let you guys try and see for yourself :smiley:

The == and [[ bashisms are not amongst the detections BTW; it's because the shebang explicitly requests bash (-compatible) shell, not a POSIX /bin/sh: https://github.com/eli-schwartz/aurpublish/blob/e318ab6fdecaa95a7f258a5289446949db2a274e/aurpublish.in#L1

You'd only worry about bashisms in scripts with explicitly posix shebang /bin/sh. It's not an issue with /bin/bash.

eli-schwartz commented 8 months ago

= has its own issues, which bash actually fixes with ==, but at the cost of going beyond posix. == is not an alias — it's a bash extension ("bashism") that treats several corner-cases differently than =, arguably in less surprising ways.

Citations please. Your link claims no such thing whatsoever, furthermore, the bash manpage states without any ambiguity:

The = operator is equivalent to ==.

In fact, your link (which I have read many times) talks all about [ -- the classic test command (external disk utility, permitted to be accelerated via a builtin nofork command) -- which does indeed have numerous pitfalls, solved by [[ (a shell keyword, the bash parser can treat keywords as privileged with regard to syntactic awareness)

On aurpublish.in of this PR, shellcheck detects 3 interesting issue types — I'll let you guys try and see for yourself 😃

All of which are incorrect. I'll let you analyze why the things it protests about are not real world concerns.

eli-schwartz commented 8 months ago

Second that! Also shellharden to automatically fix many of the issues that shellcheck complains about. It ends up doing a few pedantic things that a human can verify are unnecessary in specific cases (e.g. double quoting expansions that are hard coded to a string that doesn't need quoting, but all variables get the same treatment that they theoretically could need quoting). It's output is HIGHLY reliable and even when I personally would have written something a little differently, standardizing my scripting on it's output choices has caught numerous potential bugs before they happened.

This sounds like the same logic that led to people using black, the uncompromising python code anti-formatter.

Given that I've been in the position of people offering me "shellcheck fix" patches to projects I maintain, where the so-called fixes changed the program semantics by violating the intended usage and introducing logic bugs, I will not buy this advertisement.

I'm not going to use shellcheck. I have a lot of experience with shellcheck and all of it tells me that it's great for people who simply do not know bash at all, but kind of horrible for everyone else. My stance on the matter is that people should simply learn a programming language if they intend to use it.

Sadly, few people even try to learn bash, everyone loves to use bash because it is "easy" but hates to learn it because "it is terrible" and apparently knowing how to use your tools is a bad thing. It's a kind of weird bash-specific programming blind spot in the general populace.

...

Unfortunately, shellcheck is not a wholesale replacement for knowing your tools. One of the highly legitimate complaints about bash is that POSIX sh (and all shell command languages based on it) is an inconsistent language full of weird quirks and edge cases with bad structure, practically anything you do inevitably tends to turn into metaprogramming due to eval / command substitution, and you cannot properly analyze a bash script without actually running it. Static analyzers are kind of doomed here.

python autofixers can reliably change your python programs without changing the semantic meaning in most cases; bash autofixers cannot.

Expected use of shellcheck includes the use of many linter "ignore" tags, unlike other languages. This, to me, is the most compelling argument against its use. When you become accustomed to seeing shell scripts where half of the script by line count is "shellcheck disable", it gets pretty old.

ulidtko commented 8 months ago

@eli-schwartz sorry, it wasn't my intention to turn this thread into a session of posix-legalese... POSIX test(1) does not specify ==, only =; that [ foo == foo ] works at all, is likely an Undefined Behavior style coincidence, or bug-compatibility artifact, or posix-mode emulation bug-o-feature in bash, or somesuch... The =/== are not really "standalone operators", they only make any sense as an argument to test or [ or [[ (some or all of which may be shell builtins, of course).

Citations please.

Sure, IEEE Std 1003.1-2017 (a.k.a. POSIX) — https://pubs.opengroup.org/onlinepubs/9699919799/utilities/test.html You can see it specifies:

s1 = s2 True if the strings s1 and s2 are identical; otherwise, false.

... but no requirement for == is given. You see it in bash manpage, sure — because it's a bash-specific extension.

talks all about [ -- ... -- which does indeed have numerous pitfalls, solved by [[

Glad we agree here. My point was: [ VS [[ is a tradeoff (which I wish we wouldn't have) between posix-portability (runs not only in bash) and ease of use (less pitfalls).

On aurpublish.in of this PR, shellcheck detects 3 interesting issue types

All of which are incorrect.

I'll respectfully disagree. This is a classic time-of-check-vs-time-of-use race-condition:

 110   │ elif [[ ! ${toplevel} -ef ${PWD} ]]; then
 111   │     cd "${toplevel}"
 112   │ fi

Subtle and microscopic as it is, the lack of error-exit in cd "${toplevel}" is a bug, highlighted by shellcheck correctly. On line 111, the script can no longer be sure that ${toplevel} still exists and has rx permissions for cd to succeed. And because set -e is not in use — silent swallowing of cd failure may easily lead to unexpected consequences like this famous rm -rf /usr bug — potentially in the current or future version of the script.

I acknowledge, the chances for the bug to materialize in "normal circumstances" are negligible. But so is the time to add || exit 1.

eli-schwartz commented 8 months ago

Sure, IEEE Std 1003.1-2017 (a.k.a. POSIX) — https://pubs.opengroup.org/onlinepubs/9699919799/utilities/test.html You can see it specifies:

You seem to have misunderstood my request. I was asking for citations for your claim that bash provides different behavior for [[ ... == .. ]] as opposed to [[ ... = ... ]].

It is possible that when you claimed this, it was unintentional and you actually meant to claim that [ is different from [[, in which case my only response is:

Yes. I know that. You're responding to my advice which pointed out the difference... by informing me that there's a difference?

silent swallowing of cd failure may easily lead to unexpected consequences like this famous rm -rf /usr bug — potentially in the current or future version of the script.

I agree that it's a famous bug. How would you recommend shellcheck catch catch and recommend this fix?

@@ -348,7 +348,7 @@ case "$DISTRO" in
   ln -s /usr/lib/mesa/ld.so.conf /etc/alternatives/gl_conf
   rm -rf /etc/alternatives/xorg_extra_modules
   rm -rf /etc/alternatives/xorg_extra_modules-bumblebee
-  rm -rf /usr /lib/nvidia-current/xorg/xorg
+  rm -rf /usr/lib/nvidia-current/xorg/xorg
   ln -s /usr/lib/nvidia-current/xorg /etc/alternatives/xorg_extra_modules-bumblebee
   ldconfig 
  ;;

Please describe how the strategic use of || exit 1 after a cd can remove the necessity of this patch. I eagerly await your explanation.

Note that changing directories strictly upward cannot generally fail unless you have manually deleted the directory you are currently in. Expect additional errors of a... highly specific nature.

It is a moot point since no actions this script takes are destructive, but... when cd'ing directly upwards but never to the side, failure to cd, and remaining in a location directly underneath the place you started out, will... tend... to make any destructive actions be more narrowly scoped than you expected, not less, and hence less dangerous, not more dangerous. Or it would be, if there was anything to destroy.

And because set -e is not in use

Do not ever use set -e except when:

designing micro-languages such as PKGBUILD
Writing a Makefile-like script where a series of unadorned simple commands is run in sequence and succeed in sequence. In such a case, set -e would not merit to be added to the shell command language for this reason alone, but since it exists, you may use it as shorthand for joining each simple-command with &&.

set -e should never have been added to a programming language that, rather than being object-oriented, is returncode-oriented. The shell command language extensively uses the concept of exit 1 as its messaging and signaling device. exit 1 doesn't indicate that an error occurred, it's the standard success return code for commands that perform fact-finding missions and wish to report generic statuses. The test / [ command and [[ keyword are especially notorious for this but merely the tip of the iceberg.

It is not "strict mode".

See https://mywiki.wooledge.org/BashFAQ/105

ulidtko commented 8 months ago

The comment beginning about black makes me feel conflicted... More on that below.

@eli-schwartz yes I strongly agree, programmers must learn the languages and tools they're using. At every opportunity, I'm pushing my colleagues and collaborators to learn to understand what they're doing, better. (And vice versa, willing to receive that.) However, in my experience such "help" is commonly unwanted; all too often, people don't want to learn — and best of all, not always for bad reasons.

In this regard, @stevenbenner has handled the PR feedback very well! Please don't delay review/merge; doing that with this discussion is not my intent. I'm just missing out on genuine technical open discussion, haven't had that in years.

[shellcheck is] great for people who simply do not know bash at all

I find this a too-easy judgement to make seriously :smiling_face_with_tear:

I'd written a great chunk of bash, zsh, posix sh, nowadays even fish. One "masterpiece" I did in bash had GUI and did IPC of sorts between several background processes (coprocesses, in bash terms). I've learned enough of bash to know to not fool myself that I'll get every single line right on the 1st try. Then, perusing shellcheck hints simply saves time, as a quick automatic checklist.

I'm not attacking bash, no way! It's a good time-proven tool, with great power, and sure enough, great trickiness :sweat_smile:

I see no practical issue with ignoring the lints. Doing it explicitly (i.e. # shellcheck disable=SC2086) has the slight benefit of expressing "the author has reviewed this, and it was deemed useless" — which preempts unwanted suggestions like me and @alerque have made here.

Look, there's immense value to the cognitive tool of abstraction. Without abstraction, every coder would have to learn all the way down to assembly instructions, if not to cpu transistor level, right? But viewed another way, abstraction is nothing but willful ignorance: one just assumes a simple model, and chooses to ignore at-the-moment irrelevant details.

Individually, I'm opposing the practice of cargo-culted enforcement of linter compliance (when CI fails unless zero lints). Coding against that feels like programming from jail. Yet, some "people in programmer roles" don't care so egregiously much, that I'd introduced those exact jails myself, to slow down deterioration of codebases.

Likewise, individually I'm opposing the black formatter for python. It shuts down freedom of expression; it drains decent code dry of life and joy. Turns code writing, and reading, into robotic bureaucracy. Yet! When an immature team fails to get along, and spirals into style-nitpicking wars in code-review, and a manager fails to resolve this on human level — introducing mandatory uncompromising black style will be a wrong and incomplete, but a solution.

[...] Static analyzers are kind of doomed here.

Agree; and I'll further add that's true of any (turing-complete) language, even those without eval. It's a mathematical fact from computability theory, see Rice's theorem, its statement is fairly simple. Yet, despite the grim theoretical prognosis in the most general case — restricting oneself to special cases enables having useful analyzers anyway.

Yes. I know that. You're responding to my advice which pointed out the difference... by informing me that there's a difference?

Yes... you're right. I apologize for the confusion and wasted time. Also, sincere gratitude for responding and the discussion.

eli-schwartz commented 8 months ago

No problem. :P

For the record, I have already reviewed and accepted this PR as is, but am waiting until I have a chance to merge it on the command line.

stevenbenner commented 8 months ago

A very interesting discussion. I doubt that I can contribute anything more to the conversation, but I actually do use shellcheck on a couple of my personal projects. However I only invoke it either by explicitly running it from the terminal or as part of a make check step.

I found that shellcheck is kind of all or nothing. If I want any value from it then I need make every line pass, up to including shellcheck directives where I intend for things to go against the rules. If I don't go all the way with it then I have found shellcheck to be very noisy and a bit fragile. Particularly around sourceing and the now-removed zsh support. If one of those fails in shellcheck then it often generates a whole chain of warnings about things that follow. I quickly become blind to the noisy/irrelevant info messages and important things begin to get lost. So my preference has been to explicitly run it when I want a code review. And I add it to make check if I am willing to commit to the shellcheck way for a project and want to force myself to do that as part of the development process.

That being the case, I am not a bash expert -or at least not to the level of you gentlemen- so I do tend to yield to shellcheck's guidance unless I am confident that it is just plain wrong. It can make for a helpful educational aid - forcing me to think about side-effects and edge-cases that I may not have considered. There is also something to be said for the consistency it imposes.

And on the consistency recommendation, shellharden was entirely unknown to me. I'll take a look at that. Though it sounds like an even more furious version of shellcheck.

eli-schwartz commented 8 months ago

Thank you!

stevenbenner commented 8 months ago

Great!

@eli-schwartz: Thank you! Also, you can close issue #18 as resolved.

@alerque and @ulidtko: Thank you for your help along the way and bringing up the considerations that I had missed.