emacs-eldev / eldev

Elisp development tool
https://emacs-eldev.github.io/eldev/
GNU General Public License v3.0
226 stars 17 forks source link

Escape -- with double quotes (powershell compatibility) #88

Closed ikappaki closed 1 year ago

ikappaki commented 1 year ago

It appears double dashes are special in PowerShell and need quoting, see https://stackoverflow.com/questions/15780174/powershell-command-line-parameters-and.

Patch fixes recent MS-windows CI run of eldev doctor break with https://github.com/doublep/eldev/commit/ab5c7a9612a5bc541cdc1414aa94707c9490d9cc

ikappaki commented 1 year ago

Hi @doublep,

it appears that the user ids do not match in the doctor's file-owners test on MS-Windows

Do you know what part of the ci run first creates the .cache/eldev directory so I can trace it back? Running the doctor job by itself complains the .cache/eldev directory is not there, so I can't debug by just running that job alone.

[00:00.119] Are Eldev cache files owned by the current user? NO [00:00.120]
File 28.2/' in Eldev cache directory c:/Users/runneradmin/AppData/Roaming/.cache/eldev' appears to be owned by a “wrong” user/group: 544/None' (expected isrunneradmin/None'). There also appears to be 3 more such files.

         This is likely a consequence of erroneous Eldev behavior when using Docker:
         hopefully all bugs that would cause this have been fixed by now, but previous
         invocations might have lead to existence of such files.  Unfortunately, you
         have to delete them manually, otherwise Eldev might misbehave.

[00:00.120] Finished erroneously on Fri Jun 2 22:17:03 2023 Error: Process completed with exit code 1.

I've added some extra logging in the corresonding doc test, and it does show the ids are different, 500 below is runneradmin, and no idea where the 544 is coming from (these are the .cache/eldev dir/files attributes).

[00:00.145] :EU 500 [00:00.145] :attributes (t 1 544 513 (25723 5594 0 0) (25723 5594 0 0) (25723 5594 0 0) 0 "drwxrwxrwx" t 13792273859633670 1746663697) [00:00.146] :attributes (t 1 544 513 (25723 5607 0 0) (25723 5607 0 0) (25723 5607 0 0) 0 "drwxrwxrwx" t 1125899907654154 1746663697) [00:00.146] :attributes (t 1 544 513 (25723 5615 0 0) (25723 5615 0 0) (25723 5378 0 0) 4096 "drwxrwxrwx" t 562949953421873 780035056) [00:00.146] :attributes (nil 1 544 513 (25723 5378 0 0) (25723 5378 0 0) (25723 5378 0 0) 0 "-rw-rw-rw-" t 281474976711218 780035056)

update: it appears the .cache dir has been created by the BUILTIN/Administrators user (rather than runneradmin)

=> get-acl c:/Users/runneradmin/AppData/Roaming/.cache

Path Owner Access


.cache BUILTIN\Administrators NT AUTHORITY\SYSTEM Allow FullControl…

doublep commented 1 year ago

In general, is it impossible to make Eldev on Windows process -- just as it does on Linux? I.e. if you type that command (eldev doctor -- -githooks -recent-stable-releases) does it work (regardless of results for now) on your local machine, or does it also fail with that "unknown option" error? Or is it a question of how GitHub workflows execute commands on Windows and is not relevant to local machines?

it appears that the user ids do not match in the doctor's file-owners test on MS-Windows

Yeah, it was an experimental addition that turned out to work just fine on Linux and macOS. I have no idea why it is different on Windows, but of course would be grateful if someone could investigate this.

Do you know what part of the ci run first creates the .cache/eldev directory so I can trace it back? Running the doctor job by itself complains the .cache/eldev directory is not there, so I can't debug by just running that job alone.

No, sorry. I'd presume that one of integration or Docker tests. Also, just some hints about debugging failures in GitHub CI (though I guess you already do something similar). When I face such issues and need to investigate them, I create a dummy branch, delete all (or all except one, leaving that for comparison) irrelevant entries from testing matrix and also delete all irrelevant steps and limit tests to one or a few that are needed to reproduce the issue. This makes CI runs much faster and somewhat bearable.

I've added some extra logging in the corresonding doc test, and it does show the ids are different, 500 below is runneradmin, and no idea where the 544 is coming from (these are the .cache/eldev dir/files attributes).

Sorry, zero idea what happens GitHub + Windows. Can you try and reproduce this locally somehow?

ikappaki commented 1 year ago

In general, is it impossible to make Eldev on Windows process -- just as it does on Linux? I.e. if you type that command (eldev doctor -- -githooks -recent-stable-releases) does it work (regardless of results for now) on your local machine, or does it also fail with that "unknown option" error? Or is it a question of how GitHub workflows execute commands on Windows and is not relevant to local machines?

There are two "shell"s available in MS-Windows, one is the legacy cmd prompt which is very much still used by everyone, and there is the newer one called PowerShell. Like bash, cmd prompt and PowerShell reads the command line and decides what to do with it. It so happens PowerShell treats -- specially, and if we want to pass it down to forked command (in our case eldev), we need to escape it with double quotes, and thus this patch.

GH defaults to the PowerShell on windows, thus we are getting this issue. We can switch the shell to cmd prompt, which does not treat -- as a special sequence, but this will complicate the GH workflow since w need to have a different job for windows and another for the rest.

it appears that the user ids do not match in the doctor's file-owners test on MS-Windows

Yeah, it was an experimental addition that turned out to work just fine on Linux and macOS. I have no idea why it is different on Windows, but of course would be grateful if someone could investigate this.

I've added some extra logging in the corresonding doc test, and it does show the ids are different, 500 below is runneradmin, and no idea where the 544 is coming from (these are the .cache/eldev dir/files attributes).

Sorry, zero idea what happens GitHub + Windows. Can you try and reproduce this locally somehow?

This issue is not reproducible locally, but it always manifests itself on GH CI. I believe this is because GH CI runs with the administrator account.

It seems to me the issue is that the Emacs user-uid returns a value of 500, while the cache file-attrirbutes returns a user id of 544 as a file owner and thus the error. I trust the latter as being correct.

Looking at how user-uid determines its uid, Emacs sets it at start up with the c fn init_user_info

https://github.com/emacs-mirror/emacs/blob/6847c01568ef09764ee603fda5e20505453e1e10/src/w32.c#L2247

and in particular in the dflt_passwd.pw_uid field. The below suspiciously can hardcode the user uid to 500 under certain conditions, which is the number appearing in the failed doctor test. So, could it be that the user-uid should not be trusted when the Emacs is run with an administrator account on MS-Windows, as it is the GH CI case?

https://github.com/emacs-mirror/emacs/blob/6847c01568ef09764ee603fda5e20505453e1e10/src/w32.c#L2291

      if (xstrcasecmp ("administrator", uname) == 0)
    {
      dflt_passwd.pw_uid = 500; /* well-known Administrator uid */
      dflt_passwd.pw_gid = 513; /* well-known None gid */
    }
      else
    {

It might be difficult to investigate this on GitHub, because it will require building/debugging Emacs on GH CI. The other option is to setup an administrator account locally, but I'm not fond of this at all because it might mess up with my local PC in ways I can't envisage.

doublep commented 1 year ago

There are two "shell"s available in MS-Windows, one is the legacy cmd prompt which is very much still used by everyone, and there is the newer one called PowerShell [...]

GH defaults to the PowerShell on windows, thus we are getting this issue.

Thanks for the detailed answer. In this case your patch is fine, just please add a comment why we need the quotes. No-one except maybe you will remember in a year.

It seems to me the issue is that the Emacs user-uid returns a value of 500, while the cache file-attrirbutes returns a user id of 544 as a file owner and thus the error. I trust the latter as being correct.

Looking at how user-uid determines its uid, Emacs sets it at start up with the c fn init_user_info

Yeah, might be a problem in Emacs then, ~as always~. I doubt it would matter, but can you please test if user-real-uid makes any difference (I don't know what "real" UID is, and documentation doesn't seem to explain anything; just noticed that there is such a function)? If not, we can just call it a day and simply disable this check on GitHub CI + Windows with some appropriate comment. Would be nice if you could do this, as you have better understanding of what's going on.

ikappaki commented 1 year ago

I looked into this a little bit deeper. It appears the built in administrator user id is indeed 500, and it is the owner of the group with id 544, as per the highlighted parts below taken by running whoami /all on the GH job.

image

Perhaps the directlory/files created by the runneradmin user with uid 500 are owned by the 544 group because uid 500 is the "Group owner"? nevertheless, it appears MS-Windows works like that and there doesn't appear to be much we can do with the built in administrator account and its files ownership to work around this.

Some more information just for reference about these two sids taken from https://learn.microsoft.com/en-us/windows-server/identity/ad-ds/manage/understand-security-identifiers

S-1-5-domain-500 | Administrator | A user account for the system administrator. Every computer has a local Administrator account and every domain has a domain Administrator account.The Administrator account is the first account created during operating system installation. The account can't be deleted, disabled, or locked out, but it can be renamed.By default, the Administrator account is a member of the Administrators group, and it can't be removed from that group. -- | -- | -- S-1-5-32-544 | Administrators | A built-in group. After the initial installation of the operating system, the only member of the group is the Administrator account. When a computer joins a domain, the Domain Admins group is added to the Administrators group. When a server becomes a domain controller, the Enterprise Admins group also is added to the Administrators group. -- | -- | --

Looking at how user-uid determines its uid, Emacs sets it at start up with the c fn init_user_info

Yeah, might be a problem in Emacs then, ~as always~. I doubt it would matter, but can you please test if user-real-uid makes any difference (I don't know what "real" UID is, and documentation doesn't seem to explain anything; just noticed that there is such a function)? If not, we can just call it a day and simply disable this check on GitHub CI + Windows with some appropriate comment. Would be nice if you could do this, as you have better understanding of what's going on.

user-real-uid didn't make much difference, it appears the issue here is that we compare the uid with the gid in the doctor test in the case of the administrator account, so these two are going to be different.

I'll see to disabling this particular doctor test on GH CI for MS-Windows, not sure how much time we'd like to spent on trying to fix this otherwise.

Thanks

doublep commented 1 year ago

Perhaps the directlory/files created by the runneradmin user with uid 500 are owned by the 544 group because uid 500 is the "Group owner"?

Maybe if that's the case, Eldev's doctest is just unnecessarily strict. It is basically targetted at earlier Eldev bugs (though maybe they are still present) where Eldev executed in a Docker container would erroneously create root-owned files in the global or project cache and later "normal" Eldev invocation wouldn't be able to change those files from under your nomal account. I.e. the expected behavior is that if you run under account of johndoe, then both eldev test ... and eldev docker 27 test ... create johndoe-owned files and can thus reuse files created by each other when needed.

I'll see to disabling this particular doctor test on GH CI for MS-Windows, not sure how much time we'd like to spent on trying to fix this otherwise.

The easiest would be to just add if: "!startsWith (matrix.os, 'windows')" to the step definition, see earlier steps in test.yml. But if you have time to find a "more correct" solution that doesn't just skip everything, that would be even better. Anyway, any change fixing this is appreciated, thank you!

ikappaki commented 1 year ago

I'll see to disabling this particular doctor test on GH CI for MS-Windows, not sure how much time we'd like to spent on trying to fix this otherwise.

The easiest would be to just add if: "!startsWith (matrix.os, 'windows')" to the step definition, see earlier steps in test.yml. But if you have time to find a "more correct" solution that doesn't just skip everything, that would be even better. Anyway, any change fixing this is appreciated, thank you!

I can hack it around to disable the test on windows only by using eldev emacs --eval and updating eldev-doctor-disabled-tests as such, though it doesn't help with readibility (the constraint here placed by PowerShell is that he eval string has to be enclosed in " and not contain any embedded ", thus the use of prin1-to-string):

    - name: Doctor the project
      run: |
        # on MS-Windows, disable the eldev-file-onwers doctor test on GH CI, see
        # discussion in https://github.com/doublep/eldev/pull/88.
        ./bin/eldev emacs --batch --eval "(when (eq system-type 'windows-nt) (write-region (prin1-to-string '(push 'eldev-file-owners eldev-doctor-disabled-tests)) nil (prin1-to-string 'Eldev) 'append))"
        # Run `doctor' on the project itself.  Git hooks are not going
        # to be installed in this checkout.  Also, don't insist on
        # recent stable releases here.
        ./bin/eldev -p -dtTC doctor "--" -githooks -recent-stable-releases
      env:
        ELDEV_LOCAL: "."

I've tested this to work across all system. What do you think?

ubuntu/osx: image

windows image

(please note there is a slight bug with the current doctor reporting logic that even though a test has been disabled, the results show 6 tests as being run, as highlighted, instead of 5)

doublep commented 1 year ago

./bin/eldev emacs --batch --eval "(when (eq system-type 'windows-nt) (write-region (prin1-to-string '(push 'eldev-file-owners eldev-doctor-disabled-tests)) nil (prin1-to-string 'Eldev) 'append))"

Uh, this looks ugly and, if I understand correctly, modifies file Eldev which is then "leaked" to all following steps. A better way would be to write Eldev-local, but much better would be just sth. like (untested):

eldev -S "(when (eq system-type 'windows-nt) (push 'eldev-file-owners eldev-doctor-disabled-tests))" doctor ...

It's better to avoid "permanent" (even for one CI run) effects when they are not really needed.

there is a slight bug with the current doctor reporting logic that even though a test has been disabled, the results show 6 tests as being run, as highlighted, instead of 5

Indeed, I will look into this.

ikappaki commented 1 year ago

./bin/eldev emacs --batch --eval "(when (eq system-type 'windows-nt) (write-region (prin1-to-string '(push 'eldev-file-owners eldev-doctor-disabled-tests)) nil (prin1-to-string 'Eldev) 'append))"

Uh, this looks ugly and, if I understand correctly, modifies file Eldev which is then "leaked" to all following steps. A better way would be to write Eldev-local, but much better would be just sth. like (untested):

eldev -S "(when (eq system-type 'windows-nt) (push 'eldev-file-owners eldev-doctor-disabled-tests))" doctor ...

It's better to avoid "permanent" (even for one CI run) effects when they are not really needed.

Indeed a much better option, I wasn't aware of -S. I have tested it to work fine and added some notes in the job comments with the latest commit.

doublep commented 1 year ago

Thank you!