emacs-eldev / eldev

Elisp development tool
https://emacs-eldev.github.io/eldev/
GNU General Public License v3.0
229 stars 17 forks source link

Native support for MS-Windows #35

Closed ikappaki closed 3 years ago

ikappaki commented 3 years ago

Hi,

could you please review draft patch to support running eldev on MS-Windows natively.

Basically it introduces the bin/eldev.bat and install.bat files while making the necessary updates to sources files for the test suite to pass.

I have not tested it any further past the test suite, though I know of at least one other issue with caching package repository HTTP requests which is easy to fix. I wanted to get some feedback first as to confirm I am on the right path and the work makes some sense.

The changes are as follows:

  1. The /eldev-util.el:eldev-bat-quote/ fn is introduced to support the quoting of bin/bootstrap.el.part into the new bin/eldev.bat file, which is modeled exactly after the bin/eldev file, minus the EMACS_TTY variable (not sure if this is relevant under windows?). It is a wonder the level of hackery one has to master to just quote a multiline string in batch files. The fn is called three times as appropriate in the new template bin/eldev.bat.in file which is picked up by the bin/eldev.bat build -f command to generate bin/eldev.bat via the existing Eldev's preprocess-.in builder.
    1. The batch file returns the exit code of the Emacs process. This first came up while running the /test/basics.el:eldev-command-hook-1/ test.
  2. A (require 'cl-lib) has been introduced at the top-level of bin/boostrap.el.part. There appears to be a bug in Emacs on windows that `require'ing the 'cl-lib package inside a `let' and then trying to `setf' any cl struct slot results to a fatal error. In our case the 'cl-lib package is implicitly `require'd by 'package and results in a Symbol's function definition is void: \(setf\ package-desc-dir\) error. I have not look into the cause of this issue yet, but will raise it with Emacs dev if necessary. This issue first came up while running the /test/archives.el:eldev-archives-1/ test.
  3. A new /eldev.el:eldev-script-name/ variable has been introduced to specify the name of the eldev script (eldev by default or eldev.bat on windows ). Naked eldev references in the code have been replaced with this variable. This first came up while running the /test/install-self.el:eldev-install-self-1/ test.
  4. The /eldev.el:eldev-shell-command/ fn has been updated on windows to display the .bat version of eldev if necessary, replacing forward with backward slashes. This first came up while running the /test/helpe.el:eldev-help-1/ test. There is now mixed reporting of forward/backward slashes on windows. Not sure whether we would like to enforce one style over the other? If so I suppose this would be the windows native back slash. Example output with mixed slashes:
    Ran Eldev as `d:/src/eldev/bin/eldev help' in directory `d:/src/eldev/test/empty-project/' 
    Stdout contents:
    Usage: d:\src\eldev\bin\eldev.bat [OPTION...] COMMAND [...]
  5. The /eldev.el:eldev_builder_package/ builder has been updated to copy project files to a temporary directory rather than create a symbolic link to it, which is not supported on windows to begin with. I felt this to be a better solution even outside of windows because before temporary files were created in the original project directory due to symbolic linking, and thus I completely removed the symbolic functionality. It uses the undocumented `dired-copy-file-recursive' fn from the 'dired-aux package with sensible params to avoid asking confirmation from the user when copying to the empty temp directory. Since the descriptor-file is now created in the temp directory, there is no reason to explicitly delete it (it is removed altogether when deleting the temp directory). This first came up while running the /test/basics.el:eldev-bootstrapping-1/ test.
  6. The install.bat file is introduced, modeled after install.sh.
  7. The /test/package.el:eldev-package-3/ test has been updated to be skipped when makeinfo is not installed, as it is not readily available on windows.

Running the test suite on Windows 10 with emacs 27.1 from the command prompt produces the following results:

Ran 326 tests, 323 results as expected, 0 unexpected, 3 skipped (2021-03-12 22:00:11+0000, 603.840259 sec)

3 skipped results:
  SKIPPED  eldev-init-hg-1
  SKIPPED  eldev-init-svn-1
  SKIPPED  eldev-package-3

The tar tool is not preinstalled in earlier version of windows, so it is likely we need to mark more tests for skipping when this is not available.

Thank you

ikappaki commented 3 years ago

Reverted change#2 above, since I can't reproduce anymore, must have been due to a misconfiguration in my initial setup.

emacs 27.1 on windows 10:

Ran 326 tests, 323 results as expected, 0 unexpected, 3 skipped (2021-03-13 11:24:09+0000, 631.051384 sec)
3 skipped results:
  SKIPPED  eldev-init-hg-1
  SKIPPED  eldev-init-svn-1
  SKIPPED  eldev-package-3

emacs 26.3 on WSL ubuntu 20.4:

Ran 326 tests, 324 results as expected, 2 skipped (2021-03-13 11:36:27+0000)

2 skipped results:
  SKIPPED  eldev-init-hg-1
  SKIPPED  eldev-init-svn-1
doublep commented 3 years ago

Yes, in general I'd be interested. Can you maybe coordinate somehow with @juergenhoetzel, author of the unfinished PR #26, which also tries to add support for Windows, though with a PowerShell script rather than a .bat file?

A few general notes (I haven't looked into all details yet):

minus the EMACS_TTY

It controls whether coloring with ANSI escape code is enabled by default. I have no idea how this works on Windows (if at all), so it's up to you.

A (require 'cl-lib) has been introduced at the top-level of bin/boostrap.el.part.

Don't see this in the patch. I would like to avoid it, but if you say this is needed, so be it. Apparently it already implicitly included anyway: $ eldev eval "(featurep 'cl-lib)" prints t.

A new /eldev.el:eldev-script-name/ variable has been introduced to specify the name of the eldev script

Is it not possible to launch an executable by its name (i.e. without .bat) in Windows? If so, maybe the script could just pass its name through some new variable (like ELDEV_TTY is used) and also make it customizable by the user for who-knows-what-cases-it-might-be-needed? And if it is empty, fall back to OS-specific logic as now.

I felt this to be a better solution even outside of windows because before temporary files were created in the original project directory due to symbolic linking, and thus I completely removed the symbolic functionality.

Please explain, I don't understand. Why not replace existing FIXME with actually copying? Symlinking (where it is supported) is faster, and this might be important when using local dependencies in "packaged" mode, since then packages get rebuilt implicitly and potentially frequently enough.

ikappaki commented 3 years ago

Yes, in general I'd be interested. Can you maybe coordinate somehow with @juergenhoetzel, author of the unfinished PR #26, which also tries to add support for Windows, though with a PowerShell script rather than a .bat file?

Sure, I've contacted Mr. Hötzel on the back of that PR.

* Please don't use tabs for indenting, only spaces. Otherwise the patch is difficult to read.

Yes, apologies, I've missed this in my excitement (perhaps we should enforce this in .dir-locals.el?)

* Squash all the commits, easier to read when all changes are combined, since they are about the same thing here.

Original PR had one commit, and then there was a follow up as per comment#2, and this is commit#3. I believe you missed the second commit and thought it was part of the original PR? If not however, I can definitely keep squashing as I go with this PR so that it only ever shows one commit (my plan was to squash everything before the merge, which I consider it to be the standard practice).

* Don't use `cl-*` function; `cl-case` is probably essentially the same as `pcase` or `eldev-pcase-exhaustive`.

Done.

minus the EMACS_TTY

It controls whether coloring with ANSI escape code is enabled by default. I have no idea how this works on Windows (if at all), so it's up to you.

I'll put down on my list as something to look at once everything else is in place.

A (require 'cl-lib) has been introduced at the top-level of bin/boostrap.el.part.

Don't see this in the patch. I would like to avoid it, but if you say this is needed, so be it. Apparently it already implicitly included anyway: $ eldev eval "(featurep 'cl-lib)" prints t.

I've reverted this change as per comment#2 since I could not reproduce it while I was having a closer look at the issue.

A new /eldev.el:eldev-script-name/ variable has been introduced to specify the name of the eldev script

Is it not possible to launch an executable by its name (i.e. without .bat) in Windows? If so, maybe the script could just pass its name through some new variable (like ELDEV_TTY is used) and also make it customizable by the user for who-knows-what-cases-it-might-be-needed? And if it is empty, fall back to OS-specific logic as now.

Not absolutely sure what the suggestion is here, could you please elaborate. The batch file can be called as either eldev or eldev.bat from the command line and inside the script we can record its physical name with the extension e.g. eldev.bat (with %~nx0) or without e.g. eldev (with %n0).

I suppose we could then introduce a new variable e.g. ELDEV_SHELL_NAME, and set it accordingly in both bin/eldev and bin/eldev.bat as follows bin/eldev:

ELDEV_SHELL_NAME=`basename "$0"`  # but also cater for symbolic links

bin/eldev.bat:

set ELDEV_SHELL_NAME=%~nx0

and then use that value as a replacement for naked "eldev" references in the code, such as the following in /Eldev:eldev-install-self/ command, which ultimately decides which script to install:

(copy-file "bin/eldev" (file-name-as-directory directory))

I felt this to be a better solution even outside of windows because before temporary files were created in the original project directory due to symbolic linking, and thus I completely removed the symbolic functionality.

Please explain, I don't understand. Why not replace existing FIXME with actually copying? Symlinking (where it is supported) is faster, and this might be important when using local dependencies in "packaged" mode, since then packages get rebuilt implicitly and potentially frequently enough.

Sorry, I misunderstood the original code. So the intention was all along to use copying as a backup mechanism in case symbolic linking failed. I've reverted back to that construct and installed `copy-directory' on file-error.

The reason I find copying in general better is because build artefacts are created in the temp directory and there is no chance for them to end up persisting in the project directory, either because something has failed halfway through or as a result of a complex case which is difficult to diagnose. If by chance something temporal persists in the project directory, then it might interfere with normal operation, since it shouldn't haven't been there in the first place.

Another reason which I think is better is that it allows for multiple build operations to run concurrently in the project dir (even the same one running multiple times) without the fear an operation might interfere with another. Using symbolic links means that build artefacts are stored, albeit temporarily, in the project's directory thus they are susceptible to persist in it or they might interfere with concurrent operations.

Nevertheless, it appears good care has been taken in the code to remove transient artefacts explicitly, tests are never ran concurrently but sequentially and we do care about the degrading performance copying would incur.

Thanks for the feedback!

ikappaki commented 3 years ago

Hi @doublep,

further to the previous commit#3, I've replaced all windows-nt specific code around .bat files in commit#4 with the following:

  1. bin/eldev.bat sets new ELDEV_SSN env var to eldev.bat (SSN is short for Shell Script Name).
  2. eldev.el, introduces the following two vars which replaced all references to "eldev" and "bin/eldev" respectively in the code:
    1. `eldev-shell-script-name': defaults to "eldev", overwritten by ELDEV_SSN when set (i.e. "eldev.bat" when invoked from bat script).
    2. `eldev-bin/eldev-path': relative path of eldev shell script in /bin (i.e. "bin/eldev" or "bin/eldev.bin" on windows)
  3. eldev.el:`eldev-shell-command' always converts the returned path to forward slashes when FOR-DISPLAY is t using standard emacs fns (reminder, this is required for passing test/help.el:`eldev-help-1').

Let me know what you think, I find this a cleaner approach but not sure if you had something else in mind.

In addition, we have agreed with @juergenhoetzel that we use the .bat version for bootstrapping and his excellent PowerShell scripts for webinstall.

I've also done some research and it appears ANSI color escape sequences are supported on the command prompt on later versions of Windows 10, so we should be able to accommodate them (though they currently appear verbatim on standard output and make the tests that compare standard output with strings, such as test/archives.el:`eldev-archives-4', to fail colorfully :))

Thanks

doublep commented 3 years ago

Sorry, I'm a bit busy lately, so this might take a while. Especially because of this please squash, I forget what I have decided about the previous commits and will review everything together anyway.

Not absolutely sure what the suggestion is here, could you please elaborate. The batch file can be called as either eldev or eldev.bat from the command line and inside the script we can record its physical name with the extension e.g. eldev.bat (with %~nx0) or without e.g. eldev (with %n0).

Why does Eldev need to know it is a .bat script? Can it not just tell the OS (Windows) "run something called eldev, supplying extension if you need"? Then we would avoid the need for ELDEV_SSN altogether.

eldev.el:eldev-shell-command' always converts the returned path to forward slashes when FOR-DISPLAY is t using standard emacs fns (reminder, this is required for passing test/help.el:eldev-help-1').

I think it would be better to use native slash type, i.e. backslashes for Windows. You can adjust the test instead, they are not to be considered fixed in stone. Look for some other tests that do regexp matching against output, so you can use sth. like [/\\] there.

I've also done some research and it appears ANSI color escape sequences are supported on the command prompt on later versions of Windows 10, so we should be able to accommodate them (though they currently appear verbatim on standard output and make the tests that compare standard output with strings, such as test/archives.el:`eldev-archives-4', to fail colorfully :))

If you are having problems with this, you can postpone it till better time, it is not a must. Better to have a foolproof default to "no color" then making mistakes.

What's up with macOS tests breakage?

Can you activate CI testing for Windows, AFAIR GitHub provides some machines for this (I guess as for macOS it would be enough to test latest stable and snapshot)? If yes, can you also add some tests for webinstall on Windows?

ikappaki commented 3 years ago

Why does Eldev need to know it is a .bat script? Can it not just tell the OS (Windows) "run something called eldev, supplying extension if you need"? Then we would avoid the need for ELDEV_SSN altogether.

Windows and Emacs are clever enough to invoke eldev.bat even without supplying the extension. It is Eldev:`eldev-install-self' which needs to be explicit which file to install, either eldev.bat on windows or eldev othewise.

I have removed ELDEV_SSN as suggested, and patched that function instead with os specific code. I have also introduced the test/common.el:`eldev--test-drop-usage-bat-extension' fn, to remove the .bat extension from the usage text produced in stdout in so many of the tests, so that the text comparison succeeds. This approach requires more interventions than SSN. Let me know if this is better, happy to try more ideas that you might have.

I've also done some research and it appears ANSI color escape sequences are supported on the command prompt on later versions of Windows 10, so we should be able to accommodate them (though they currently appear verbatim on standard output and make the tests that compare standard output with strings, such as test/archives.el:`eldev-archives-4', to fail colorfully :))

If you are having problems with this, you can postpone it till better time, it is not a must. Better to have a foolproof default to "no color" then making mistakes.

I think it should only be a couple of extra lines in the bat files to detect if the terminal is capable of color output (and this is the difficult part, trying to do anything in bat :)), but will postpone if I'll find other issues on the way there.

What's up with macOS tests breakage?

I believe you might be referring to the snapshot failures on both macOS and Ubuntu? It appears to be caused by this change on the latest Emacs master branch having to do with lazy initialisation of variables, which causes test/emacs.el:`eldev-emacs-project-isolation-2' test to break (even on eldev master). I haven't had the chance to look into it yet, since it is unrelated to this patch, but will do at some point.

9973019764250ac1f4d77a6b426cdd9c241151c5
Author:     Stefan Monnier <monnier@iro.umontreal.ca>
AuthorDate: Tue Jan 5 12:28:37 2021 -0500
Commit:     Stefan Monnier <monnier@iro.umontreal.ca>
CommitDate: Tue Jan 5 12:28:37 2021 -0500

Parent:     1433a12014 ruby-mode: eliminate redundant regexp branch
Containing: master
Follows:    emacs-27.1 (4184)

* lisp/emacs-lisp/package.el: Load package-quickstart without package.el

Speed up startup when `package-quickstart` is in use by making it possible
to load the quickstart file without having to load `package.el` at all.
...

Can you activate CI testing for Windows, AFAIR GitHub provides some machines for this (I guess as for macOS it would be enough to test latest stable and snapshot)? If yes, can you also add some tests for webinstall on Windows?

@juergenhoetzel is making great progress on the windows CI, I believe we should be having something ready soon, I'll leave it to him to create the necessary tests for webinstall on Windows given it is his work :)

Thanks!

doublep commented 3 years ago

Let me know if this is better, happy to try more ideas that you might have.

Yeah, sorry for double work, but I originally wrote: Is it not possible to launch an executable by its name (i.e. without .bat) in Windows? If so, maybe the script could just pass its name through some new variable. So, it was only a suggestion if .bat file cannot be run otherwise.

I believe you might be referring to the snapshot failures on both macOS and Ubuntu?

Yes, I mixed things up. Apparently the breakage is caused by package-user-dir becoming an autoloadable variable, so its value is computed (from somewhere) before Eldev's customization of user-emacs-directory is evaluated. Can be seen from this:

$ emacs --batch --q --eval '(setf user-emacs-directory "/tmp/")' --eval "(progn (print (boundp 'package-user-dir)) (require 'package) (print package-user-dir))"

nil

"/tmp/elpa"
$ emacs-28 --batch --q --eval '(setf user-emacs-directory "/tmp/")' --eval "(progn (print (boundp 'package-user-dir)) (require 'package) (print package-user-dir))"

t

"~/.emacs.d/elpa"

Guess the easiest and most robust fix is to just pass package-user-dir and its value to setf too. I hate discussing things with Emacs-devel.

juergenhoetzel is making great progress on the windows CI, I believe we should be having something ready soon

Good. But this is practically a must for this PR. I don't even have access to Windows myself, so cannot test anything without CI.

I'll review the changes now.

ikappaki commented 3 years ago

I have made further changes based on the review. Not sure what's wrong with the CI failures, can't make sense of them, I suspect CI needs a restart and the errors should go away? wishful thinking :)

juergenhoetzel is making great progress on the windows CI, I believe we should be having something ready soon

Good. But this is practically a must for this PR. I don't even have access to Windows myself, so cannot test anything without CI.

I shall look into it. Do I need to keep updating this PR until I get it right? It doesn't seem to allow me to run the existing CI action on a different branch in my forked repo, so that I don't add unnecessary noise in this thread.

doublep commented 3 years ago

Not sure what's wrong with the CI failures, can't make sense of them, I suspect CI needs a restart and the errors should go away? wishful thinking :)

The failures look weird, probably something with unsynchronized changes at the archive server. Need to eventually build some sort of autoretrier into Eldev, not the first time I have seen such crap. Restarted the job manually for now.

I shall look into it. Do I need to keep updating this PR until I get it right? It doesn't seem to allow me to run the existing CI action on a different branch in my forked repo, so that I don't add unnecessary noise in this thread.

Sorry, I don't know much about how GitHub works. If you don't see a better way, continue working here. If Jurgen finds this too noisy, he can always unsubscribe.

doublep commented 3 years ago

By the way, I fixed snapshot breakages in trunk (two commits). Rebase this PR, then CI will pass, at least untill you activate testing on Windows ;) Or unless that crap with GNU ELPA and signatures surfaces again.

ikappaki commented 3 years ago

By the way, I fixed snapshot breakages in trunk (two commits). Rebase this PR, then CI will pass, at least untill you activate testing on Windows ;) Or unless that crap with GNU ELPA and signatures surfaces again.

Great! All CI tests passes at the moment ... even on Windows! @juergenhoetzel has almost completed the work on the windows CI , and should have something ready for review really soon.

I had to make a further update in test/common.el to fix the test/init.el:eldev-init-svn-1 test, which was failing because the subversion file://<repos> url requires an additional / root node before the drive letter (fix included with the latest commit).

ikappaki commented 3 years ago

Hi,

I've just committed a change to the .bat file to add support for ANSI color sequences. Basically it checks whether the script is directly attached to the console, and whether is running on a version of windows that supports ANSI control sequences (>= 10.0.18363, aka Windows 10 version 1909 - Nov 2019 Update) and sets the ELDEV_TTY variable accordingly. I believe we are now feature complete, as far as the bat file is concerned.

(I've also changed the indentation of the first ELDEV_EMACS block as to match the dominant style).

Thanks

ikappaki commented 3 years ago

Added an auxiliary testwindows-emacs-latest workflow action job to invoke the test suite with the latest windows and Emacs versions, so as to capture the successful output of the test suite running on windows. This is only temporarily included here in this PR for demo purposes, to be replaced by the complete workflow action update being worked at in #26.

doublep commented 3 years ago

Added an auxiliary testwindows-emacs-latest workflow action job to invoke the test suite with the latest windows and Emacs versions, so as to capture the successful output of the test suite running on windows. This is only temporarily included here in this PR for demo purposes, to be replaced by the complete workflow action update being worked at in #26.

OK, so we are waiting until installation PR is complete? I see that it works (with the exception of the test that requires makeinfo, but OK, that's not terribly important). Please also activate integration tests. Also, how different would the final version be? I'd prefer as much similarity with other operating systems as possible.

ikappaki commented 3 years ago

OK, so we are waiting until installation PR is complete? I see that it works (with the exception of the test that requires makeinfo, but OK, that's not terribly important). Please also activate integration tests. Also, how different would the final version be? I'd prefer as much similarity with other operating systems as possible

Yes, currently waiting for #26 PR to be completed. @juergenhoetzel as per this WIP commit is refactoring the github test action script to use the most common denominator code across the different architectures. So the appearance looks like is going to change a little bit, but the original structure seems to be maintained.

Integration tests as in CI? The windows tests are already running in github as part of this PR, or they appear to be doing so for me at least.

(texinfo is a bit of a difficult nut to crack. It is not available from the MSYS2 MINGW packages distribution channel I'm most familiar with, nor it appears to be available from SCOOP which Jürgen is using. I'll have a look if I could hack it in from somewhere else)

Thanks!

juergenhoetzel commented 3 years ago

Hi @ikappaki

Would you mind to squash the commits that address the same code/issue? Currently there are 4 (partially redundant?) commits on this branch. This makes collaberation, rebasing and review more difficult:

git shortlog -n master..feature/windows-support
ikappaki (4):
      Native support for MS-Windows
      Native support for MS-Windows
      Native support for MS-Windows
      Native support for MS-Windows
doublep commented 3 years ago

Integration tests as in CI? The windows tests are already running in github as part of this PR, or they appear to be doing so for me at least.

Yeah, sorry, didn't notice. It's because in the normal CI they are split into their own step.

texinfo is a bit of a difficult nut to crack.

It's not required. I also have to install it additionally for Linux and macOS, but if you cannot achieve the same for Windows, it's not that big of a deal.

ikappaki commented 3 years ago

Would you mind to squash the commits that address the same code/issue? Currently there are 4 (partially redundant?) commits on this branch. This makes collaberation, rebasing and review more difficult:

Squashed everything into one commit.

ikappaki commented 3 years ago

OK, I can now see what Paul meant by integration tests earlier, and how I mislead him to believe they were indeed running by the example workflow action job I've introduced, while in fact they are not. The --test-type integration although is mentioned in the action, it was never ran, because the cmd shell was exiting after the first test invocation due to Sequence of instructions in cmd shell doesn’t work and never reached the next line.

Thus, I completely missed the integration tests, and the error Jürgen is seeing is coming from in there.

@doublep could you please review the latest change

  1. Update eldev.el:eldev--global-cache-url-retrieve-synchronously fn to store package replies as 'binary data. It was converting newlines from unix to dos format otherwise, and that had a side effect to treat the replies as empty data while literally loading them from the cache later on. This fixes the test/integration/global-cache.el:eldev-global-cache-1 break.
  2. call eldev.bat in the workflow action, instead of invoking it directly (so as to give integration tests a chance to run).

(@juergenhoetzel, I will squash the commit once reviewed with success)

Thanks both

doublep commented 3 years ago

Looks fine.

doublep commented 3 years ago

BTW, about that insanity with call for .bat files. Will it affect the final testing workflow? I'd rather avoid the need to split commands depending on the OS (one set for non-Windows, one for Windows) if at all possible.

ikappaki commented 3 years ago

BTW, about that insanity with call for .bat files. Will it affect the final testing workflow? I'd rather avoid the need to split commands depending on the OS (one set for non-Windows, one for Windows) if at all possible

Jürgen is using Powershell to evolve the action, which is a different type of shell that doesn't suffer from the above. So, with how things roughly look like at the moment there does not appear to be a split, at least in that part of the code.

Thanks for the quick review! (I've also squashed the commit)

doublep commented 3 years ago

GitHub doesn't understand it, but this PR has effectively been merged via #36: that PR includes the commit here. So, the real status is "Merged", even if shown otherwise. Thank you for your work.