emacs-ess / ESS

Emacs Speaks Statistics: ESS
https://ess.r-project.org/
GNU General Public License v3.0
622 stars 162 forks source link

ess-remote causes various `unexpected symbols` errors in R evaluations. #1163

Open bryce-carson opened 3 years ago

bryce-carson commented 3 years ago

I use ess-remote in my daily R work on a cluster. I need to allocate resources for interactive jobs, so I can't use other methods of interacting with remote R processes, so:

  1. M-x shell and indicate /bin/bash as the shell type.
  2. I load an environment module so that I have R, and then I call R.
  3. M-x ess-remote and I select R as the process type, and then the following errors are produced.

The previous workspace is not the culprit of the issue, as I have tried R --vanilla before and I get the same behaviour. Usually I simply call M-x ess-remote and select R until I can do so and none of the errors are returned in the *shell* process buffer.

[Previously saved workspace restored]

> options(pager='cat')
options(pager='cat')
> 
> ## Hacked help.start() to use wit
> .ess_help_start <
+     home <- if (is.nul
+                 port <- 
Error: unexpected symbol in:
"    home <- if (is.nul
                port"
>                 if (port > 
+                     if (up
+                         make.packages.html(temp=TRUE)
Error: unexpected symbol in:
"                    if (up
                        make.packages.html"
>    
>         
>                 else stop(".ess_help_start() requires the HTTP server to be runni
Error: unexpected 'else' in "                else"
>  
> 
>     paste0(home, "/doc/html/index.html")
Error in paste0(home, "/doc/html/index.html") : object 'home' not found
> }
Error: unexpected '}' in "}"
> 
> 
> 
> .ess.rdired <- f
Error: object 'f' not found
>     cat('\n')

> 
>     objs <- 
+     if (length(ob
+         cat("No object
Error: unexpected symbol in:
"    if (length(ob
        cat"
>         return(invisible
+     }
Error: unexpected '}' in:
"        return(invisible
    }"
> 
>     names(objs) <- objs
Error: object 'objs' not found
>     objs <- lapply(objs, get, envir = .GlobalEnv)
Error in lapply(objs, get, envir = .GlobalEnv) : object 'objs' not found
> 
>     mode <- sapply(objs, data.class)
Error in lapply(X = X, FUN = FUN, ...) : object 'objs' not found
>     length <- sapply(objs, l
+  
+ 
+  
+     var.names <- row.names(d)
Error: unexpected symbol in:
" 
    var.names"
>     
>     ## If 
>     quotes <- rep
>     spaces <- grep(' ', var
+     if (any(spaces))
Error: unexpected 'if' in:
"    spaces <- grep(' ', var
    if"
>  
>     var.names <- paste(q
+     row.names(d) <- paste('  ', var.names, sep = '')
Error: unexpected symbol in:
"    var.names <- paste(q
    row.names"
> 
>     print(d)
Error in print(d) : object 'd' not found
> }
Error: unexpected '}' in "}"
> }
Error: unexpected '}' in "}"
> }
Error: unexpected '}' in "}"
> 
> .ess_mpi_y_or_n <- function(prompt, callback = NULL){
+     .ess_mpi_send("y-or-n", prompt, callback)
+ }
> 
> .ess_mpi_eval <- function(expr, callback = NULL){
+     .ess_mpi_send("eval", expr, callback)
+ }
> 
> .ess_mpi_error <- function(msg) {
+     .ess_mpi_send("error", msg)
+ }
> 
> }
Error: unexpected '}' in "}"
> options(STERM='iESS', str.dendrogram.last="'", editor='emacsclient', show.error.locations=TRUE)
options(STERM='iESS', str.dendrogram.last="'", editor='emacsclient', show.error.locations=TRUE)
lionel- commented 3 years ago

Do you have the very last dev version of ESS installed?

There are characters missing in the R code sent to the remote. I recall having fixed a similar issue last year.

bryce-carson commented 3 years ago

Yep, I have the HEAD version, b1d299c96e73dc892b0065c1617205dd4b9f8ab8.

lionel- commented 3 years ago

I can't reproduce. Could you please step through ess-r-load-ESSR with edebug and find out when does the code get truncated?

vspinu commented 3 years ago

The problem is very subtle. Big chunks of code cannot be reliably sent through shell stdin. Can you try setting ess-r-fetch-ESSR-on-remotes to t? This will download ESSR (R code which ESS depends upon) from github to your remote machine.

bryce-carson commented 3 years ago

The problem is very subtle. Big chunks of code cannot be reliably sent through shell stdin. Can you try setting ess-r-fetch-ESSR-on-remotes to t? This will download ESSR (R code which ESS depends upon) from github to your remote machine.

Things seem much better with this variable set to true.

After loading ESSR into the remote, my minibuffer tends to remain and I never really reach "Done". I have to hit C-g after a bit, or resize my window to get to my prompt, where ess-remote was issued successfully on my Mac. This doesn't happen on my Fedora machine.

Another way I get around the minibuffer "hanging" is by ensuring I am in evil-insert-mode and calling ess-remote with M-x, rather than getting to M-x by double-tapping SPC and selecting ess-remote outside of evil-insert-mode.

dankessler commented 2 years ago

I have been trying to use the "Shell > ssh to remote > R > ess-remote" workflow and running into similar issues. In my case, I'm trying to interact with a high performance computing system, so I need to get my environment setup using the module system (or do something hacky with my .profile and TRAMP, but that's tacky and delicate).

I think this might be (in part) related to the control characters in mpi.R, specifically this line. It's not easy to see in the browser, but there are two literal escapes and also a C-^ in the middle of the string. This seems to not be an issue when starting R with e.g., M-x R RET, since all of the ESSR files are sourced.

Here's what I think is going on with ess-remote and how this is related. If one starts a shell, launches R within it, and then calls ess-remote, then the files are not sourced but rather "injected" using ess--inject-code-from-file. Because everything goes via the shell on its way to R, the ESC character is now special. If using bash (or many other shells), the ESC key causes the next character to be treated as if it was chorded with Meta (or just consumes the subsequent character if it has no special meaning). The line in question looks like

    cat(sprintf("_%s%s\\", head, payload))

The characters rendered as question marks are, from left to right, ESC, C-^, and ESC again. The first ESC pairs with the subsequent to become `M-, which in bash meansyank_last_arg(which could have all kinds of strange consequences). The final ESC pairs with the subsequent` to become M-\. However, this sequence doesn't mean anything special to bash, so it is just consumed. As a result, the latter portion of the string becomes %s\". Since there's only one \ left, it escapes the closing quote and leaves the string unterminated. This presumably screws something up, perhaps the unterminated string goes on consuming subsequent lines until it hits the opening quote of a later line, and then everything is out-of-whack and presumably it encounters a } at some point whose opening { was consumed in one of the strings, or perhaps it falls down in some even more exotic way due to chunking. Either way, I think the root of the problem is that the shell eats the <ESC>\ and screws up string termination. It probably also breaks MPI, because this prevents that function from getting defined (and even if it was somehow defined, it would be improperly defined to not emit the appropriate control characters that the MPI handler is looking for).

These control characters were discussed on the commit that initially added MPI support and were changed a few years later.

I don't have any bright ideas of how to fix this. Having the remote fetch the files itself and then source them seems like one workaround. Alternatively, perhaps ess--inject-code-from-file can be modified so that it can get control characters to R without them being consumed by the shell on the way, but that seems tricky to do robustly. However, it seems like it's really just ESC (and maybe C-^, although that seems ok for now), and really just in this one line of the mpi.R file, so maybe something hacky would be good enough.

As a side note on the Mac vs Fedora/Linux issues: this might be down to differences between zsh and bash (or whatever shell your Fedora machine uses). When I tried to use ess-remote from my MacBook, everything would just lock up and I'd have to C-g repeatedly to get it to resume, whereas in Linux things would remain responsive but I'd get the mismatched } error. I eventually realized that my Mac's shell uses zsh whereas my Linux machine uses bash: forcing the Mac to use bash gave similar results to the Linux machine, but I just wanted to leave a note here for any Mac users who are also fighting with ess-remote.

vspinu commented 2 years ago

MPI is not used ATM. Your problem is unrelated and is known for ages. You just cannot send big chunks of text through shell on remotes. This is precisely why we split the files into small chunks in ess--inject-code-from-file and hope for the best.

Our best guess so far was that this happens due to EOF characters which emacs sends for long input, but I am not sure. It could be that TRAMP is doing some shenanigans under the hood. It's likely low lever and I never got enough time to track this down. If you could track this down it would be an insane improvement to ESS.

dankessler commented 2 years ago

Thanks for the quick response and a huge thank you for all you and the other contributors do to make ESS such a valuable part of my everyday work :)

I know there are broader issues with sending big chunks of text through the shell on remotes, and I'd be delighted if my insight here was somehow applicable to fixing that. However, in this case I'm pretty confident that this specific error about the unexpected } is due to escape characters in the mpi.R file being consumed when they are sent by injection.

Here are steps to reproduce the error and below that is a straightforward solution that I'm happy to PR if the approach seems useful.

Reproduce

  1. Start shell: M-x shell RET
  2. SSH to a computer that has R installed and available: ssh someserver
  3. Run R in the shell: R RET
  4. Invoke ess-remote: M-x ess-remote RET, and observe this output
    options(pager='cat')
    > Error: unexpected '}' in "}"
    > > + + > > + + > > + + > > Error: unexpected '}' in "}"
    > options(STERM='iESS', str.dendrogram.last="'", editor='emacsclient', show.error.locations=TRUE)
  5. Check and see that .ess_mpi_message is not defined
    
    > .ess_mpi_message("Hello!")
    Error in .ess_mpi_message("Hello!") : 
    could not find function ".ess_mpi_message"

# Fix
1. Open file `mpi.R`
2. Edit line 10 to use octal codes for control characters rather than literals
cat(sprintf("\033_%s\036%s=\033\\", head, payload))
3. Repeat steps 1-3 from "Reproduce" above in a fresh shell
4. Invoke `ess-remote`: `M-x ess-remote RET`, and observe this error-free output

options(pager='cat') options(STERM='iESS', str.dendrogram.last="'", editor='emacsclient', show.error.locations=TRUE)

5. Verify that `.ess_mpi_message` is now available and (kind of) works

.ess_mpimessage("Hello") messageHello> \

I had initially tried fixing this by inserting literal C-v's before each of the literal ESC characters. This also fixed things, but unfortunately broke MPI when the buffer was started with R (since then there were extra control codes that screwed up parsing), so I think the octal (or analogous hex or whatever) approach is more elegant (and also easier to read in GitHub anyway).

Getting MPI to work in *shell*

Getting .ess_mpi_message to work properly in the environment created by ess-remote is another can of worms, but if mpi is not even used right now, maybe it's not a big problem. In case we wanted to fix it, here are some of the obstacles and ideas for solutions

  1. When R tries to print an ESC character into the *shell* buffer, it is consumed by bash (and I can't figure out a workaround to get R to successfully print ESC). This could be fixed by using different control codes as start and end delimiters, like \034 or \035 instead of ESC (\036) and then updating the elisp variables ess-mpi-message-start-delimiter and ess-mpi-message-end-delimiter accordingly.
  2. Unlike in a buffer started with M-x R, in a buffer that was setup with ess-remote it doesn't seem like ess-mpi-handle-messages gets run automatically/regularly, although I can invoke it manually with M-: (ess-mpi-handle-messages (current-buffer)) RET
  3. The output in the shell buffer has the front-sticky text property (read-only), so manually invoking ess-mpi-handle-messages as described above moves point to the right location, but then spits out a warning about the text being read-only. This can also probably be worked around with the trick discussed in the elisp manual: binding inhibit-read-only to non-nil, removing the read-only text property, and then doing the usual thing of removing the message and handing it off to the appropriate handler.
lionel- commented 2 years ago

This could be fixed by using different control codes as start and end delimiters

Good idea, we could try with these:

<SOH>: Start of Header byte (0x01)
<STX>: Start of Text byte (0x02)
<ETX>: End of Text byte (0x03)
vspinu commented 2 years ago

in the mpi.R file being consumed when they are sent by injection.

I see now. Thanks for clarifying.

we could try with these:

I might have tried back then. Let's revisit indeed.

malcook commented 2 years ago

Hi @vspinu - I am experiencing the issue as first reported by @bryce-carson and took your advice, but now when I

try setting ess-r-fetch-ESSR-on-remotes to t? This will download ESSR (R code which ESS depends upon) from github to your remote machine.

upon running ess-remote I get the error:

> options(pager='cat')
> > + + + + + + + + + + + + + + + + + + + + + trying URL 'https://github.com/emacs-ess/ESS/raw/ESSRvnil/etc/ESSR.rds'
<simpleError in utils::download.file(url, essr_file): cannot open URL 'https://github.com/emacs-ess/ESS/raw/ESSRvnil/etc/ESSR.rds'>
[1] FALSE
Warning message:
In utils::download.file(url, essr_file) :
  cannot open URL 'https://github.com/emacs-ess/ESS/raw/ESSRvnil/etc/ESSR.rds': HTTP status was '404 Not Found'

The github URL appears borked, lacking correctly interpolated version number.

FWIW:

I am running GNU Emacs 29.0.50 (build 1, x86_64-pc-linux-gnu, X toolkit, cairo version 1.15.12, Xaw3d scroll bars) of 2022-01-19

ess-version reports

ess-version:  [elpa: 20220127.1454] (loaded from /home/mec/.emacs.d/elpa/ess-20220127.1454/)
vspinu commented 2 years ago

@malcook Looks like we are having problems with essr-version on MELPA. Somehow it's not set during byte compilation. I have added a workaround for now but this has to be fixed. @lionel- @jabranham Any ideas why this code is not working during byte compilation on paciakge instalation?

malcook commented 2 years ago

why this code is not working during byte compilation on paciakge instalation

presumably because my emacs is compiled --with-native-compilation

Here is a fellow traveler with a solution: https://github.com/clojure-emacs/sayid/pull/59/files

another possibility might be appealing to internal variable byte-compile-current-file as discussed here or load-true-file-name

malcook commented 2 years ago

a workaround for now

OK thanks @vspinu - manually setting essr-version allows ess-remote to download source from github to ~/.config/ESSR/ESSRv1.7.rds

but in my hands the original error prevails, viz:

m-x shell
ssh some-remote-host
R
...
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
m-x ess-remote
> options(pager='cat')
> > + + + + + + + + + + + + + + + + + + + + + trying URL 'https://github.com/emacs-ess/ESS/raw/ESSRv1.7/etc/ESSR.rds'
Content type 'application/octet-stream' length 15018 bytes (14 KB)
==================================================
downloaded 14 KB

[1] TRUE
> Error: unexpected '}' in "}"
> > + Error: unexpected assignment in:
".ess.strip.error <- function(
    pattern <-"
> + Error: unexpected '}' in:
"    sub(p
}"
...
vspinu commented 2 years ago

I don't have native emacs but I do see the problem with melpa versions. The problem does not occur if I compile manually. So it's really something special happening during package instalations.

I have added load-true-file-name trick. Let's see if it works.

malcook commented 2 years ago

Cool. Thanks. I'll try it once gnu.org is back up.... after about how long should I expect to see your change percolate through to melpa?

dankessler commented 2 years ago

You can see the status in the box at right on https://melpa.org; here's what it looks like for me right now

image

so I'd guess in a little under an hour from now

malcook commented 2 years ago

Thanks for the education @dankessler

Alas @vspinu, testing the new version of ess from melpa finds that it does not fix the issue: essr-version is still nil after updating from melpa and loading your new version of ess in a fresh emacs session. However, if I manually re-byte-compile ess.el I find it now works in a fresh emacs and sets essr-version to "1.7". Not sure if I'm playing whack-a-mole or on a goose chase, but I'm sorry for letting this last goose loose.

I'm happy to test any other proposed workarounds....

malcook commented 2 years ago

FWIW: I find calling ess-remote a second time succeeds when the first call fails as initially reported.

plpxsk commented 2 years ago

The fix from @dankessler works for me 💯 Thanks to the entire team for working on implementing it in the associated PR.

I tested the fix across 2 of our different HPCs, and in both cases, ESS remote finally seems to work as expected.

dankessler commented 2 years ago

The fix from @dankessler works for me 💯 Thanks to the entire team for working on implementing it in the associated PR.

I tested the fix across 2 of our different HPCs, and in both cases, ESS remote finally seems to work as expected.

I'm so glad to hear that! This is a useful reminder that I need to close the loop and finish the discussion with the maintainers to get the PR adopted.

IllustratedMan-code commented 1 year ago

DId this ever get fixed?

dankessler commented 1 year ago

DId this ever get fixed?

I don't think so; my PR #1182 is still open. Looking at the comments, I think we went down a rabbit hole of trying to define a cleaner protocol, and then at some point I got distracted and never got back to @vspinu .

I assume we can come to a conclusion RE the protocol, so I'll try to get that PR moving again.

malcook commented 6 months ago

FWIW: I concur that the Fix provided by @dankessler in https://github.com/emacs-ess/ESS/issues/1163#issuecomment-1024967733 fully addresses the problem exactly as reported in their Reproduce section.

Without the Fix, I experience the problem as reported using latest:

In summary, the Fix redefines the R function .ess_mpi_send, removing the literal escape codes which bullox up the chunked communication strategy used used when (ess-r--load-ESSR-remote t).

I agree the "problem is very subtle"; if you can not reproduce it using the provided steps , try directly eval-ing (ess-r--load-ESSR-remote t) so as to force the chunked communication to be tested.

I think the Fix should probably be incorporated into ESSR/R/mpi.R regardless of other related issues raised in this thread.

malcook commented 2 months ago

My STATUS update on this issue as I am dealing with it follows:

I am still experiencing

> Error: unexpected '}' in "}"

upon calling

m-x ess-remote

I am using

GNU Emacs 31.0.50 (build 1, x86_64-pc-linux-gnu, X toolkit, cairo version 1.15.12, Xaw3d scroll bars) of 2024-09-04

with

ess-version:  [elpa: 20240821.145259] (loaded from /home/mec/.emacs.d/elpa/ess-20240821.145259/)

In my case, my ess library is in my home: /home/mec/.emacs.d/elpa/ess-20240821.145259

and my home is on network (NFS) file-system which is ALWAYS also shared to my remote host on which I run R (which I do using slurm on my HPC cluster as mx shell; ssh my_HPC_head_node ; salloc ...; module load R; R )

So, my hack is to comment out this single line in ess-remote...

;(setq-local ess-remote t)

... thus causing ess-r-load-ESSR to call ess-r--load-ESSR-local and source the .R from local file system.

I'm sure this is a TOTAL HACK and the wrong solution but it works for my usage and I bring it up here in hopes of getting the right solution figured out and deployed.

I still concur that @dankessler had part of the fix in hand above...

@vspinu is there a chance of @dankessler 's #1182 commits being taken?