emacs-ess / ESS

Emacs Speaks Statistics: ESS
https://ess.r-project.org/
GNU General Public License v3.0
626 stars 161 forks source link

Improve support for `R` sessions started with `ess-remote` #1182

Open dankessler opened 2 years ago

dankessler commented 2 years ago

As discussed in #1163 and #1142, there are some issues with R sessions invoked via ess-remote as discussed in the manual (i.e., launch a shell within emacs using M-x shell, ssh to a server, run R, and then do M-x ess-remote to get things setup).

The most pressing issue is that some literal control codes in files that are injected during configuration can get intercepted by the shell which screws up subsequent parsing and can have weird consequences. A simple fix for this is to use octal codes rather than literals in the associated R files, and this is all that 2ce1846488544fe9afe6ce24c502185fab6ddff0 does. However, that alone is not sufficient to get the associated functionality working correctly, but if the rest of this PR requires further discussion and a quick fix is desired, I'm happy to submit a PR that includes* only this small commit.

The subsequent commits in this PR are aimed at improving MPI support for R sessions started with ess-remote as discussed above. I enumerate the changes/commits below, since in general they are all severable.

vspinu commented 2 years ago

Thanks! Looks good. But why these characters and in that order? Have you tried the ones proposed by Lionel 001 - start of header, 002 - start of text, 003 - end of text? It would semantically map to the MPI protocol.

Though the GS, RS and US codes were designed to send tabular data https://stackoverflow.com/a/18782271/453735

How about this protocol GS RS head US payload RS GS? This way we could add more records in the future if needed.

Could you also better document the protocol and include the links to relevant info?

https://stackoverflow.com/a/18782271/453735 https://www.sciencebuddies.org/science-fair-projects/references/ascii-table

dankessler commented 2 years ago

Have you tried the ones proposed by Lionel 001 - start of header, 002 - start of text, 003 - end of text? It would semantically map to the MPI protocol.

I hadn't tried because I was nervous about the shell being confused by \003 specifically, since it's what is sent by C-c to interrupt a process. However, it was lazy of me to not test, so I went ahead and tried it and it seems to work fine (tested with M-x R and M-x shell using both bash and zsh on a MacBook Pro).

So, I propose we go with the protocol suggested by @lionel- , specifically SOH head STX payload ETX

On the other hand, we could go further down the road proposed by @vspinu and consider the more extensible protocol of GS RS head US payload RS GS, although I don't know if there's value in being able to pack multiple messages into a single group that doesn't have its own header as opposed to just sending them atomically (although perhaps the overhead is different if there are many, many messages to send). My hesitation with this protocol (and the reason I had done something semantically weird to do something like ETX without actually using it) was concern about using the same delimiter for both start and end and a fear that the handler might get confused and bad things could happen if it thinks an end is a start and vice versa, but maybe that anxiety is misplaced and/or can be addressed.

In my opinion, the SOH head STX payload ETX approach seems clearer (and can of course be revisited down the road if MPI starts seeing a lot of use), but I'm happy to do whatever is preferred.

Could you also better document the protocol and include the links to relevant info? https://stackoverflow.com/a/18782271/453735 https://www.sciencebuddies.org/science-fair-projects/references/ascii-table

Sure; once we settle on delimiters and a protocol I can do that. Would the docstring for ess-mpi-handle-messages be a good place to put this documentation, or are you thinking something more formal like the manual?

Also, while I'm at it, I'm planning to use octal codes in the elisp too where I can, just so that people looking at the code on e.g., GitHub are less confused by the non-printing characters.

lionel- commented 2 years ago

Sure; once we settle on delimiters and a protocol I can do that. Would the docstring for ess-mpi-handle-messages be a good place to put this documentation, or are you thinking something more formal like the manual?

Just a note that I think all of this should be treated as internal for now, so the manual would not be a good place. Probably comments would be fine for this doc. (For the same reason I'd prefer things like the mpi regex variables to be double-slash prefixed.)

dankessler commented 2 years ago

(For the same reason I'd prefer things like the mpi regex variables to be double-slash prefixed.)

Do you mean just for display purposes, or would you like the regex to actually include a literal slash? I.e., do you mean like this?

(defvar ess-mpi-message-start-delimiter "\001")
(defvar ess-mpi-message-field-separator "\002")
(defvar ess-mpi-message-end-delimiter "\003")

or like this?

(defvar ess-mpi-message-start-delimiter "\\\001")
(defvar ess-mpi-message-field-separator "\\\002")
(defvar ess-mpi-message-end-delimiter "\\\003")
lionel- commented 2 years ago

Sorry, I meant double-dashed, e.g. ess--mpi-message-start-delimiter.

dankessler commented 2 years ago

Ah, that makes more sense and I can make that change while I'm touching this code anyway (I presume the convention is that the ess-- prefix indicates things that are "internal").

dankessler commented 2 years ago

I've pushed some new commits that

  1. switch to the SOH header STX payload ETX protocol
  2. adhere to the double-dash convention (at least for any variables/functions I touched; I did not comprehensively refactor)
  3. update/add documentation in elisp comments and docstring for ess--mpi-handle-messages

I think this addresses all of the outstanding issues raised in the discussion of this PR and is perhaps good to go.

Feel free to go ahead and accept if this looks good on your end, but let me know if you prefer that I squash this all into a single commit or rebase to tidy things up and force push to update the PR.

vspinu commented 2 years ago

I am still not very sure about the protocol. In the future we might want to extend it with a protocol version number, length of the message and maybe checksum. This is why fields separators might work better.

Also I cannot find what is the actual use case for SOH, STX, ETX on terminal emulators. Here they say

"ETX End of Text Often used as a "break" character (Ctrl-C) to interrupt or terminate a program or process."

So seems dangerous. Didn't you say that comint sends \003 on C-c?

The tabular data delimiters seem to be more appropriate for the task at hand:

" Can be used as delimiters to mark fields of data structures. If used for hierarchical levels, US is the lowest level (dividing plain-text data items), while RS, GS, and FS are of increasing level to divide groups made up of items of the level beneath it. "

dankessler commented 1 year ago

I just realized we never closed the loop on this. How does this protocol (which @vspinu proposed earlier) sound?

GS RS head US payload RS GS

This could later be extended to include multiple key/value pairs like

GS RS head1 US payload1 RS head2 US payload2 RS GS

So long as we stick to the convention that there's a RS before the first pair and after the last pair, then the start/end delimiters are distinct (GS RS and RS GS, respectively), which addresses my fear of the delimiters getting unbalanced.

I've updated the PR to implement this.

aramirezreyes commented 1 year ago

It would be interesting to have this merged (or at least a partial pull request with the ess-tracebug loading). The loading of ess-tracebug would give a consistent experience across ess. See, for example #1265