Open dankessler opened 2 years ago
Thanks! Looks good. But why these characters and in that order? Have you tried the ones proposed by Lionel 001 - start of header, 002 - start of text, 003 - end of text? It would semantically map to the MPI protocol.
Though the GS, RS and US codes were designed to send tabular data https://stackoverflow.com/a/18782271/453735
How about this protocol GS RS head US payload RS GS
? This way we could add more records in the future if needed.
Could you also better document the protocol and include the links to relevant info?
https://stackoverflow.com/a/18782271/453735 https://www.sciencebuddies.org/science-fair-projects/references/ascii-table
Have you tried the ones proposed by Lionel 001 - start of header, 002 - start of text, 003 - end of text? It would semantically map to the MPI protocol.
I hadn't tried because I was nervous about the shell being confused by \003
specifically, since it's what is sent by C-c
to interrupt a process. However, it was lazy of me to not test, so I went ahead and tried it and it seems to work fine (tested with M-x R
and M-x shell
using both bash
and zsh
on a MacBook Pro).
So, I propose we go with the protocol suggested by @lionel- , specifically SOH head STX payload ETX
On the other hand, we could go further down the road proposed by @vspinu and consider the more extensible protocol of GS RS head US payload RS GS
, although I don't know if there's value in being able to pack multiple messages into a single group that doesn't have its own header as opposed to just sending them atomically (although perhaps the overhead is different if there are many, many messages to send). My hesitation with this protocol (and the reason I had done something semantically weird to do something like ETX
without actually using it) was concern about using the same delimiter for both start and end and a fear that the handler might get confused and bad things could happen if it thinks an end is a start and vice versa, but maybe that anxiety is misplaced and/or can be addressed.
In my opinion, the SOH head STX payload ETX
approach seems clearer (and can of course be revisited down the road if MPI
starts seeing a lot of use), but I'm happy to do whatever is preferred.
Could you also better document the protocol and include the links to relevant info? https://stackoverflow.com/a/18782271/453735 https://www.sciencebuddies.org/science-fair-projects/references/ascii-table
Sure; once we settle on delimiters and a protocol I can do that. Would the docstring for ess-mpi-handle-messages
be a good place to put this documentation, or are you thinking something more formal like the manual?
Also, while I'm at it, I'm planning to use octal codes in the elisp too where I can, just so that people looking at the code on e.g., GitHub are less confused by the non-printing characters.
Sure; once we settle on delimiters and a protocol I can do that. Would the docstring for ess-mpi-handle-messages be a good place to put this documentation, or are you thinking something more formal like the manual?
Just a note that I think all of this should be treated as internal for now, so the manual would not be a good place. Probably comments would be fine for this doc. (For the same reason I'd prefer things like the mpi regex variables to be double-slash prefixed.)
(For the same reason I'd prefer things like the mpi regex variables to be double-slash prefixed.)
Do you mean just for display purposes, or would you like the regex to actually include a literal slash? I.e., do you mean like this?
(defvar ess-mpi-message-start-delimiter "\001")
(defvar ess-mpi-message-field-separator "\002")
(defvar ess-mpi-message-end-delimiter "\003")
or like this?
(defvar ess-mpi-message-start-delimiter "\\\001")
(defvar ess-mpi-message-field-separator "\\\002")
(defvar ess-mpi-message-end-delimiter "\\\003")
Sorry, I meant double-dashed, e.g. ess--mpi-message-start-delimiter
.
Ah, that makes more sense and I can make that change while I'm touching this code anyway (I presume the convention is that the ess--
prefix indicates things that are "internal").
I've pushed some new commits that
SOH header STX payload ETX
protocoless--mpi-handle-messages
I think this addresses all of the outstanding issues raised in the discussion of this PR and is perhaps good to go.
Feel free to go ahead and accept if this looks good on your end, but let me know if you prefer that I squash this all into a single commit or rebase to tidy things up and force push to update the PR.
I am still not very sure about the protocol. In the future we might want to extend it with a protocol version number, length of the message and maybe checksum. This is why fields separators might work better.
Also I cannot find what is the actual use case for SOH, STX, ETX on terminal emulators. Here they say
"ETX End of Text Often used as a "break" character (Ctrl-C) to interrupt or terminate a program or process."
So seems dangerous. Didn't you say that comint sends \003 on C-c?
The tabular data delimiters seem to be more appropriate for the task at hand:
" Can be used as delimiters to mark fields of data structures. If used for hierarchical levels, US is the lowest level (dividing plain-text data items), while RS, GS, and FS are of increasing level to divide groups made up of items of the level beneath it. "
I just realized we never closed the loop on this. How does this protocol (which @vspinu proposed earlier) sound?
GS RS head US payload RS GS
This could later be extended to include multiple key/value pairs like
GS RS head1 US payload1 RS head2 US payload2 RS GS
So long as we stick to the convention that there's a RS
before the first pair and after the last pair, then the start/end delimiters are distinct (GS RS
and RS GS
, respectively), which addresses my fear of the delimiters getting unbalanced.
I've updated the PR to implement this.
It would be interesting to have this merged (or at least a partial pull request with the ess-tracebug loading). The loading of ess-tracebug would give a consistent experience across ess. See, for example #1265
As discussed in #1163 and #1142, there are some issues with
R
sessions invoked viaess-remote
as discussed in the manual (i.e., launch a shell within emacs usingM-x shell
, ssh to a server, runR
, and then doM-x ess-remote
to get things setup).The most pressing issue is that some literal control codes in files that are injected during configuration can get intercepted by the shell which screws up subsequent parsing and can have weird consequences. A simple fix for this is to use octal codes rather than literals in the associated R files, and this is all that 2ce1846488544fe9afe6ce24c502185fab6ddff0 does. However, that alone is not sufficient to get the associated functionality working correctly, but if the rest of this PR requires further discussion and a quick fix is desired, I'm happy to submit a PR that includes* only this small commit.
The subsequent commits in this PR are aimed at improving MPI support for
R
sessions started withess-remote
as discussed above. I enumerate the changes/commits below, since in general they are all severable.In d9443009347caaa779420986191dd1a830db4893, I switch to different control codes and in particular use those associated with separators to reduce the chances of a shell ascribing them special meaning. Note: I use literals in the
elisp
code so in Github it renders like an empty string, but it's easier to see in emacs.In a5451413b939aebd1e629ca189510b7b27e9d2d8, change the message handler to override the
read-only
property which prevents it from cleaning up after itself when it runs in a*shell*
buffer.In f5f54f27238d0058b5346cbe2b76751030ee45a1, I (conditionally) invoke
ess-tracebug
when setting things up in a manner analogous to that done byR-initialize-on-start
(whichess-remote
doesn't call for what I assume are good reasons unknown to me). This is necessary to set off a chain of activity which ultimately results iness-mpi-handle-message
getting run automatically inR
sessions started withM-x ess-remote
in the same way that it does inR
sessions started withM-x R
.