Closed Firobe closed 3 years ago
Just wanted to quickly say thanks! I've looked at this a bit but haven't really got a chance to dive in properly. Hopefully sometime this weekend.
So I've been thinking about this for a bit and I'm leaning towards a somewhat paired-down user-interface. Here's what I'm thinking:
I question whether it's actually that important to be able to collect just stderr and status without stdout. Or just stdout and stderr without status. The main things that I want to make easy are:
And then I think it's less important that it's easy to get the status, just important that it's possible.
So here's the UI I propose, I'm curious what you think:
type everything = { stdout : string ; stderr : string ; status : int }
type 'a what_to_collect
val stdout : string what_to_collect
val stderr : string what_to_collect
val status : string what_to_collect
val stdout_and_stderr : string what_to_collect (* string of the two interspersed *)
val everything : everything what_to_collect
val collect
: ?cwd:string
-> ?env:(string * string) list
-> 'a what_to_collect
-> cmd
-> string
Also I think we should remove the stdout_lines
and stderr_lines
functions and instead expose a "lines" function that just splits on lines. In the end, the call sites would look like:
let files = Feather.process "ls" |> collect stdout |> lines in
I like this since it doesn't complicate the normal use-case and so we're not pressured to special case it, by keeping collect_stdout around for instance.
If you end up wanting the status too, I don't think it's a problem to collect everything and just pull out what you want:
let {stdout ; status ; _ } = Feather.process "ls" |> collect everything in
but it's nice this isn't necessary in the common case.
This would take a GADT under the hood, but above I haven't exposed the variants directly. One nice thing about exposing the variants would be that people could call something 'stdout' without having to worry about shadowing feather's helpers. But I think there's a real value in not using GADT syntax in the interface so people new to OCaml don't get tripped up. I also vaguely prefer collect stdout
to collect Stdout
. An alternative interface could be:
module Collect : sig
val stdout : ?cwd:string -> ?env:(string * string) list -> cmd -> string
val stderr : ?cwd:string -> ?env:(string * string) list -> cmd -> string
val status : ?cwd:string -> ?env:(string * string) list -> cmd -> int
val stdout_and_stderr : ?cwd:string -> ?env:(string * string) list -> cmd -> string
type everything = { stdout : string ; stderr : string ; status : int }
val everything : ?cwd:string -> ?env:(string * string) list -> cmd -> everything
end
with a call site:
let files = Feather.process "ls" |> Collect.stdout |> lines in
This also avoids the collision with Base.stdout
, though I don't mind that terribly.
What do y'all think?
The bugfix for background execution seems awesome! Thanks for all the work in this PR — it'll definitely be a step forward for feather!
Hey, thanks for looking into that!
3. Running a process and grabbing both, interspersed as they arrive - which isn't supported in master
Is it really not supported currently ? It seemed to me redirecting one to the other and then grabbing the other did exactly that. I wonder if adding a second way to do it is a good idea ? Maybe I'm missing something, but to me the natural, Unix way to intersperse both is via redirection.
I question whether it's actually that important to be able to collect just stderr and status without stdout. Or just stdout and stderr without status. The main things that I want to make easy are:
And then I think it's less important that it's easy to get the status, just important that it's possible.
If you end up wanting the status too, I don't think it's a problem to collect everything and just pull out what you want:
let {stdout ; status ; _ } = Feather.process "ls" |> collect everything in
but it's nice this isn't necessary in the common case.
This interface is pretty nice and simple ! The only thing that bothers me is that I'm not sure I agree that it's not a problem to collect everything if you just want the status, or just the status and either stdout of stderr.
Sometimes you actively don't want to collect stdout of stderr because you want it to be printed on the terminal or wherever it's going (imagine for example a use case where the output of the script is piped into something else, but we want to capture the status (+stderror optionally) the manage errors).
So I still think having at least the possibility to choose precisely what you want is important (it does not have to be particularly simple tho). Ultimately the decision is of course up to you.
Also I think we should remove the
stdout_lines
andstderr_lines
functions and instead expose a "lines" function that just splits on lines. In the end, the call sites would look like:
Agreed !
This would take a GADT under the hood, but above I haven't exposed the variants directly. One nice thing about exposing the variants would be that people could call something 'stdout' without having to worry about shadowing feather's helpers. But I think there's a real value in not using GADT syntax in the interface so people new to OCaml don't get tripped up.
I also completely agree. It's important to keep the interface as simple as possible.
I also vaguely prefer
collect stdout
tocollect Stdout
. An alternative interface could be: [...] This also avoids the collision withBase.stdout
, though I don't mind that terribly.
Since I don't particularly mind the collision too, I'd personally lean towards the first interface rather than this Collect module.
The bugfix for background execution seems awesome! Thanks for all the work in this PR — it'll definitely be a step forward for feather!
Thank you :) Maybe I should separate the status reporting rework (which fixes background bugs) into a separate PR ?
This branch has been rebased on master to include #15
Is it really not supported currently ? It seemed to me redirecting one to the other and then grabbing the other did exactly that. I wonder if adding a second way to do it is a good idea ? Maybe I'm missing something, but to me the natural, Unix way to intersperse both is via redirection.
Good point! Yeah we definitely don't need the stdout_and_stderr function as I described it.
Sometimes you actively don't want to collect stdout of stderr because you want it to be printed on the terminal or wherever it's going (imagine for example a use case where the output of the script is piped into something else, but we want to capture the status (+stderror optionally) the manage errors).
Another solid point! How would you feel about exposing:
val stdout_and_stderr : (string * string) what_to_collect
val stdout_and_status : (string * int) what_to_collect
val stderr_and_status : (string * int) what_to_collect
I think this would solve our use-case and we don't have to introduce another concept in addition to what_to_collect
.
I also completely agree. It's important to keep the interface as simple as possible.
Sounds good!
Since I don't particularly mind the collision too, I'd personally lean towards the first interface rather than this Collect module.
Sweet, let's go with that then!
Thank you :) Maybe I should separate the status reporting rework (which fixes background bugs) into a separate PR ?
You're welcome to split it apart but I am also happy to review it in a chunk: whatever is best for you!
Sorry for spam, I was stupid to put PR's id in my commit name Orz
Hey! I rebased this branch on master and modified the interface and implementation with the what_to_collect
type we discussed, which indeed simplifies things a lot. I also squashed my commits to avoid polluting the history.
Normally it should be ready to merge! The call sites look exactly like you described them.
The only difference for now is that when capturing everything
, we return a string * string * int
tuple instead of a record. I feel it is more consistent with the rest this way, but we could do as you described if you prefer to.
Actually the README has to be changed, I forgot :p I'll do it if I have your approval for the interface itself.
Okay this PR is looking great! Thanks for the help :)
Re everything
: I am a little sad that it's not obvious from the type which string is stdout and which string is stderr. I could imagine someone mixing those up by accident, or feeling like they had to check the documentation to make sure. I don't particularly mind the everything
struct, but if you want to avoid it there's the following pattern:
val everything :
([ `Stdout of string ] * [ `Stderr of string ] * int) what_to_collect
with call site:
let `Stdout stdout, `Stderr stderr, status =
echo "hi" |> collect everything
in
I think I kind of prefer the record? What do you think?
If you're happy with my recent edits, I think this can be merged after we edit the readme and decide on what to do for everything
!
Amazing, thank you!
Too fast for me to comment ;)
Thanks for the edits, it looks good! Originally I came up with the failwith
solution to avoid the small code duplication we now have in collect, but in the end the current implementation is shorter is safer.
Regarding everything
, I must also agree with you in the end: it's safer and more elegant with a record than with variants.
Good work !
Sorry I was excited haha — You too!
This is meant to solve https://github.com/charlesetc/feather/issues/6 and more
So, there is a lot going on here, and there are several questions I have to clarify how we want to finalize everything. But first let me describe what this PR contains a bit.
As expected, there is now a small GADT system to select what we want to capture when running a command. It is simpler than the one described in the issue since I thought it was a better compromise to not over-engineer the code and keep a nice interface.
There are now two running commands (or three if we count
run_bg
):run
andcollect
.run
runs the command as usual, does not capture anything and forgets its exit status. It returns unit.collect
takes a function which is a composition ofstdout
,stderr
,stdout_lines
andstderr_lines
(the composition operator is<+>
since>>
is already taken. It captures only the selected ouputs and returns a record{stdout; stderr; status}
wherestatus
is always present (an integer), andsdout/stderr
fields have the abstract typenot_collected
if not selected, and typestring
orstring list
if selected, depending on if we want lines or not.An example :
To accomplish that, the exit status is now propagated along the evaluation of the AST:
last_exit
has been removed externally and internally. Hence the status is truly lost if we userun
for example.That led me down the rabbit hole of background execution (which was one of the motivation of the original issue). I realized with the help of @tmarti2 that it was fundamentally broken since the addition of the new operators
&&.
,||.
and co. The original behavior for thebackground
flag is to spawn each individual process of the pipeline in its own thread (and then we lost its exit status). However since&&.
and||.
use that exit status to determine if we should run the right-hand side, running those operators in background was completely broken. Worse, since piping two commandsA | B
runsA
in the background, the same problems arise ifA
contains&&.
or||.
To fix all that, this PR implements the following approach:
process
(andfilter_map
) never spawns its own threadrun_bg
, the entireeval
of the AST is done in its own threadA | B
, the entire evaluation ofA
is done in its own thread (rather than only sub processes)This way, even if a command is run in background (or is the left-hand side of a pipe), the exit status is correctly managed. As a consequence,
background
is removed from the context (since it's only used at two locations).I added some tests for all that as well as a bit a code in
example.ml
Some thoughts:
collect
? Clearly it's not the only way to do it.collect_stdout
andcollect_lines
, defined as specialized versions ofcollect
. Do we want to keep them ? (and if so, with the same name ?)<+>
. Is that OK?collect
(and I made a named identity function calledonly_status
). Do we want to keep it like that? Use an optional parameter (but then we have to name it everywhere)?stdout
andstderr
collide withBase.stdout
. Keep it like that or rename them? (and with what names?)