Closed juster closed 1 year ago
I thought it would be nice to have interfaces for input/output streams and specific structs for file/pipe streams. I still have to make another pass to clean things up and also write tests. But the commit above fixed my problem here and the minor issue in #203.
This is interesting work, thanks @juster. Cleaning up the structure of handling file/pipe streams is something I've been hoping to do for a while -- I half suspected this was an issue but have never looked into it. So thanks. I'll look at this more closely (including your commit) in the next couple of weeks.
Out of interest, @juster: what are you using GoAWK for, and how did you stumble across this issue? I'm always interested to learn more about how it's being used.
I use goawk all the time for analyzing server logs. They can be in the GBs. A coworker suggested goawk because it parses CSV. I used to use a lot of Perl or Python, but I decided to see whether awk was capable of more or less the same thing. Goawk is also faster than nawk and often even grep!
One weakness with awk is with time stamps. I found this bug because I was trying to parse time stamps with the date command! So I was opening quite a few commands.
Regardless of this bug it was too slow. But then I discovered I could provide custom awk functions written in Go and embed a script in a binary. I love it!
Excellent -- thanks for the details! Glad you were able to use native Go functions to solve the speed problem, too.
When `close()` is called on an input pipe, the `close()` builtin function does not `.Wait` for the `exec.Cmd` at the other end of the "pipe" and so the resources associated with the child process are not released. This leaves "zombie processes" which continue to be tracked in the parent, and maybe store their return value. `close()` will cause them to be forgotten by the interpreter so new processes will be spawned for the same command.

Example
You can easily reach the limit on the number of open processes, at which point the fork/exec call no longer works. Here, I set the limit to 512 (pretty low) and give it a go.