Closed bartdeboer closed 6 months ago
Thanks @bartdeboer! A little more context would be helpful for me on this one. Prompt
makes sense, but what effect does the change to First
have that makes it necessary here?
Hi John,
I'm happy to provide more context. The First(n int)
function is designed to read the first n lines from a stream. However, the old implementation had a limitation:
func (p *Pipe) FilterScan(filter func(string, io.Writer)) *Pipe {
return p.Filter(func(r io.Reader, w io.Writer) error {
scanner := newScanner(r)
for scanner.Scan() {
filter(scanner.Text(), w)
}
return scanner.Err()
})
}
func (p *Pipe) First(n int) *Pipe {
if p.Error() != nil {
return p
}
if n <= 0 {
return NewPipe()
}
i := 0
return p.FilterScan(func(line string, w io.Writer) {
if i >= n {
return
}
fmt.Fprintln(w, line)
i++
})
}
It filters out n number of lines, but keeps scanning until EOF even after n lines have been received. This means that when used to scan for user input we're not sending in EOF, the function doesn't break out of the scanning loop.
My modified version solves this by breaking out of the scanner once the n number of lines is exceeded:
func (p *Pipe) First(n int) *Pipe {
if p.Error() != nil {
return p
}
if n <= 0 {
return NewPipe()
}
return p.Filter(func(r io.Reader, w io.Writer) error {
scanner := newScanner(r)
i := 0
for scanner.Scan() {
_, err := fmt.Fprintln(w, scanner.Text())
if err != nil {
return err
}
i++
if i >= n {
break
}
}
return scanner.Err()
})
}
It filters out n number of lines, but keeps scanning until EOF even after n lines have been received.
I'm not sure this is true, if I'm understanding you correctly. For example, consider this program:
script.Stdin().First(1).Stdout()
If we run this, and type "hello" followed by a newline, we immediately see the output "hello". First
is not waiting for EOF in order to produce its result. All further input lines are ignored, as you'd expect, and the pipe itself doesn't terminate until the input ends, also as you'd expect.
Can you help me out by showing an example program that doesn't work as you'd expect with the current implementation?
Here's some examples of what I'm seeing:
With .String()
fmt.Print("User input: ")
input, _ := script.Stdin().First(1).String()
fmt.Printf("Input was %s\n", input)
With PR:
User input: 1st input\n
Input was 1st input
Without PR:
User input: 1st input\n
2nd input\n
3rd input\n
Ctrl+D
Input was 1st input
With .Stdout()
fmt.Print("User input: ")
bytes, _ := script.Stdin().First(1).Stdout()
fmt.Printf("Input was %d bytes\n", bytes)
With PR:
User input: 1st input\n
1st input
Input was 10 bytes
Without PR:
User input: 1st input\n
1st input
2nd input\n
3rd input\n
Ctrl+D
Input was 10 bytes
I guess herein lies the difference in behavior. If the pipe should terminate after n lines have been read or keep streaming?
I think maybe it should terminate. It isn't going to send anything more. What happens if I want to read the first 10 lines of a 1GB log file?
Considering it should work the same like head -1
I think it also terminates after n lines?
I guess herein lies the difference in behavior. If the pipe should terminate after n lines have been read or keep streaming?
This is a very good point. As a matter of fact, no script
filter currently closes the pipe by itself. I agree it seems logical that First
should do that, though, since reading any further lines is guaranteed to be wasted work for all downstream stages. And head
does indeed behave this way.
What do you think about this slightly more compact way of writing the same logic?
return p.Filter(func(r io.Reader, w io.Writer) error {
scanner := newScanner(r)
for i := 0; i < n && scanner.Scan(); i++ {
_, err := fmt.Fprintln(w, scanner.Text())
if err != nil {
return err
}
}
return scanner.Err()
})
I think your version writes it more elegantly, yes.
Would you like to commit that change to the PR or shall I?
Next I think I/we should perhaps drop the Prompt() function from this PR and perhaps do that in a separate PR?
Yes, go ahead and make the change by all means. Also, shall we document this behaviour in the doc comment? Something like:
// First produces only the first n lines of the pipe's contents, or all the
// lines if there are less than n. If n is zero or negative, there is no output
// at all. When n lines have been produced, First stops reading its input and
// sends EOF to its output.
Next I think I/we should perhaps drop the Prompt() function from this PR and perhaps do that in a separate PR?
Yes, good idea—let's make these changes atomic.
Great! Also, we should probably have a test that shows this behaviour, shouldn't we? What about something like a strings.Reader
that contains several lines, and then after reading it through a pipe with First(1)
we can prove that there are still lines left in the reader.
Something like this?
It checks if there's remaining bytes in the reader after running First (accounting for Scanner's 4096 buffer size).
func TestFirstHasRemainingBytesAfterNLinesOfInput(t *testing.T) {
t.Parallel()
lineSize, testNumLines, totalLines := 1024, 3, 9
var input strings.Builder
// Generate input that is bigger than Scanner's 4096 byte buffer size.
for i := 0; i < totalLines; i++ {
line := strings.Repeat(fmt.Sprintf("%d", i), lineSize-1) + "\n"
input.WriteString(line)
}
inputReader := strings.NewReader(input.String())
got, _ := script.NewPipe().WithReader(inputReader).First(testNumLines).String()
if len(got) == 0 {
t.Errorf("Expected bytes from First, got 0")
}
remInputBytes, err := io.ReadAll(inputReader)
if err != nil {
t.Fatal("Error reading remaining input bytes:", err)
}
if len(remInputBytes) == 0 {
t.Errorf("Expected remaining bytes in the inputReader, got 0")
}
}
Great! I forgot about the buffer. What about this slightly shorter version of the same idea:
func TestFirstDoesNotConsumeUnnecessaryData(t *testing.T) {
t.Parallel()
// First uses a 4096-byte buffer, so will always read at least
// that much, but no more (once N lines have been read).
r := strings.NewReader(strings.Repeat("line\n", 1000))
got, err := script.NewPipe().WithReader(r).First(1).String()
if err != nil {
t.Fatal(err)
}
want := "line\n"
if want != got {
t.Errorf("want output %q, got %q", want, got)
}
if r.Len() == 0 {
t.Errorf("no data left in reader")
}
}
This slightly modifies
First
and introducesPrompt
that can be used for user input:or
Fixes #51, #132