golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

internal/poll: Reading data piped through os.Stdin hangs on Windows version #22024

Closed jakans closed 6 years ago

jakans commented 6 years ago


What version of Go are you using (go version)?

go1.9 darwin/amd64, but cross-compiling for PC/CYGWIN

Does this issue reproduce with the latest release?

Yes. The issue occurs only with go1.9 on Windows; go1.8 works on all platforms.

What operating system and processor architecture are you using (go env)?

Windows under CYGWIN. (Go compiler is not installed on target machines.)

What did you do?

curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=26422376&version=2.0" | go run pipehangs.go


Source code of pipehangs.go is inserted here:

package main

import (
    "fmt"
    "io/ioutil"
    "os"
)

func main() {
    str, err := ioutil.ReadAll(os.Stdin)
    fmt.Fprintf(os.Stderr, "n: %d\n", len(str))
    if err != nil {
        fmt.Fprintf(os.Stderr, "err: %s\n", err)
    }
}

File attached, with .txt suffix added to get past data format test:

pipehangs.go.txt

What did you expect to see?

n: 3910

What did you see instead?

Program hangs.

It's fine if you pipe a file:

cat testfile.txt | go run pipehangs.go

or redirect stdin:

go run pipehangs.go < testfile.txt

Only reading a network result under Windows and go1.9 causes it to hang.

ianlancetaylor commented 6 years ago

Probably has something to do with the poller changes, but I don't know what.

CC @alexbrainman

alexbrainman commented 6 years ago

@jakans your program works with Windows type command:

c:\Users\Alex\dev\src\issue\go\22024>type pipehangs.go
package main

import (
        "fmt"
        "io/ioutil"
        "os"
)

func main() {

        str, err := ioutil.ReadAll(os.Stdin)
        fmt.Fprintf(os.Stderr, "n: %d\n", len(str))
        if err != nil {
                fmt.Fprintf(os.Stderr, "err: %s\n", err)
        }
}

c:\Users\Alex\dev\src\issue\go\22024>type pipehangs.go | go run pipehangs.go
n: 215

c:\Users\Alex\dev\src\issue\go\22024>

I do not have the curl program installed on my computer to test your scenario. How do I install it? But even after I install curl, I am not sure how to debug this; for all we know, it could be the curl program that hangs.

What are you trying to do? If you just want to download some file from the Internet, the Go standard library has plenty of tools to do that.

Alex

ghost commented 6 years ago

MSYS (with separately downloaded cURL)

$ curl.exe --version
curl 7.50.2 (i386-pc-win32) libcurl/7.50.2 OpenSSL/1.0.2e zlib/1.2.8
Protocols: dict file ftp ftps gopher http https imap imaps ldap pop3 pop3s rtsp
smb smbs smtp smtps telnet tftp
Features: AsynchDNS IPv6 Largefile NTLM SSL libz

$ go.exe version
go version go1.9 windows/386

$ curl.exe -s \
> "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=26422376&version=2.0" | \
> go.exe run pipehangs.go
n: 0

$ 

MSYS2

http://nurmi-labs.blogspot.com/2016/11/git.html

$ curl.exe --version
curl 7.50.1 (i686-w64-mingw32) libcurl/7.50.1 OpenSSL/1.0.2h zlib/1.2.8 libidn/1
.33 libssh2/1.7.0 nghttp2/1.13.0 librtmp/2.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s
 rtmp rtsp scp sftp smtp smtps telnet tftp
Features: IDN IPv6 Largefile SSPI Kerberos SPNEGO NTLM SSL libz TLS-SRP HTTP2 Me
talink

$ go.exe version
go version go1.9 windows/386

$ curl.exe -s \
> "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=26422376&version=2.0" | \
> go.exe run pipehangs.go
n: 3910

$
ktye commented 6 years ago

Does it also hang if you run it from cmd.exe? It could have something to do with the terminal (Cygwin, MSYS/mintty vs. console). For me on mintty, pipes sometimes work and sometimes hang. In cmd.exe they always work.

ghost commented 6 years ago

The content of https://github.com/golang/go/issues/22024#issuecomment-332090041 indicates NO output from curl 7.50.2.

In both examples from that comment, the Windows Console was used.

For what it's worth, here is MSYS2's mintty.exe running a C shell:

% curl.exe -s \
? "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=26422376&version=2.0" | \
? go.exe run pipehangs.go
n: 3910
%

You don't mention which cURL version you are using, or whether you've run your cURL command without the pipe.

alexbrainman commented 6 years ago

@forskning thanks for testing. Looks like "go run pipehangs.go" does not hang in either of your scenarios.

I wonder why curl 7.50.2 does not output any data. What happens if you redirect curl output to a file? Do you get anything written to the file?

Thank you

Alex

ghost commented 6 years ago

@alexbrainman I should have looked into that prior to adding the MSYS content

http://nurmi-labs.blogspot.com/2015/11/bcc55.html

I had used Dirk Paehl's cURL 7.50.2 ("Download WITH SUPPORT SSL") for a Borland/Dmake compile of perl-5.10.1.tar.gz and the CPAN setup; as those were http addresses, I never added the SSL support files.

https://github.com/Perl/perl5/commit/378eeda70cc27194f0f718b4c65b8ba147259910#diff-65539b463d4890c68be9c1e3de589c4d

You can read about the events which led up to the "The Borland Chainsaw Massacre" here:

http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2011-09/msg00034.html

as commented 6 years ago

Unable to reproduce with various versions of curl 7.55.1. n: 0 when curl fails to initialize; but a hang never occurs.

http://www.paehl.com/open_source/?CURL_7.55.1

set URL=https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed^&id=26422376^&version=2.0

curl -s "%URL%" | go run pipehangs.go  & echo curl_755_1
n: 0
curl_755_1

No crash, instant return

curl -s "%URL%" | go run pipehangs.go  & echo curl_755_1_ssh2_ssl_sspi
n: 0
curl_755_1_ssh2_ssl_sspi


curl -s "%URL%" | go run pipehangs.go & echo curl_755_1
n: 3910
curl_755_1
curl -s "%URL%" | go run pipehangs.go  & echo curl_755_1_rtmp_ssh2_ssl_sspi
n: 0
curl_755_1_rtmp_ssh2_ssl_sspi


go get github.com/as/torgo/hget
go run %GOPATH%\src\github.com\as\torgo\hget\hget.go "%URL%" | go run pipehangs.go
n: 3910
ghost commented 6 years ago

@jakans

Does your curl command when run without the pipe output the xml to the console?

alexbrainman commented 6 years ago

@alexbrainman I should have looked into that prior to adding the MSYS content ...

Simply speaking, your curl.exe was broken. Cool. Thank you for explaining.

Unable to reproduce with various versions of curl 7.55.1. n: 0 when curl fails to initialize; but a hang never occurs.

Thank you @as for trying.

Alex

ghost commented 6 years ago

moot point

curl_750_2_ssl (without the SSL support files added) returns output for http, but not for https:

$ curl.exe -s "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=26422376&version=2.0"
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="https://eutils.ncbi.nlm.nih.gov/entrez/eutils
/esummary.fcgi?db=pubmed&amp;id=26422376&amp;version=2.0">here</a>.</p>
</body></html>

$ curl.exe -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=26422376&version=2.0"

$ curl.exe --version
curl 7.50.2 (i386-pc-win32) libcurl/7.50.2 OpenSSL/1.0.2e zlib/1.2.8
Protocols: dict file ftp ftps gopher http https imap imaps ldap pop3 pop3s rtsp
smb smbs smtp smtps telnet tftp
Features: AsynchDNS IPv6 Largefile NTLM SSL libz

$

"That they be not forced to sue the lawe, wrapped with so infinite crickes and moot poyntes."

Laurence Humphrey - 1563

jakans commented 6 years ago

The problem was discovered by users of Entrez Direct (https://www.ncbi.nlm.nih.gov/books/NBK179288/) who had installed the latest release on their PCs. While most of EDirect is implemented as a Perl wrapper to NCBI's URL-based Entrez Utilities (https://www.ncbi.nlm.nih.gov/books/NBK25501/), the xtract component is written in Go, and it failed to read pipes from the upstream Perl steps. I quickly reverted to the Go 1.8 xtract binary for CYGWIN (Mac and Linux versions are fine under Go 1.9), and then isolated the problem and created the minimal pipehangs.go program for debugging.

Entrez Direct provides a simple way for non-programmers to do sophisticated data mining in PubMed and other interconnected NCBI databases. The target audience includes scientists, medical librarians, bibliometric researchers, and academic administrators, as well as computational biologists who are looking for an easy way to get specific information from our databases. Users supply query details in command-line arguments. Separate steps in the query are connected by Unix pipes. Everything else is handled behind the scenes, with no additional coding required.

The sample ad hoc EDirect query shown below searches for journal articles in PubMed and retrieves the records in a defined XML form. The XML data set is piped to xtract, which limits results to journals published in the U.S. and then visits each author, printing one author name per line. This output is piped to a standard script that produces a frequency table of publications per author:

esearch -db pubmed -query "rattlesnake phospholipase" |
efetch -format xml |
xtract -pattern PubmedArticle \
  -if MedlineJournalInfo/Country -equals "United States" \
  -block Author -tab "\n" -sep " " -element LastName,Initials |
sort-uniq-count-rank

9	Wells MA
8	Marangoni S
8	Toyama MH
6	Dennis EA
5	Bon C
5	Kézdy FJ
5	Sigler PB
5	Soares AM
4	Carlini CR
4	Faure G
4	Francischetti IM
4	Guimarães JA
4	HANAHAN DJ
4	Heinrikson RL
...

To answer the earlier question, curl.exe --version produces: curl 7.43.0 (x86_64-unknown-cygwin) libcurl/7.43.0 OpenSSL/1.0.2d zlib/1.2.8 libidn/1.29 libssh2/1.5.0

jakans commented 6 years ago

The latest xtract.go source code now has an undocumented -echo command to help with debugging. xtract -sample will send a small sample XML file to stdout. xtract -echo will read from stdin, using the same method that is hanging when reading from a pipe. To obtain the source code without doing the full EDirect installation you can run:

ftp ftp://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/edirect.tar.gz
gunzip -c edirect.tar.gz | tar xf -
rm edirect.tar.gz

To build the PC binary using a Go 1.9 compiler on any platform, run:

cd edirect
env GOOS=windows GOARCH=386 go build -o testxtract -v xtract.go
chmod +x testxtract

On a PC under CYGWIN, the following will print a sample XML record:

./testxtract -sample

Redirecting stdout will save it to a file:

./testxtract -sample > sample.xml

Piping it to another instance of testxtract and reading with the -echo command will fail:

./testxtract -sample | ./testxtract -echo

However, redirecting stdin from an existing file will work:

./testxtract -echo < sample.xml

The actual code being tested is excerpted here:

in := os.Stdin

// test reading from input pipe or file (undocumented)
if args[0] == "-echo" {
    const XMLBUFSIZE = 65536 + 16384
    buffr := make([]byte, XMLBUFSIZE)
    for {
        n, err := in.Read(buffr)
        if n == 0 {
            break
        }
        if err != nil {
            fmt.Fprintf(os.Stderr, "err: %s\n", err)
            break
        }
        fmt.Fprintf(os.Stdout, "%s", buffr[:n])
    }
    return
}
as commented 6 years ago

It would be nice to see the dataflow through the pipe. Can you run your pipeline through pv? If you're using Windows you can use the implementation here:

go get github.com/as/torgo/pv
hget http://badurl.com | %GOBIN%\pv > nul
pv: t=1   b=0.000 B   Δ=0.000 B/s  a=0.000 B/s
pv: t=2   b=0.000 B   Δ=0.000 B/s  a=0.000 B/s
pv: t=3   b=0.000 B   Δ=0.000 B/s  a=0.000 B/s
pv: t=4   b=0.000 B   Δ=0.000 B/s  a=0.000 B/s
pv: t=5   b=0.000 B   Δ=0.000 B/s  a=0.000 B/s
pv: t=6   b=0.000 B   Δ=0.000 B/s  a=0.000 B/s
pv: t=7   b=0.000 B   Δ=0.000 B/s  a=0.000 B/s
pv: t=8   b=0.000 B   Δ=0.000 B/s  a=0.000 B/s
pv: t=9   b=0.000 B   Δ=0.000 B/s  a=0.000 B/s
pv: t=10   b=0.000 B   Δ=0.000 B/s  a=0.000 B/s
pv: t=11   b=0.000 B   Δ=0.000 B/s  a=0.000 B/s
2017/09/27 14:00:15 Get http://badurl.com: dial tcp: lookup badurl.com: no such host
pv: t=11   b=0.000 B   Δ=0.000 B/s  a=0.000 B/s

./testxtract -sample | pv | ./testxtract -echo

alexbrainman commented 6 years ago

To obtain the source code without doing the full EDirect installation you can run:

ftp ftp://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/edirect.tar.gz
gunzip -c edirect.tar.gz | tar xf -
rm edirect.tar.gz

I did that.

To build the PC binary using a Go 1.9 compiler on any platform, run:

cd edirect
env GOOS=windows GOARCH=386 go build -o testxtract -v xtract.go
chmod +x testxtract

I did that.

On a PC under CYGWIN,

I don't have CYGWIN, I used cmd.exe program.

the following will print a sample XML record:

./testxtract -sample

I did that. It worked.

Redirecting stdout will save it to a file:

./testxtract -sample > sample.xml

I did that. It worked.

Piping it to another instance of testxtract and reading with the -echo command will fail:

./testxtract -sample | ./testxtract -echo

I did that. It worked.

However, redirecting stdin from an existing file will work:

./testxtract -echo < sample.xml

I did that. It worked.

The actual code being tested is excerpted here:

I think your code has a bug. When in.Read(buffr) returns with n == 0 you break out of the read loop. But there could be more data coming. The Reader documentation says https://golang.org/pkg/io/#Reader about that: "... Callers should treat a return of 0 and nil as indicating that nothing happened; in particular it does not indicate EOF. ...".

Perhaps your CYGWIN shell program (unlike my cmd.exe) inserts 0 length writes into the pipe between 2 programs.
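To make the io.Reader point concrete, here is a small sketch. `stutterReader` is a hypothetical reader invented for illustration (not something Cygwin or the os package is known to produce) that returns (0, nil) a few times before real data, which the io.Reader contract permits. A loop that treats n == 0 as end of input, like the -echo excerpt above, sees nothing; a loop that keeps reading until Read returns an error gets all of the data. All function names are illustrative.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// stutterReader is a hypothetical reader that returns (0, nil) `zeros`
// times before delegating to r. io.Reader allows this but discourages it.
type stutterReader struct {
	zeros int
	r     io.Reader
}

func (s *stutterReader) Read(p []byte) (int, error) {
	if s.zeros > 0 {
		s.zeros--
		return 0, nil
	}
	return s.r.Read(p)
}

// newStutter wraps a string in a stutterReader.
func newStutter(zeros int, s string) io.Reader {
	return &stutterReader{zeros: zeros, r: strings.NewReader(s)}
}

// breakOnZero mirrors the -echo loop: a zero-length read is (wrongly)
// taken as end of input, and data returned with an error is dropped.
func breakOnZero(r io.Reader) string {
	var out bytes.Buffer
	buf := make([]byte, 8)
	for {
		n, err := r.Read(buf)
		if n == 0 || err != nil {
			break
		}
		out.Write(buf[:n])
	}
	return out.String()
}

// readUntilError keeps calling Read until it returns an error; (0, nil)
// just means "try again", and io.EOF is the normal end of input.
func readUntilError(r io.Reader) (string, error) {
	var out bytes.Buffer
	buf := make([]byte, 8)
	for {
		n, err := r.Read(buf)
		out.Write(buf[:n]) // no-op when n == 0
		if err == io.EOF {
			return out.String(), nil
		}
		if err != nil {
			return out.String(), err
		}
	}
}

func main() {
	fmt.Printf("break on n == 0: %q\n", breakOnZero(newStutter(1, "hello, pipe")))
	s, err := readUntilError(newStutter(1, "hello, pipe"))
	fmt.Printf("read until error: %q, err=%v\n", s, err)
}
```

Note that readUntilError would spin forever on a reader that only ever returns (0, nil); that trade-off is taken deliberately here to keep the contrast simple.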

Alex

ghost commented 6 years ago

In the event anyone decides to pursue testing with the Cygnus port, here is a search of the current curl and perl implementations:

https://cygwin.com/cgi-bin2/package-grep.cgi

curl-7.55.1-1 perl-5.24.1-1

To play the Archivist, one must be an intelligent man.

An intelligent man does not play the Archivist.

jakans commented 6 years ago

I've modified the -echo code to report if "n == 0" without breaking out of the loop, but the test had been added merely to suppress the EOF error message. The actual function that calls Read only looks for "err != nil" to determine end of data.

To further investigate, I added a -read command to specifically test the actual read block function. These are in the xtract.go that is now on the ftp site. The relevant code section is:

// DEBUGGING

// test reading from input pipe or file (undocumented)
if args[0] == "-echo" {
    const XMLBUFSIZE = 65536 + 16384
    buffr := make([]byte, XMLBUFSIZE)
    for {
        n, err := in.Read(buffr)
        if err != nil {
            fmt.Fprintf(os.Stderr, "ERR: %s, N: %d\n", err, n)
            break
        }
        if n == 0 {
            fmt.Fprintf(os.Stderr, "N: zero\n")
            continue
        }
        fmt.Fprintf(os.Stdout, "%s", buffr[:n])
    }
    return
}

// CREATE XML BLOCK READER FROM STDIN OR FILE

rdr := NewXMLReader(in, doCompress, doCleanup, doStrict || doMixed)
if rdr == nil {
    fmt.Fprintf(os.Stderr, "\nERROR: Unable to create XML Block Reader\n")
    os.Exit(1)
}

// DEBUGGING

// test reading blocks from xml reader (undocumented)
if args[0] == "-read" {
    for {
        str := rdr.NextBlock()
        if str == "" {
            fmt.Fprintf(os.Stderr, "\n\nSTR: empty\n")
            break
        }
        fmt.Fprintf(os.Stdout, "%s", str)
    }
    fmt.Fprintf(os.Stdout, "\n")
    return
}

I asked my colleague with a PC to try:

testxtract -sample | testxtract -echo

and:

testxtract -sample | testxtract -read

and he got wildly inconsistent results. Sometimes it would hang, sometimes it wouldn't. Inserting pv in between would either hang, print one "pv:" line and then the expected output, or print "pv:" lines forever.

My expert-of-last-resort colleague got involved, and quickly found that he could force consistent hangs by replacing:

testxtract -sample

in the above commands with:

(sleep 1; testxtract -sample)

That was immediately confirmed by the other colleague.

Do you know of any change between Go 1.8 and Go 1.9 that might explain this sort of timing dependency with pipes on the PC?

as commented 6 years ago
        n, err := in.Read(buffr)
        if err != nil {
            // err can be io.EOF with n > 0, but we break out of the loop,
            // so that data is never processed.
            fmt.Fprintf(os.Stderr, "ERR: %s, N: %d\n", err, n)
            break
        }
        if n == 0 {
            fmt.Fprintf(os.Stderr, "N: zero\n")
            continue
        }
        fmt.Fprintf(os.Stdout, "%s", buffr[:n])
as commented 6 years ago

Just because err != nil doesn't mean you're done processing the data; it may be the case that n > 0. The test err == io.EOF tells you that the reader is done, but err != nil says nothing about the data it copied into the slice for you.

for {
    n, err := fd.Read(p)
    if err != nil && err != io.EOF {
        // hard stop; a real error happens
        break
    }
    datain <- append([]byte{}, p[:n]...) // edited here
    if err == io.EOF {
        break
    }
}

The above is a cautious way to do this, i.e., it doesn't process the bytes if a real error occurred. The io.Reader documentation (go doc io.Reader), though, recommends handling p[:n] regardless of what err is.
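The difference matters when a reader returns its last bytes together with io.EOF. The sketch below uses `iotest.DataErrReader` from the standard library's testing/iotest package, which wraps a reader so that the final error is returned along with the final data rather than on the next call. The function names are illustrative. A loop that checks err before using buf[:n], like the -echo loop critiqued above, loses the last chunk; one that handles buf[:n] first keeps it.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
	"testing/iotest"
)

// dropOnError checks err before using buf[:n], so a final chunk that
// arrives together with io.EOF is silently discarded.
func dropOnError(r io.Reader) string {
	var out bytes.Buffer
	buf := make([]byte, 8)
	for {
		n, err := r.Read(buf)
		if err != nil {
			break
		}
		out.Write(buf[:n])
	}
	return out.String()
}

// processThenCheck handles buf[:n] first and only then inspects err,
// as the io.Reader documentation recommends.
func processThenCheck(r io.Reader) (string, error) {
	var out bytes.Buffer
	buf := make([]byte, 8)
	for {
		n, err := r.Read(buf)
		out.Write(buf[:n])
		if err == io.EOF {
			return out.String(), nil
		}
		if err != nil {
			return out.String(), err
		}
	}
}

// eofWithData returns a reader that delivers its last chunk with io.EOF.
func eofWithData(s string) io.Reader {
	return iotest.DataErrReader(strings.NewReader(s))
}

func main() {
	fmt.Printf("check err first: %q\n", dropOnError(eofWithData("hello, pipe")))
	s, err := processThenCheck(eofWithData("hello, pipe"))
	fmt.Printf("process first:   %q, err=%v\n", s, err)
}
```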

jakans commented 6 years ago

I had noticed this in the documentation, but in my early tests err != nil always came with n == 0. To be robust against future implementation changes, I've modified the code, but I still prefer to err on the side of caution and ignore bytes that come with a real error. It will now print any non-EOF error message to Stderr, at least making the user aware that something is amiss. The -echo test code was also changed, with the relevant section shown below:

        n, err := in.Read(buffr)
        if err != nil {
            if err != io.EOF {
                fmt.Fprintf(os.Stderr, "ERR: %s, N: %d\n", err, n)
                break
            }
            if n == 0 {
                // EOF and no more data
                break
            }
        }
        if n == 0 {
            fmt.Fprintf(os.Stderr, "N: zero\n")
            continue
        }
        fmt.Fprintf(os.Stdout, "%s", buffr[:n])

The source code on the ftp site has been updated, but I'll allow some time for in-house testing before making a new release of the precompiled binaries.

Thanks for the advice.

as commented 6 years ago

Not a problem, but just FYI, my example and yours are semantically similar: they don't handle data if there's a real error. This is because p[:n], when n == 0, is a valid zero-length slice. The disadvantage is that neither guards against non-conforming implementations of io.Reader that return -1.

Below I used n >= 0 and not n > 0 because sometimes a zero-length read carries information through the virtue of the read call itself, which I believe @alexbrainman mentioned somewhere earlier in the thread. It's up to you whether you need this information or not.

// Edit: as pointed out below, this is still incorrect as the data should
// be processed regardless of error's value
for {
    n, err := fd.Read(p)
    if err != nil && err != io.EOF {
        // hard stop; a real error happens
        break
    }
    if n >= 0 {
        // handle data in p[:n]
    }
    if err == io.EOF {
        break
    }
}
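The concern about non-conforming readers can be handled without giving up the "process p[:n] first" rule. Below is a defensive sketch, not code from the thread: it rejects counts outside 0..len(p), which a conforming io.Reader never returns, and it stops with io.ErrNoProgress after many consecutive (0, nil) reads so a broken reader cannot spin the loop forever. The limit of 100 is an arbitrary choice here (bufio.Reader applies a similar cap internally), and `brokenReader`, `readChecked` and `collect` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// errBadCount is returned when a Reader reports an impossible byte count.
var errBadCount = errors.New("reader returned an invalid byte count")

// maxEmptyReads bounds consecutive (0, nil) results; 100 is arbitrary.
const maxEmptyReads = 100

// readChecked passes every chunk to handle, including one returned
// together with io.EOF, and treats io.EOF as a clean end of input.
func readChecked(r io.Reader, handle func([]byte)) error {
	p := make([]byte, 8)
	empty := 0
	for {
		n, err := r.Read(p)
		if n < 0 || n > len(p) {
			return errBadCount
		}
		if n > 0 {
			empty = 0
			handle(p[:n])
		} else if err == nil {
			empty++
			if empty >= maxEmptyReads {
				return io.ErrNoProgress
			}
		}
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}

// brokenReader is a hypothetical misbehaving reader that always returns n.
type brokenReader struct{ n int }

func (b brokenReader) Read(p []byte) (int, error) { return b.n, nil }

// collect runs readChecked and gathers everything passed to handle.
func collect(r io.Reader) (string, error) {
	var sb strings.Builder
	err := readChecked(r, func(b []byte) { sb.Write(b) })
	return sb.String(), err
}

func main() {
	fmt.Println(collect(strings.NewReader("hello, pipe")))
	fmt.Println(collect(brokenReader{n: -1}))
	fmt.Println(collect(brokenReader{n: 0}))
}
```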
alexbrainman commented 6 years ago

@jakans and @as, I think you are still wrong with your code. As per the io.Reader documentation: "... When Read encounters an error or end-of-file condition after successfully reading n > 0 bytes, it returns the number of bytes read. ...". So if Read returns n > 0, you must process the p[:n] bytes regardless of what err is. And you don't, because you break out of the for loop.

@jakans are you still having a problem? If yes, I take it you can reproduce it with a Go program (without curl). Is that correct? If so, can you, show us that program, how you run it, what the program outputs, and why you think the output is wrong. If we need CYGWIN to reproduce, tell us where to get it and how to install it. Thank you.

Alex

as commented 6 years ago

I agree it is incorrect, but some APIs outside the standard library return mangled data along with a non-EOF error. I would love for this to work everywhere; it's a lot easier on the eyes, too.

for {
    n, err := r.Read(b)
    if n > 0 {
        // process b[:n]
    }
    if err != nil {
        break
    }
}
ghost commented 6 years ago

Presuming from https://github.com/golang/go/issues/22024#issuecomment-332646802 that the hardware used is 64-bit:

https://cygwin.com/install.html

Installing and Updating Cygwin for 64-bit versions of Windows

jakans commented 6 years ago

That is the CYGWIN address I was about to pass along.

The current xtract.go source code has the modified -echo command that can handle n == 0 blocks. You can download it by running:

ftp ftp://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/edirect.tar.gz
gunzip -c edirect.tar.gz | tar xf -
rm edirect.tar.gz

and build the PC executable with:

env GOOS=windows GOARCH=386 go build -o testxtract -v xtract.go
chmod +x testxtract

You should be able to reproduce the hanging problem when built under Go 1.9 by running:

(sleep 1; testxtract -sample) | testxtract -echo

and:

(sleep 1; testxtract -sample) | pv | testxtract -echo

For reference, here is the current -echo source code:

if args[0] == "-echo" {
    const XMLBUFSIZE = 65536 + 16384
    buffr := make([]byte, XMLBUFSIZE)
    for {
        n, err := in.Read(buffr)
        if err != nil {
            if err != io.EOF {
                fmt.Fprintf(os.Stderr, "ERR: %s, N: %d\n", err, n)
                break
            }
            if n == 0 {
                // EOF and no more data
                break
            }
        }
        if n == 0 {
            fmt.Fprintf(os.Stderr, "N: zero\n")
            continue
        }
        fmt.Fprintf(os.Stdout, "%s", buffr[:n])
    }
    return
}
ghost commented 6 years ago

@jakans

If CYGWIN is needed to reproduce, beyond the minimal base packages, what added packages would be relevant?

Not pertinent to this thread, but IIRC historically bi-directional named pipes have been problematic on CYGWIN.

jakans commented 6 years ago

I'm told that the base system should be sufficient.

as commented 6 years ago

I was able to reproduce this with Cygwin 64bit and the program xtract.go.

At the simplest level, this is the easiest way to reproduce the issue.

The first step is to install Cygwin; the default installation will work. You need to open the Cygwin terminal to reproduce the issue.

(invalid; invalid) | pv

The behavior is easier to explain with the command below. Inside the parentheses, the left-hand command (sleep) ignores stdout entirely. There is no output from pv, and pv never terminates.

$ (sleep 0;cat xml) | pv
pv: t=1   b=0.000 B   Δ=0.000 B/s  a=0.000 B/s
pv: t=2   b=0.000 B   Δ=0.000 B/s  a=0.000 B/s
pv: t=3   b=0.000 B   Δ=0.000 B/s  a=0.000 B/s
^C

If we change the order around to (cat xml ; sleep 0) | pv, we see output, but the Go program never terminates. Output, but no termination.

$ (cat xml; sleep 0) | pv
<PubmedArticle>
... The full output was trimmed but it's all there
</PubmedArticle>
pv: t=1   b=6.246 K    Δ=6.246 K/s  a=3.123 K/s
pv: t=2   b=6.246 K    Δ=0.000 B/s  a=2.082 K/s
pv: t=3   b=6.246 K    Δ=0.000 B/s  a=1.562 K/s
pv: t=4   b=6.246 K    Δ=0.000 B/s  a=1.249 K/s
^C

It doesn't matter if the second command is valid, or how many commands exist in the list. Only the first command in the parenthetical list emits output to pv. Again, pv never terminates.

$ (cat xml; cat xml; cat xml; cat xml; cat xml; cat xml)  | pv
<PubmedArticle>
... The full output was trimmed but it's all there
</PubmedArticle>
pv: t=1   b=6.246 K    Δ=6.246 K/s  a=3.123 K/s
pv: t=2   b=6.246 K    Δ=0.000 B/s  a=2.082 K/s
pv: t=3   b=6.246 K    Δ=0.000 B/s  a=1.562 K/s
pv: t=4   b=6.246 K    Δ=0.000 B/s  a=1.249 K/s
^C

The only condition where pv terminates is when there is one command in the list:

(cat<xml) | pv
... output trimmed
pv: t=0   b=6.246 K   Δ=6.246 K/s  a=6.246 K/s
as@DESKTOP; $
as commented 6 years ago

Update: My prior post said the bug occurred in 1.8 as well; this was actually not the case. go1.8 is bug free.

ghost commented 6 years ago

A comparison of Microsoft® subsystems and the Cygnus port (a POSIX emulation layer) is here:

http://wiki.tcl.tk/11329

This is speculation, but if Go were ported to SUA (deprecated in Windows 8.0), perhaps the bug would not be present in those PE32 or PE32+ (PE64) executables. Porting code to SUA has, however, sometimes been problematic because of its partially implemented poll; generally one compiles some GNU utilities for that subsystem (bison, flex, gm4, gmake, and gsed), and the last release also came with gcc-4.2.

http://nurmi-labs.blogspot.com/2017/09/bootstrapping-go.html

I realise there's another currently open thread on this tracker about WSL, submitted on Sep 25, 2017.

as commented 6 years ago

I wrote some automation to bisect this issue and ran it unattended for a few hours.

https://github.com/as/goissues/tree/master/22024

Here's the first bad commit:

commit c05b06a12d005f50e4776095a60d6bd9c2c91fac
Author: Ian Lance Taylor <iant@golang.org>
Date:   Fri Feb 10 15:17:38 2017 -0800

os: use poller for file I/O

This changes the os package to use the runtime poller for file I/O
where possible. When a system call blocks on a pollable descriptor,
the goroutine will be blocked on the poller but the thread will be
released to run other goroutines. When using a non-pollable
descriptor, the os package will continue to use thread-blocking system
calls as before.

For example, on GNU/Linux, the runtime poller uses epoll. epoll does
not support ordinary disk files, so they will continue to use blocking
I/O as before. The poller will be used for pipes.

Since this means that the poller is used for many more programs, this
modifies the runtime to only block waiting for the poller if there is
some goroutine that is waiting on the poller. Otherwise, there is no
point, as the poller will never make any goroutine ready. This
preserves the runtime's current simple deadlock detection.

This seems to crash FreeBSD systems, so it is disabled on FreeBSD.
This is issue 19093.

Using the poller on Windows requires opening the file with
FILE_FLAG_OVERLAPPED. We should only do that if we can remove that
flag if the program calls the Fd method. This is issue 19098.

Update #6817.
Update #7903.
Update #15021.
Update #18507.
Update #19093.
Update #19098.

Change-Id: Ia5197dcefa7c6fbcca97d19a6f8621b2abcbb1fe
Reviewed-on: https://go-review.googlesource.com/36800
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
alexbrainman commented 6 years ago

I was able to reproduce this with Cygwin 64bit and the program xtract.go.

At the simplest level, this is the easiest way to reproduce the issue.

The first step is to Install Cygwin, the default installation will work. You need to open the cygwin terminal to reproduce the issue.

(invalid; invalid) | pv

invalid is an invalid command, such as the string "invalid".
pv can be replaced with any Go program reading from os.Stdin.
Using go1.9, all combinations of 386 and amd64 trigger the fault on my machine.
Using go1.7 and go1.8, the fault could not be reproduced.

@as

I installed Cygwin 32bit (I do not have 64bit OS). I have built github.com/as/goissues/22024/pv.go with "go version devel +438c8f6 Wed Sep 27 01:10:05 2017 +0000 linux/amd64". I get this output:

a@a-PC /cygdrive/e
$ (invalid;invalid) | ./pv.exe
-bash: invalid: command not found
-bash: invalid: command not found
pv: t=0   b=0.000 B   Δ=0.000 B/s  a=0.000 B/s

a@a-PC /cygdrive/e
$

Is that what you expect your program to show / do?

Alex

as commented 6 years ago

@alexbrainman

Not on a version that has the issue. Your session's output indicates no reproduction. On the system I used, which is 64-bit Windows 10 with 64-bit Cygwin, the pipe never closes, so pv runs forever, printing the I/O throughput of stdout to stderr every second.

Basically, if you have to press ^C to get your prompt back, you hit the issue. Experimentation showed that the program didn't matter, but the version of Go used to build it did.

I used pv because initially, I wanted to see what was going on with the pipeline between xtract.exe and curl. I switched computers in the process of debugging this only to realize that suddenly I was getting different results.

One computer had a pv binary built in 2015; the other had one built with the go1.9 release that same day, from the same pv.go source. From that I was able to conclude that xtract.go was not necessary to reproduce the issue, and that the Go version mattered.

The question now is why your system isn't hitting the issue. Could it be the way Cygwin behaves across architectures or versions of windows?

-as

alexbrainman commented 6 years ago

The question now is why your system isn't hitting the issue.

If you can reproduce the problem, why don't you debug it yourself? Try to understand what pv.exe is doing when it hangs. Is it using 100% CPU (so it would be looping somewhere in the code)? If not, then it would be sitting in some syscall. Which syscall? Maybe you could use the delve debugger to find out. You could start pv.exe under the debugger, or you could attach delve to the running pv.exe process. If that does not work, maybe you could call the panic function from your pv.exe after some timeout. If you set GOTRACEBACK=2 before running pv.exe, it should display quite a bit of stack trace.

Alex

as commented 6 years ago

It's not spinning on the CPU, just patiently waiting for a ReadFile to return. No usermode activity except for the goroutine printing to stderr.

as@DESKTOP /cygdrive/c/g/src/github.com/as/goissues/22024
$ (invalid;invalid)|./pvx.exe  2>&1 >log
sh: invalid: command not found
sh: invalid: command not found
read
pv: t=1   b=0.000 B   Δ=0.000 B/s  a=0.000 B/s
pv: t=2   b=0.000 B   Δ=0.000 B/s  a=0.000 B/s
pv: t=3   b=0.000 B   Δ=0.000 B/s  a=0.000 B/s
pv: t=4   b=0.000 B   Δ=0.000 B/s  a=0.000 B/s
panic: exits

goroutine 20 [running]:
panic(0x4bb1e0, 0x4f0b70)
        C:/Go/src/runtime/panic.go:540 +0x46c fp=0xc04218bfc0 sp=0xc04218bf18 pc=0x42994c
main.main.func2()
        C:/g/src/github.com/as/goissues/22024/pv.go:72 +0x53 fp=0xc04218bfe0 sp=0xc04218bfc0 pc=0x4a9ad3
runtime.goexit()
        C:/Go/src/runtime/asm_amd64.s:2337 +0x1 fp=0xc04218bfe8 sp=0xc04218bfe0 pc=0x4543a1
created by main.main
        C:/g/src/github.com/as/goissues/22024/pv.go:70 +0x2c5

goroutine 2 [force gc (idle)]:
runtime.gopark(0x4e5a98, 0x55b910, 0x4deb1b, 0xf, 0x14, 0x1)
        C:/Go/src/runtime/proc.go:277 +0x141 fp=0xc042025f68 sp=0xc042025f38 pc=0x42ba11
runtime.goparkunlock(0x55b910, 0x4deb1b, 0xf, 0xc04201e014, 0x1)
        C:/Go/src/runtime/proc.go:283 +0x65 fp=0xc042025fa8 sp=0xc042025f68 pc=0x42bb05
runtime.forcegchelper()
        C:/Go/src/runtime/proc.go:235 +0xda fp=0xc042025fe0 sp=0xc042025fa8 pc=0x42b81a
runtime.goexit()
        C:/Go/src/runtime/asm_amd64.s:2337 +0x1 fp=0xc042025fe8 sp=0xc042025fe0 pc=0x4543a1
created by runtime.init.4
        C:/Go/src/runtime/proc.go:224 +0x3c

goroutine 3 [GC sweep wait]:
runtime.gopark(0x4e5a98, 0x55ba00, 0x4de49d, 0xd, 0x41d814, 0x1)
        C:/Go/src/runtime/proc.go:277 +0x141 fp=0xc042027f60 sp=0xc042027f30 pc=0x42ba11
runtime.goparkunlock(0x55ba00, 0x4de49d, 0xd, 0x14, 0x1)
        C:/Go/src/runtime/proc.go:283 +0x65 fp=0xc042027fa0 sp=0xc042027f60 pc=0x42bb05
runtime.bgsweep(0xc042016070)
        C:/Go/src/runtime/mgcsweep.go:52 +0xb1 fp=0xc042027fd8 sp=0xc042027fa0 pc=0x41d841
runtime.goexit()
        C:/Go/src/runtime/asm_amd64.s:2337 +0x1 fp=0xc042027fe0 sp=0xc042027fd8 pc=0x4543a1
created by runtime.gcenable
        C:/Go/src/runtime/mgc.go:216 +0x5f

goroutine 17 [finalizer wait]:
runtime.gopark(0x4e5a98, 0x577c98, 0x4de7fd, 0xe, 0x14, 0x1)
        C:/Go/src/runtime/proc.go:277 +0x141 fp=0xc042021f00 sp=0xc042021ed0 pc=0x42ba11
runtime.goparkunlock(0x577c98, 0x4de7fd, 0xe, 0x14, 0x1)
        C:/Go/src/runtime/proc.go:283 +0x65 fp=0xc042021f40 sp=0xc042021f00 pc=0x42bb05
runtime.runfinq()
        C:/Go/src/runtime/mfinal.go:175 +0xca fp=0xc042021fe0 sp=0xc042021f40 pc=0x41463a
runtime.goexit()
        C:/Go/src/runtime/asm_amd64.s:2337 +0x1 fp=0xc042021fe8 sp=0xc042021fe0 pc=0x4543a1
created by runtime.createfing
        C:/Go/src/runtime/mfinal.go:156 +0x69

goroutine 18 [syscall]:
runtime.notetsleepg(0x55bb80, 0x3b91f63c, 0x0)
        C:/Go/src/runtime/lock_sema.go:280 +0x59 fp=0xc042023f60 sp=0xc042023f20 pc=0x40e3f9
runtime.timerproc()
        C:/Go/src/runtime/time.go:216 +0x313 fp=0xc042023fe0 sp=0xc042023f60 pc=0x446063
runtime.goexit()
        C:/Go/src/runtime/asm_amd64.s:2337 +0x1 fp=0xc042023fe8 sp=0xc042023fe0 pc=0x4543a1
created by runtime.addtimerLocked
        C:/Go/src/runtime/time.go:122 +0xf4

goroutine 19 [syscall, locked to thread]:
runtime.cgocall(0x455240, 0xc042028a40, 0x46c2a2)
        C:/Go/src/runtime/cgocall.go:132 +0xea fp=0xc042189c10 sp=0xc042189bd0 pc=0x40251a
syscall.Syscall6(0x7ffc5b094690, 0x5, 0x3d0, 0xc04207e000, 0x100000, 0xc042189d14, 0x0, 0x0, 0xc04203c480, 0xc042034080, ...)
        C:/Go/src/runtime/syscall_windows.go:174 +0x69 fp=0xc042189c40 sp=0xc042189c10 pc=0x445479
syscall.ReadFile(0x3d0, 0xc04207e000, 0x100000, 0x100000, 0xc042189d14, 0x0, 0xc042189d28, 0xc042189d30)
        C:/Go/src/syscall/zsyscall_windows.go:311 +0xe0 fp=0xc042189cd0 sp=0xc042189c40 pc=0x46d240
syscall.Read(0x3d0, 0xc04207e000, 0x100000, 0x100000, 0x0, 0x0, 0x0)
        C:/Go/src/syscall/syscall_windows.go:297 +0x6f fp=0xc042189d28 sp=0xc042189cd0 pc=0x46ccef
internal/poll.(*FD).Read(0xc042076000, 0xc04207e000, 0x100000, 0x100000, 0x0, 0x0, 0x0)
        C:/Go/src/internal/poll/fd_windows.go:431 +0x1c2 fp=0xc042189d88 sp=0xc042189d28 pc=0x484f82
os.(*File).read(0xc042074000, 0xc04207e000, 0x100000, 0x100000, 0x7ffc5b094780, 0x1, 0xc042189e58)
        C:/Go/src/os/file_windows.go:207 +0x55 fp=0xc042189dd0 sp=0xc042189d88 pc=0x4878c5
os.(*File).Read(0xc042074000, 0xc04207e000, 0x100000, 0x100000, 0x2, 0x4dcf6b, 0x1)
        C:/Go/src/os/file.go:103 +0x74 fp=0xc042189e48 sp=0xc042189dd0 pc=0x486e74
bufio.(*Reader).Read(0xc042180000, 0xc04207e000, 0x100000, 0x100000, 0x1, 0x1, 0xc042189f30)
        C:/Go/src/bufio/bufio.go:199 +0x1aa fp=0xc042189ee0 sp=0xc042189e48 pc=0x4640aa
io.(*teeReader).Read(0xc04204a3a0, 0xc04207e000, 0x100000, 0x100000, 0x0, 0x0, 0x0)
        C:/Go/src/io/io.go:525 +0x5c fp=0xc042189f40 sp=0xc042189ee0 pc=0x45cc2c
main.main.func1(0x54a2c0, 0xc04204a3a0, 0xc04207e000, 0xc04217e000, 0xc04203e060)
        C:/g/src/github.com/as/goissues/22024/pv.go:58 +0x88 fp=0xc042189fb8 sp=0xc042189f40 pc=0x4a9948
runtime.goexit()
        C:/Go/src/runtime/asm_amd64.s:2337 +0x1 fp=0xc042189fc0 sp=0xc042189fb8 pc=0x4543a1
created by main.main
        C:/g/src/github.com/as/goissues/22024/pv.go:53 +0x2ad

A simple program gives the same result.

package main

import (
    "io"
    "os"
)

// Copies stdin to stdout; under Cygwin with go1.9 this blocks in ReadFile too.
func main() {
    io.Copy(os.Stdout, os.Stdin)
}

as commented 6 years ago

More importantly, I found a way to reproduce this without a (cmd|cmd)| pipeline. With go1.9 and Cygwin, if the go1.9 program is reading from the pipe, the read size must be less than or equal to the write size on the writer's side. Otherwise, the go1.9 program reads data for a little while and then blocks in syscall.ReadFile, the same way the stack trace above shows.

// rcount.go - This program just counts the # of reads that happen
package main

import (
        "os"
        "log"
        "flag"
)
var(
        bs = flag.Int("bs", 1, "block size")
)
func init(){
        flag.Parse()
}
func main() {
        var(
                nr, n int
                err error
                b = make([]byte, *bs)
        )
        for {
                n, err = os.Stdin.Read(b)
                nr++
                log.Printf("read #%d: n=%d (err=%s)\n", nr, n, err)
                if err != nil{
                        break
                }
        }
        log.Printf("done")
}

go build rcount.go

Reads 1 byte and exits (bs=1 dd count=1)

as@DESKTOP-34JC31L /cygdrive/c/g/src/github.com/as/goissues/22024
$ dd if=/dev/urandom bs=1 count=1 | ./rcount -bs 1
1+0 records in
1+0 records out
1 byte copied, 0.00277077 s, 0.4 kB/s
2017/10/02 06:22:42 read #1: n=1 (err=%!s(<nil>))
2017/10/02 06:22:42 read #2: n=0 (err=EOF)
2017/10/02 06:22:42 done

Reads forever (bs=1)

as@DESKTOP-34JC31L /cygdrive/c/g/src/github.com/as/goissues/22024
$ dd if=/dev/urandom bs=1 | ./rcount -bs 1
output not shown

Reads forever as well (dd bs=1024)

as@DESKTOP-34JC31L /cygdrive/c/g/src/github.com/as/goissues/22024
$ dd if=/dev/urandom bs=1024 | ./rcount -bs 1
output not shown

Reads forever still (bs=1024)

dd if=/dev/urandom bs=1024 | ./rcount -bs 1024
output not shown

When the reader tries to read more than the writer writes, the reader hangs. It counts ~2000 reads and then hangs in a syscall (dd bs=1, rcount -bs 2):


as@DESKTOP-34JC31L /cygdrive/c/g/src/github.com/as/goissues/22024
$ dd if=/dev/urandom bs=1 | ./rcount -bs 2
2017/10/02 06:30:35 read #1: n=2 (err=%!s(<nil>))
2017/10/02 06:30:35 read #2: n=2 (err=%!s(<nil>))
2017/10/02 06:30:35 read #3: n=2 (err=%!s(<nil>))
2017/10/02 06:30:35 read #4: n=2 (err=%!s(<nil>))
2017/10/02 06:30:35 read #5: n=2 (err=%!s(<nil>))
... you'll just have to trust me when I say there are ~2000 more of these
2017/10/02 06:30:35 read #2030: n=2 (err=%!s(<nil>))
2017/10/02 06:30:35 read #2031: n=2 (err=%!s(<nil>))
2017/10/02 06:30:35 read #2032: n=2 (err=%!s(<nil>))
2017/10/02 06:30:35 read #2033: n=2 (err=%!s(<nil>))
2017/10/02 06:30:35 read #2034: n=2 (err=%!s(<nil>))
... it hangs here, and then I ^C
69606+0 records in
69605+0 records out
69605 bytes (70 kB, 68 KiB) copied, 0.86377 s, 80.6 kB/s

I tested a few more values, such as bs=64 and bs=512, with the same result: if the reader tries to read more than the writer writes, the reader blocks. This doesn't happen in the standard cmd.exe.
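The read-size versus write-size pattern can also be driven from a single Go program, which may make it easier to vary the two sizes. This is a sketch with a hypothetical helper, pumpAndCount: it writes in chunks of wbs bytes and reads with an rbs-byte buffer, like `dd bs=wbs | rcount -bs rbs`. It uses a pipe created by os.Pipe, not one set up by Cygwin, so it may not reproduce the Cygwin-specific hang; it only reproduces the access pattern.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// pumpAndCount writes total bytes into a pipe in chunks of wbs bytes and
// reads them back with an rbs-byte buffer. It returns the number of Read
// calls (including the final one that reports EOF) and the bytes read.
func pumpAndCount(total, wbs, rbs int) (reads, n int, err error) {
	r, w, err := os.Pipe()
	if err != nil {
		return 0, 0, err
	}
	defer r.Close()

	go func() {
		defer w.Close() // closing the write end delivers EOF to the reader
		chunk := make([]byte, wbs)
		for left := total; left > 0; left -= wbs {
			if left < wbs {
				chunk = chunk[:left]
			}
			if _, err := w.Write(chunk); err != nil {
				return
			}
		}
	}()

	buf := make([]byte, rbs)
	for {
		m, rerr := r.Read(buf)
		reads++
		n += m
		if rerr == io.EOF {
			return reads, n, nil
		}
		if rerr != nil {
			return reads, n, rerr
		}
	}
}

func main() {
	// Same shape as: dd bs=1 | rcount -bs 2
	reads, n, err := pumpAndCount(4096, 1, 2)
	fmt.Printf("reads=%d bytes=%d err=%v\n", reads, n, err)
}
```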

alexbrainman commented 6 years ago

When reader tries to read more than writer writes, reader hangs.

Perhaps something like https://blogs.msdn.microsoft.com/oldnewthing/20110707-00/?p=10223 is happening. Can you try splitting the os.Stdin reading and the os.Stdout writing into separate threads? Do not use an unbuffered channel, because it will block things again.

Thank you

Alex

as commented 6 years ago

I'll look into that next time I'm at a Windows box.

as commented 6 years ago

Splitting the writer and reader to work on different threads did not change the behavior. In fact, deleting the writer completely did not squash the bug. However, I do think the pipe is getting clogged; it just might not be the runtime's fault.

My impression is that Cygwin is using some pump with a synchronous write/read loop. I also did some digging and found out that Cygwin creates some named pipes before initializing the child process. It must use the sigwait pipes to synchronize. I'm not sure how Go feels about this.

Pipe Name                                    Instances       Max Instances
---------                                    ---------       -------------
cygwin-c5e39b7a9d22bafb-1776-sigwait              1                1
cygwin-c5e39b7a9d22bafb-pty0-from-master          1                1
cygwin-c5e39b7a9d22bafb-pty0-to-master            1                1
cygwin-c5e39b7a9d22bafb-pty0-to-master-cyg        1                1
cygwin-c5e39b7a9d22bafb-pty0-echoloop             1                1
cygwin-c5e39b7a9d22bafb-pty0-master-ctl           1                1
cygwin-c5e39b7a9d22bafb-8168-sigwait              1                1
cygwin-c5e39b7a9d22bafb-7356-sigwait              1                1
cygwin-c5e39b7a9d22bafb-pty1-from-master          1                1
cygwin-c5e39b7a9d22bafb-pty1-to-master            1                1
cygwin-c5e39b7a9d22bafb-pty1-to-master-cyg        1                1
cygwin-c5e39b7a9d22bafb-pty1-echoloop             1                1
cygwin-c5e39b7a9d22bafb-pty1-master-ctl           1                1
cygwin-c5e39b7a9d22bafb-7140-sigwait              1                1
cygwin-c5e39b7a9d22bafb-4316-sigwait              1                1
   45     975 [main] dd 1996 fhandler_pipe::create: name \\.\pipe\cygwin-c5e39b7a9d22bafb-1996-sigwait, size 5412, mode PIPE_TYPE_MESSAGE
   69    1044 [main] dd 1996 fhandler_pipe::create: pipe read handle 0x110
   23    1067 [main] dd 1996 fhandler_pipe::create: CreateFile: name \\.\pipe\cygwin-c5e39b7a9d22bafb-1996-sigwait
   59    1126 [main] dd 1996 fhandler_pipe::create: pipe write handle 0x114
  152   11470 [main] dd 1996 pinfo_init: Set nice to 0
   79   11549 [main] dd 1996 pinfo_init: pid 1996, pgid 1996, process_state 0x41
   73   11622 [main] dd 1996 App version:  2007.0, api: 0.306
   39   11661 [main] dd 1996 DLL version:  2009.0, api: 0.318
   35   11696 [main] dd 1996 DLL build:    2017-09-12 10:41
   47   11743 [main] dd 1996 dtable::extend: size 32, fds 0x6129EB84
  458   12201 [main] dd 1996 __get_lcid_from_locale: LCID=0x0409
  994   13195 [main] dd 1996 transport_layer_pipes::connect: Try to connect to named pipe: \\.\pipe\cygwin-c5e39b7a9d22bafb-lpc
  131   13326 [main] dd 1996 transport_layer_pipes::connect: Error opening the pipe (2)

The above two sessions are from the same command: strace dd bs=1 count=1. Needless to say, the strace+dd output actually clogged the pipe on its own, and I had to ^C out to kill it.


And then it died.

I also found this thread on a mailing list that says Cygwin needs a major FIFO rewrite.

https://www.cygwin.com/ml/cygwin/2016-01/msg00085.html

ghost commented 6 years ago

IIRC, unlike the SFU Interix mkfifo, the MSYS(?)/MSYS2 mkfifo creates a .lnk file.

https://sourceforge.net/projects/msys2/files/REPOS/MSYS2/x86_64/ coreutils-8.26-2-x86_64.pkg.tar.xz

I wrote to C.V. at Red Hat Inc., who is mentioned in the link provided in https://github.com/golang/go/issues/22024#issuecomment-334398619.

alexbrainman commented 6 years ago

Splitting the writer and reader to work on different threads did not change the behavior.

Thanks for trying.

In fact, deleting the writer completely did not squash the bug.

Good to know. Maybe you could just write a small program in C that does the same (reads n bytes from stdin using the ReadFile Windows API) to see if we can reproduce this problem?

However, I do think the pipe is getting clogged; it just might not be the runtime's fault.

I agree. I suspect the problem is in one of CYGWIN's programs.

I also did some digging and found out that Cygwin creates some named pipes before initializing the child process.

It is the responsibility of the parent process to take care of all file descriptors that the child process might inherit. Are we (the child process) supposed to close/read/write some files? I don't believe so.

It must use the sigwait pipes to synchronize. I'm not sure how Go feels about this.

There is no such thing as sigwait in Windows. It is something internal to CYGWIN. Does CYGWIN expect internal processes to understand these? I don't think so. On Windows, all programs speak the Windows API.

Needless to say, the strace+dd output actually clogged the pipe on its own, and I had to ^C out to kill it.

So that looks like you replicated the problem without any Go program. Can we report it somewhere?

Alex

as commented 6 years ago

There is no such thing as sigwait in Windows. It is something internal to CYGWIN. Does CYGWIN expect internal processes to understand these? I don't think so. On Windows, all programs speak the Windows API.

I should have been clearer about the wording: I used "sigwait" because that's what Cygwin named the pipe, not because I think the Windows subsystem supports signals.

It is the responsibility of the parent process to take care of all file descriptors that the child process might inherit. Are we (the child process) supposed to close/read/write some files? I don't believe so.

I don't think the process is aware of this, just pointing out that it's more internal machinery that can create faults in the I/O flow.

Good to know. Maybe you could just write a small program in C that does the same (reads n bytes from stdin using the ReadFile Windows API) to see if we can reproduce this problem?

I'll see what I can do when I'm at the machine again.

So that looks like you replicated the problem without any Go program. Can we report it somewhere?

It seems like @forskning previously contacted the maintainer.

/cc @forskning

alexbrainman commented 6 years ago

https://cygwin.com/ml/cygwin/2017-10/

October 05, 2017 10:38 golang issue

Thank you @forskning

The link is https://cygwin.com/ml/cygwin/2017-10/msg00047.html I don't see any replies yet.

Alex

gopherbot commented 6 years ago

Change https://golang.org/cl/69871 mentions this issue: internal/poll: only call SetFileCompletionNotificationModes for sockets

alexbrainman commented 6 years ago

@jakans and @as, can you please try https://golang.org/cl/69871 to see if it fixes your problem?

Thank you

Alex

as commented 6 years ago

Rebuilt the programs. https://golang.org/cl/69871 fixes the issue on my systems.

alexbrainman commented 6 years ago

Rebuilt the programs. https://golang.org/cl/69871 fixes the issue on my systems.

Thank you. I will wait for @jakans now.

Alex

jakans commented 6 years ago

I tried checking out the latest (1.9.1) source code, but one of the two lines to change differed slightly from the patch, so I used the following:

if fd.pd.pollable() && hasLoadSetFileCompletionNotificationModes {

My colleague with a PC said the resulting binary still stalled, but I don't have much confidence that I was using the right source code.

Since you have a copy of the xtract.go source code, could you compile and test it on your side? That way there is no chance of version skew. Or compile it and make the binary available for me to pass along to my colleague for testing?

as commented 6 years ago

@jakans

I applied your changes to the 1.9.1 source code and the issue still happens with pv. This means that the change you made was not enough to fix the issue.

I built your program again with the version from https://golang.org/cl/69871 and the problem went away.

jakans commented 6 years ago

Thanks for testing it, and of course for all the work to get to this point. Since this fixes the problem, are you and the other developers satisfied with the change, and will it be in 1.9.2 and future versions?