Closed felipenoris closed 6 years ago
Are you sure that the problem is in filepath.Walk rather than the NFS volume? (strace output for a run of the binary would be helpful to confirm or refute.)
Have you tried the same program with Go 1.9? (Does it exhibit the same problem?)
@bcmills Yes, I was using Go 1.9 and got the bug. I updated to Go 1.10 and the bug persists. I also ran the same program pointing to a local folder (not NFS) with the same files and it worked without bugs.
@felipenoris This program takes a single argument which is the directory to read. Could you run it on your NFS directory and see whether it lists all the files? If it fails to list a file, is it consistent as to which file it fails to list? Thanks.
package main

import (
	"fmt"
	"os"
	"sort"
)

func main() {
	f, err := os.Open(os.Args[1])
	if err != nil {
		panic(err)
	}
	names, err := f.Readdirnames(-1)
	if err != nil {
		panic(err)
	}
	f.Close()
	sort.Strings(names)
	for _, n := range names {
		fmt.Println(n)
	}
}
@ianlancetaylor, yes, that causes the same problem, and it is consistent with what happens with the filepath.Walk function.
I compared it with ls -l by running the program below, and the problem happens when I query the folder with 83 files.
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

func main() {
	f, err := os.Open(os.Args[1])
	if err != nil {
		panic(err)
	}
	names, err := f.Readdirnames(-1)
	if err != nil {
		panic(err)
	}
	f.Close()
	/*
		for _, s := range names {
			fmt.Println(s)
		}
	*/
	fmt.Printf("Readdirnames listed %d files.\n", len(names))
	cmd := exec.Command("ls", "-l", os.Args[1])
	output, err := cmd.Output()
	if err != nil {
		fmt.Println(err)
		return
	}
	lines := strings.Split(string(output), "\n")
	/*
		for _, s := range lines {
			fmt.Println(s)
		}
	*/
	fmt.Printf("'ls -l' listed %d files.\n", len(lines)-2) // ls outputs 2 additional lines
}
Output
fnoro@dbecceed1666:~/area/fnoro/src/learngo$ ./readdir_vs_ls /area/DB/ARC_ETL
Readdirnames listed 11 files.
'ls -l' listed 11 files.
fnoro@dbecceed1666:~/area/fnoro/src/learngo$ ./readdir_vs_ls /area/DB/ARC_ETL/CSV
Readdirnames listed 1 files.
'ls -l' listed 1 files.
fnoro@dbecceed1666:~/area/fnoro/src/learngo$ ./readdir_vs_ls /area/DB/ARC_ETL/CSV/20180426
Readdirnames listed 82 files.
'ls -l' listed 83 files.
cc @opm22, @accerqueira
Is it always missing the same file? Can you tell us anything about that file, like its ls -l listing?
@ianlancetaylor, yes, at least for today. I guess yesterday it picked another file to miss.
I made a slight modification to the test program, shown below, to narrow down the missing file and to provide the ls -l listing.
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

func main() {
	f, err := os.Open(os.Args[1])
	if err != nil {
		panic(err)
	}
	names, err := f.Readdirnames(-1)
	if err != nil {
		panic(err)
	}
	f.Close()
	var readdir_swap_names []string
	for _, s := range names {
		// fmt.Println(s)
		if strings.Contains(s, "swap") {
			fmt.Println(s)
			readdir_swap_names = append(readdir_swap_names, s)
		}
	}
	//fmt.Printf("Readdirnames listed %d files.\n", len(names))
	fmt.Printf("Readdirnames listed %d swap names.\n", len(readdir_swap_names))
	cmd := exec.Command("ls", "-l", os.Args[1])
	output, err := cmd.Output()
	if err != nil {
		fmt.Println(err)
		return
	}
	lines := strings.Split(string(output), "\n")
	var ls_swap_names []string
	for _, s := range lines {
		// fmt.Println(s)
		if strings.Contains(s, "swap") {
			fmt.Println(s)
			ls_swap_names = append(ls_swap_names, s)
		}
	}
	// fmt.Printf("'ls -l' listed %d files.\n", len(lines)-2)
	fmt.Printf("'ls -l' listed %d swap names.\n", len(ls_swap_names))
}
Output:
fnoro@dbecceed1666:/area/fnoro/src/learngo$ ./readdir_vs_ls /area/DB/ARC_ETL/CSV/20180426
20180427_02_UR_TMP_i28_rfc_sap_swaps_cabecalho1.csv.gz
20180427_02_UR_TMP_i28_rfc_sap_swaps_dat_jur_ent.csv.gz
20180427_02_UR_TMP_i28_rfc_sap_swaps_dat_jur_sai.csv.gz
20180427_02_UR_TMP_i28_rfc_sap_swaps_formulas.csv.gz
20180427_02_UR_TMP_i28_sap_swaps_simp1.csv.gz
20180427_02_UR_TMP_i28_rfc_sap_swaps_juros_ent1.csv.gz
Readdirnames listed 6 swap names.
-rwxrwxrwx 1 root root 1304 Apr 27 02:20 20180427_02_UR_TMP_i28_rfc_sap_swaps_cabecalho1.csv.gz
-rwxrwxrwx 1 root root 1475 Apr 27 02:20 20180427_02_UR_TMP_i28_rfc_sap_swaps_dat_jur_ent.csv.gz
-rwxrwxrwx 1 root root 1288 Apr 27 02:20 20180427_02_UR_TMP_i28_rfc_sap_swaps_dat_jur_sai.csv.gz
-rwxrwxrwx 1 root root 141 Apr 27 02:20 20180427_02_UR_TMP_i28_rfc_sap_swaps_formulas.csv.gz
-rwxrwxrwx 1 root root 1047 Apr 27 02:20 20180427_02_UR_TMP_i28_rfc_sap_swaps_juros_ent1.csv.gz
-rwxrwxrwx 1 root root 1028 Apr 27 02:20 20180427_02_UR_TMP_i28_rfc_sap_swaps_juros_saida.csv.gz
-rwxrwxrwx 1 root root 20397 Apr 27 02:20 20180427_02_UR_TMP_i28_sap_swaps_simp1.csv.gz
'ls -l' listed 7 swap names.
There you can see that 20180427_02_UR_TMP_i28_rfc_sap_swaps_juros_saida.csv.gz is missing.
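Filtering on "swap" only works once you already suspect which group of files is affected. A more general way to narrow this down is to diff the two name lists directly. The following is only a sketch: the helper name missingNames is made up for illustration, it assumes an ls that supports -1A (one name per line, dotfiles included, "." and ".." omitted), and it assumes no file name contains a newline.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"sort"
	"strings"
)

// missingNames (hypothetical helper) returns the entries of want that do
// not appear in got, sorted. Empty strings in want are ignored.
func missingNames(want, got []string) []string {
	seen := make(map[string]bool, len(got))
	for _, n := range got {
		seen[n] = true
	}
	var missing []string
	for _, n := range want {
		if n != "" && !seen[n] {
			missing = append(missing, n)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	dir := "." // default to the current directory when no argument is given
	if len(os.Args) > 1 {
		dir = os.Args[1]
	}

	f, err := os.Open(dir)
	if err != nil {
		panic(err)
	}
	names, err := f.Readdirnames(-1)
	f.Close()
	if err != nil {
		panic(err)
	}

	// Assumption: "ls -1A" prints exactly one file name per line.
	out, err := exec.Command("ls", "-1A", dir).Output()
	if err != nil {
		panic(err)
	}
	lsNames := strings.Split(strings.TrimRight(string(out), "\n"), "\n")

	fmt.Printf("Readdirnames listed %d names, ls listed %d.\n", len(names), len(lsNames))
	for _, n := range missingNames(lsNames, names) {
		fmt.Println("missing from Readdirnames:", n)
	}
}
```

Run against the affected directory, this should print only the names that ls sees and Readdirnames does not, whatever they happen to be called on a given day.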
Thanks. Unfortunately I don't see anything informative there.
Can you run your program as strace -f PROG DIR and attach the output? Thanks. That might help us see what the difference is between what the Go library is doing and what /bin/ls is doing.
Yes, this is the output. Thanks @accerqueira for generating this log!
Thanks. The difference is that the Go code calls the getdents system call twice, passing 4096 for the length. The C code calls it once, passing 32768. The larger buffer lets the C code read all the entries in a single call.
Here are the calls from Go:
[pid 3110] getdents64(3, /* 61 entries */, 4096) = 4048
[pid 3110] getdents64(3, /* 23 entries */, 4096) = 1720
[pid 3110] getdents64(3, /* 0 entries */, 4096) = 0
What version of the kernel are you using? This may be a dup of #24015.
@accerqueira, can you provide us information about the kernel version?
Kernel version was 3.10.0-693.11.1.el7.x86_64 (Red Hat Enterprise Linux Server 7.4)
Tried on 4.13.0-39-generic (Ubuntu 17.10) and it seems to work:
./readdir_vs_ls /mnt/area/DB/ARC_ETL/CSV/20180427/
20180428_05_UR_TMP_i28_rfc_sap_swaps_cabecalho1.csv.gz
20180428_05_UR_TMP_i28_rfc_sap_swaps_dat_jur_ent.csv.gz
20180428_05_UR_TMP_i28_rfc_sap_swaps_dat_jur_sai.csv.gz
20180428_05_UR_TMP_i28_rfc_sap_swaps_formulas.csv.gz
20180428_05_UR_TMP_i28_rfc_sap_swaps_juros_saida.csv.gz
20180428_05_UR_TMP_i28_sap_swaps_simp1.csv.gz
20180428_05_UR_TMP_i28_rfc_sap_swaps_juros_ent1.csv.gz
Readdirnames listed 7 swap names.
-rwxrwxrwx 1 root root 1303 abr 28 05:44 20180428_05_UR_TMP_i28_rfc_sap_swaps_cabecalho1.csv.gz
-rwxrwxrwx 1 root root 1475 abr 28 05:44 20180428_05_UR_TMP_i28_rfc_sap_swaps_dat_jur_ent.csv.gz
-rwxrwxrwx 1 root root 1288 abr 28 05:44 20180428_05_UR_TMP_i28_rfc_sap_swaps_dat_jur_sai.csv.gz
-rwxrwxrwx 1 root root 141 abr 28 05:44 20180428_05_UR_TMP_i28_rfc_sap_swaps_formulas.csv.gz
-rwxrwxrwx 1 root root 1048 abr 28 05:44 20180428_05_UR_TMP_i28_rfc_sap_swaps_juros_ent1.csv.gz
-rwxrwxrwx 1 root root 1028 abr 28 05:44 20180428_05_UR_TMP_i28_rfc_sap_swaps_juros_saida.csv.gz
-rwxrwxrwx 1 root root 19992 abr 28 05:44 20180428_05_UR_TMP_i28_sap_swaps_simp1.csv.gz
'ls -l' listed 7 swap names.
With the following calls from Go:
[pid 19822] getdents64(3, /* 61 entries */, 4096) = 4048
[pid 19822] getdents64(3, /* 24 entries */, 4096) = 1800
[pid 19822] getdents64(3, /* 0 entries */, 4096) = 0
Looks like a getdents64 bug on that kernel version, right? ls uses getdents instead of getdents64?
According to #24015 this is a kernel bug that was fixed in kernel version 3.11. It would show up in your current kernel, even in C, with a sufficiently large directory. I don't think there is anything we can reasonably do to fix this in Go. Closing this as a dup of #24015.
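For anyone who wants to flag potentially affected machines, a version comparison on the uname -r string is one rough heuristic. This is only a sketch: parseRelease and olderThan are names invented here, the 3.11 threshold is taken from the comment above, and distribution kernels often backport fixes without changing the base version, so a result of "older" is a hint rather than proof.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRelease (hypothetical helper) extracts the major and minor version
// from a `uname -r` style string such as "3.10.0-693.11.1.el7.x86_64".
func parseRelease(release string) (int, int, error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unexpected kernel release %q", release)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, err
	}
	// The minor field may carry a suffix (for example "13-rc1");
	// keep only its leading digits.
	digits := parts[1]
	for i, r := range digits {
		if r < '0' || r > '9' {
			digits = digits[:i]
			break
		}
	}
	minor, err := strconv.Atoi(digits)
	if err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// olderThan reports whether release is older than major.minor.
func olderThan(release string, major, minor int) (bool, error) {
	kMajor, kMinor, err := parseRelease(release)
	if err != nil {
		return false, err
	}
	return kMajor < major || (kMajor == major && kMinor < minor), nil
}

func main() {
	// The two kernel releases reported in this thread.
	for _, rel := range []string{"3.10.0-693.11.1.el7.x86_64", "4.13.0-39-generic"} {
		old, err := olderThan(rel, 3, 11)
		if err != nil {
			fmt.Println(rel, "error:", err)
			continue
		}
		fmt.Printf("%s older than 3.11: %v\n", rel, old)
	}
}
```

For the two releases mentioned in this thread, this reports true for the RHEL 7.4 kernel and false for the Ubuntu 17.10 kernel.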
@felipenoris is this really on NFS, or is it actually a CIFS mount? The duplicate bug is for a CIFS-specific code path.
@accerqueira, can you answer that?
It's a CIFS mount.
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
What did you do?
The function filepath.Walk skips exactly one file when iterating over an NFS-mapped directory of about 100 files.
What did you expect to see?
My directory has exactly 83 files.
What did you see instead?
Only 82 files are listed. The skipped file is random.