cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

roachtest: error while running `du` #53663

Closed knz closed 3 years ago

knz commented 4 years ago

I see most roachtests report the following error in their logs:

19:04:28 cluster.go:382: > /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod ssh teamcity-2231078-1598711360-73-n9cpu1 -- /bin/bash -c 'du -c /mnt/data1 > diskusage.txt'
teamcity-2231078-1598711360-73-n9cpu1: /bin/bash -c 'du -c /mnt/da...
   1: du: cannot read directory '/mnt/data1/lost+found': Permission denied
COMMAND_PROBLEM: exit status 1
   2: du: cannot read directory '/mnt/data1/lost+found': Permission denied
COMMAND_PROBLEM: exit status 1
   3: du: cannot read directory '/mnt/data1/lost+found': Permission denied
COMMAND_PROBLEM: exit status 1
   4: du: cannot read directory '/mnt/data1/lost+found': Permission denied
COMMAND_PROBLEM: exit status 1
   5: du: cannot read directory '/mnt/data1/lost+found': Permission denied
COMMAND_PROBLEM: exit status 1
   6: du: cannot read directory '/mnt/data1/lost+found': Permission denied
COMMAND_PROBLEM: exit status 1
   7: du: cannot read directory '/mnt/data1/lost+found': Permission denied
COMMAND_PROBLEM: exit status 1
   8: du: cannot read directory '/mnt/data1/lost+found': Permission denied
COMMAND_PROBLEM: exit status 1
   9: du: cannot read directory '/mnt/data1/lost+found': Permission denied
COMMAND_PROBLEM: exit status 1
Error: COMMAND_PROBLEM: exit status 1
(1) COMMAND_PROBLEM
Wraps: (2) Node 1. Command with error:
  | ```
  | /bin/bash -c 'du -c /mnt/data1 > diskusage.txt'
  | ```
Wraps: (3) exit status 1
Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError

@jlinder can you have a look?

blathers-crl[bot] commented 4 years ago

Hi @knz, please add a C-ategory label to your issue. Check out the label system docs.

While you're here, please consider adding an A- label to help keep our repository tidy.

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

kenliu commented 3 years ago

Moved this back to Triage. Could whoever is on dev-inf support next take a look at this and determine whether it was a transient issue? If so, please close this out.

rickystewart commented 3 years ago

Can you provide at least one link to a roachtest demonstrating the failure? I can't find one, which of course is interfering with my ability to debug :)

rickystewart commented 3 years ago

Poked around on a roachtest machine and found the following:

ricky@teamcity-2755481-1615231777-01-n4cpu16-0001:~$ ls -lah /mnt/data1
total 164K
drwxrwxrwx 5 root   root   4.0K Mar  8 19:31 .
drwxr-xr-x 3 root   root   4.0K Mar  8 19:30 ..
drwxrwxr-x 4 ubuntu ubuntu 132K Mar  8 19:46 cockroach
drwxrwxrwx 2 root   root   4.0K Mar  8 19:30 cores
drwx------ 2 root   root    16K Mar  8 19:30 lost+found
-rw-r--r-- 1 root   root      0 Mar  8 19:30 .roachprod-initialized
ricky@teamcity-2755481-1615231777-01-n4cpu16-0001:~$ ls -lah /mnt/data1/lost+found/
ls: cannot open directory '/mnt/data1/lost+found/': Permission denied

So that's pretty unambiguous: `lost+found` is owned by root with mode `drwx------`, so nobody else can read it. Presumably we just need to pass `--exclude lost+found` when we call `du`.
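
A rough sketch of what that could look like, assuming GNU `du` (which supports `--exclude=PATTERN`). This is not the actual roachtest change. The scratch directory below is a hypothetical stand-in for `/mnt/data1`, and the `data_dir`, `out_file`, and `du_status` names are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for /mnt/data1 so the sketch can run anywhere.
data_dir="$(mktemp -d)"
out_file="$(mktemp)"   # plays the role of diskusage.txt

mkdir -p "$data_dir/cockroach" "$data_dir/lost+found"
echo "some data" > "$data_dir/cockroach/file"

# Mimic the unreadable lost+found seen on the roachtest machine
# (only has an effect when this script runs as a non-root user).
chmod 000 "$data_dir/lost+found"

# Same du invocation as in the failing log, plus --exclude so du
# skips lost+found instead of trying to read it.
du -c --exclude=lost+found "$data_dir" > "$out_file"
du_status=$?
echo "du exit status: $du_status"

# Restore permissions so the scratch directory can be cleaned up.
chmod 700 "$data_dir/lost+found"
rm -rf "$data_dir"
```

Applied to the command from the log, that would be `du -c --exclude=lost+found /mnt/data1 > diskusage.txt`. Whether the flag should be added or the `du` error simply tolerated is a judgment call for whoever owns that code.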