apple / foundationdb

FoundationDB - the open source, distributed, transactional key-value store
https://apple.github.io/foundationdb/
Apache License 2.0

Error: disk i/o operation failed #7988

Closed: yangly6207 closed this issue 2 years ago

yangly6207 commented 2 years ago

When I run FoundationDB in a k8s pod, I get the following error:

<?xml version="1.0"?>
<Trace>
<Event Severity="10" Time="1661325224.277488" DateTime="2022-08-24T07:13:44Z" Type="Net2Starting" ID="0000000000000000" ThreadID="17196627989356192314" Machine="192.168.75.129:7500" LogGroup="default" />
<Event Severity="10" Time="1661325224.277634" DateTime="2022-08-24T07:13:44Z" Type="Net2TLSConfig" ID="0000000000000000" CAPath="" CertificatePath="" KeyPath="" HasPassword="0" VerifyPeers="Check.Valid=1" ThreadID="17196627989356192314" Machine="192.168.75.129:7500" LogGroup="default" />
<Event Severity="10" Time="1661325224.277634" DateTime="2022-08-24T07:13:44Z" Type="Binding" ID="0000000000000000" PublicAddress="192.168.75.129:7500" ListenAddress="0.0.0.0:7500" ThreadID="17196627989356192314" Machine="192.168.75.129:7500" LogGroup="default" />
<Event Severity="10" Time="1661325224.277634" DateTime="2022-08-24T07:13:44Z" Type="IOSetupError" ID="0000000000000000" UnixErrorCode="b" UnixError="Resource temporarily unavailable" ThreadID="17196627989356192314" Machine="192.168.75.129:7500" LogGroup="default" />
<Event Severity="40" ErrorKind="DiskIssue" Time="1661325224.277634" DateTime="2022-08-24T07:13:44Z" Type="MainError" ID="0000000000000000" Error="io_error" ErrorDescription="Disk i/o operation failed" ErrorCode="1510" ThreadID="17196627989356192314" Backtrace="addr2line -e fdbserver.debug -p -C -f -i 0x365961c 0x3658230 0x365862e 0x55379b 0x7f97956aa555" Machine="192.168.75.129:7500" LogGroup="default" />
</Trace>

The FoundationDB version is 7.1.19.
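The two events worth reading in a trace like this are IOSetupError and the fatal MainError. Below is a minimal sketch that pulls such events out of a trace with Python's standard-library xml.etree.ElementTree. It runs on a trimmed copy of the events above (timestamps, IDs and machine fields dropped). The helper names interesting_events and unix_error_code are hypothetical. Decoding UnixErrorCode as hexadecimal is also an assumption, though it fits the log: "b" is 11, which is EAGAIN ("Resource temporarily unavailable") on Linux.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the trace from the issue; only the relevant attributes are kept.
TRACE = """<?xml version="1.0"?>
<Trace>
<Event Severity="10" Type="Net2Starting" />
<Event Severity="10" Type="IOSetupError" UnixErrorCode="b" UnixError="Resource temporarily unavailable" />
<Event Severity="40" ErrorKind="DiskIssue" Type="MainError" Error="io_error" ErrorCode="1510" />
</Trace>"""


def interesting_events(trace_xml, min_severity=40):
    """Return (Type, attributes) for events at/above min_severity or whose Type ends in 'Error'."""
    root = ET.fromstring(trace_xml)
    hits = []
    for ev in root.iter("Event"):  # document order
        severity = int(ev.get("Severity", "0"))
        etype = ev.get("Type", "")
        if severity >= min_severity or etype.endswith("Error"):
            hits.append((etype, dict(ev.attrib)))
    return hits


def unix_error_code(attrs):
    """Decode UnixErrorCode, ASSUMING it is printed in hex ("b" -> 11)."""
    return int(attrs["UnixErrorCode"], 16)


if __name__ == "__main__":
    for etype, attrs in interesting_events(TRACE):
        if "UnixErrorCode" in attrs:
            print(etype, unix_error_code(attrs), attrs.get("UnixError"))
        else:
            print(etype, attrs.get("Error"), attrs.get("ErrorCode"))
```

Run as a script, it prints one line for IOSetupError (code 11) and one for MainError (io_error, 1510). The IOSetupError is included even though its severity is only 10, which is why the filter also matches on the event type.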

sfc-gh-etschannen commented 2 years ago

It is likely that something is misconfigured in your setup, but these logs do not contain enough information to diagnose it. Generally, this type of message means that FDB is not able to read from or write to disk.

xumengpanda commented 2 years ago

What does the KVCommit10sSample event say? An io_error is also possible when your storage pods are starved of CPU.

yangly6207 commented 2 years ago

The issue was resolved by increasing the kernel limit fs.aio-max-nr; before that, aio-nr was close to aio-max-nr.
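This fix matches the trace. According to the Linux io_setup(2) man page, io_setup() fails with EAGAIN when the requested events would push the total past /proc/sys/fs/aio-max-nr. EAGAIN is the "Resource temporarily unavailable" reported by the IOSetupError event. Below is a small sketch for watching that headroom on a host. It reads /proc/sys/fs/aio-nr and /proc/sys/fs/aio-max-nr (both documented in proc(5)). The function names and the 10% warning threshold are hypothetical choices, not part of FoundationDB.

The limit itself is raised with sysctl -w fs.aio-max-nr=<value>, and the setting is persisted in /etc/sysctl.conf or a file under /etc/sysctl.d/. fs.aio-max-nr is a host-wide setting, not a namespaced sysctl. On Kubernetes it therefore generally has to be raised on the node rather than through a pod's securityContext.

```python
from pathlib import Path

# Linux exposes the kernel AIO counters under /proc/sys/fs (see proc(5)).
PROC_FS = Path("/proc/sys/fs")


def read_aio_counters(proc_dir=PROC_FS):
    """Return (aio_nr, aio_max_nr) read from aio-nr and aio-max-nr in proc_dir."""
    base = Path(proc_dir)
    aio_nr = int((base / "aio-nr").read_text().strip())
    aio_max_nr = int((base / "aio-max-nr").read_text().strip())
    return aio_nr, aio_max_nr


def aio_status(aio_nr, aio_max_nr, warn_below=0.10):
    """Return (headroom, ok): headroom is the unused fraction of aio-max-nr.

    warn_below is a hypothetical alert threshold, not an FDB or kernel setting.
    """
    if aio_max_nr <= 0:
        raise ValueError("aio-max-nr must be positive")
    headroom = max(0.0, 1.0 - aio_nr / aio_max_nr)
    return headroom, headroom >= warn_below


if __name__ == "__main__":
    try:
        nr, max_nr = read_aio_counters()
    except OSError as exc:  # e.g. not running on Linux
        print(f"cannot read AIO counters: {exc}")
    else:
        headroom, ok = aio_status(nr, max_nr)
        print(f"aio-nr={nr} aio-max-nr={max_nr} headroom={headroom:.1%}")
        if not ok:
            print("aio-nr is close to aio-max-nr: new io_setup() calls may fail with EAGAIN")
```

The reading and the arithmetic are kept separate so the threshold logic can be checked without a Linux /proc, and so the counters could come from a node exporter or similar instead.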

jzhou77 commented 2 years ago

I'll close this issue, since it has been resolved.