gnolang / gno

Gno: An interpreted, stack-based Go virtual machine to build succinct and composable apps + Gno.land: a blockchain for timeless code and fair open-source
https://gno.land/
Other
850 stars 346 forks source link

tm2/pkg/autofile: too frequent and unnecessary file closes and reopens every second even without a write #698

Open odeke-em opened 1 year ago

odeke-em commented 1 year ago

Too frequent and unnecessary file closes and reopens every second even without a write

I had previously reported this problem and mailed a fix to the original Tendermint in November 2021, but the team didn't seem to care for it and closed the PR with zero explanation (many other changes got closed with no explanation) I detailed results of our audit at https://cyber.orijtech.com/audits/evmos-audits-v1#tenderminttendermintinternallibsautofile-opened-too-many-files

Noticed from tharsis/evmosd in 2021, and after I instrumented file opens and closes, that every second autofile was ALWAYS closing and reopening. However, this happened even in the case of NO writes.

Proposed solution

Only invoke .closeFile() once we have a write The presumption here is that ONLY 1 process is modifying that file. If more than one is modifying that file, then we shall have data corruption either way unless there is strict file locking.

Investigatioon

To instrument and show the impact the instrumentation of the Go standard library was as follows

```diff diff --git a/src/os/file.go b/src/os/file.go index e717f171e7..5fcdffe956 100644 --- a/src/os/file.go +++ b/src/os/file.go @@ -48,6 +48,7 @@ import ( "io" "io/fs" "runtime" + "sync/atomic" "syscall" "time" "unsafe" @@ -327,13 +328,51 @@ func Create(name string) (*File, error) { return OpenFile(name, O_RDWR|O_CREATE|O_TRUNC, 0666) } +var openFiles, closedFiles uint64 + +func init() { + if debugging { + go counterRoutine() + } +} + +func counterRoutine() { + ticker := time.NewTicker(2 * time.Second) + defer ticker.Stop() + + var lastOpen, lastClosed uint64 + + for { + t1 := time.Now() + <-ticker.C + spentTime := time.Since(t1) + curOpen := atomic.LoadUint64(&openFiles) + curClosed := atomic.LoadUint64(&closedFiles) + println("\033[34mOpen", float64(curOpen-lastOpen)/float64(spentTime/time.Second), "qps", "Closed", float64(curClosed-lastClosed)/float64(spentTime/time.Second), "qps\033[00m") + lastClosed, lastOpen = curClosed, curOpen + } +} + // OpenFile is the generalized open call; most users will use Open // or Create instead. It opens the named file with specified flag // (O_RDONLY etc.). If the file does not exist, and the O_CREATE flag // is passed, it is created with mode perm (before umask). If successful, // methods on the returned File can be used for I/O. // If there is an error, it will be of type *PathError. -func OpenFile(name string, flag int, perm FileMode) (*File, error) { +func OpenFile(name string, flag int, perm FileMode) (fl *File, uerr error) { + if debugging { + pc, srcFile, lineno, _ := runtime.Caller(2) + fn := runtime.FuncForPC(pc) + defer func() { + if uerr == nil { + atomic.AddUint64(&openFiles, 1) + } + if fl != nil { + println("\033[33mOpenFile.file:", fl, srcFile, fn.Name(), lineno, "\033[00m") + } + }() + } + testlog.Open(name) f, err := openFileNolog(name, flag, perm) if err != nil { diff --git a/src/os/file_unix.go b/src/os/file_unix.go index a2531b9656..6a7c874acc 100644 --- a/src/os/file_unix.go +++ b/src/os/file_unix.go @@ -11,6 +11,7 @@ import ( "internal/poll" "internal/syscall/unix" "runtime" + "sync/atomic" "syscall" ) @@ -241,7 +242,21 @@ func openFileNolog(name string, flag int, perm FileMode) (*File, error) { return newFile(uintptr(r), name, kindOpenFile), nil } -func (file *file) close() error { +var debugging = Getenv("DEBUG_FILE") != "" + +func (file *file) close() (uerr error) { + if debugging { + pc, srcFile, lineno, _ := runtime.Caller(2) + fn := runtime.FuncForPC(pc) + println("\033[32mfile.close", file, srcFile, fn.Name(), lineno, "\033[00m") + + defer func() { + if uerr == nil { + atomic.AddUint64(&closedFiles, 1) + } + }() + } + if file == nil { return syscall.EINVAL } ```

Description

When I looked at instrumented logs between, you'll notice a reduction in opens/closes in the after.

Before

1:20AM INF committed state app_hash=7727D9FA3AB8880907B5D4705D2DCEF7FE59B72D303F5DCAE510160A9A224857 height=1825 module=state num_txs=0 server=node
1:20AM INF indexed block height=1825 module=txindex server=node
file.close 0xc0024c1500 github.com/tendermint/tendermint@v0.34.14/libs/autofile/autofile.go github.com/tendermint/tendermint/libs/autofile.(*AutoFile).closeFile 123 
OpenFile.file: 0xc001a2e890 github.com/tendermint/tendermint@v0.34.14/libs/autofile/autofile.go github.com/tendermint/tendermint/libs/autofile.(*AutoFile).Write 135 
file.close 0xc0019dd800 github.com/tendermint/tendermint@v0.34.14/libs/autofile/autofile.go github.com/tendermint/tendermint/libs/autofile.(*AutoFile).closeFile 123 
OpenFile.file: 0xc003ec7200 github.com/tendermint/tendermint@v0.34.14/libs/autofile/autofile.go github.com/tendermint/tendermint/libs/autofile.(*AutoFile).Size 184 
OpenFile.file: 0xc003ec7208 github.com/tendermint/tendermint@v0.34.14/libs/autofile/group.go github.com/tendermint/tendermint/libs/autofile.(*Group).readGroupInfo 363 
file.close 0xc002552900 github.com/tendermint/tendermint@v0.34.14/libs/autofile/group.go github.com/tendermint/tendermint/libs/autofile.(*Group).readGroupInfo 410 
file.close 0xc0025528a0 github.com/tendermint/tendermint@v0.34.14/libs/autofile/autofile.go github.com/tendermint/tendermint/libs/autofile.(*AutoFile).closeFile 123 
OpenFile.file: 0xc003ec7218 github.com/tendermint/tendermint@v0.34.14/libs/autofile/autofile.go github.com/tendermint/tendermint/libs/autofile.(*AutoFile).Sync 153 
file.close 0xc002552960 github.com/tendermint/tendermint@v0.34.14/libs/autofile/autofile.go github.com/tendermint/tendermint/libs/autofile.(*AutoFile).closeFile 123 
OpenFile.file: 0xc003ec7220 github.com/tendermint/tendermint@v0.34.14/libs/autofile/autofile.go github.com/tendermint/tendermint/libs/autofile.(*AutoFile).Sync 153 
1:20AM INF Timed out dur=4973.236 height=1826 module=consensus round=0 server=node step=1

After

1:52AM INF indexed block height=377 module=txindex server=node
file.close 0xc0037d3260 github.com/tendermint/tendermint@v0.34.14/libs/autofile/autofile.go github.com/tendermint/tendermint/libs/autofile.(*AutoFile).closeFile 139 
OpenFile.file: 0xc0033241b0 github.com/tendermint/tendermint@v0.34.14/libs/autofile/autofile.go github.com/tendermint/tendermint/libs/autofile.(*AutoFile).Write 157 
file.close 0xc0032d1c80 github.com/tendermint/tendermint@v0.34.14/libs/autofile/autofile.go github.com/tendermint/tendermint/libs/autofile.(*AutoFile).closeFile 139 
OpenFile.file: 0xc0033241d8 github.com/tendermint/tendermint@v0.34.14/libs/autofile/autofile.go github.com/tendermint/tendermint/libs/autofile.(*AutoFile).Sync 175 
OpenFile.file: 0xc0033241e0 github.com/tendermint/tendermint@v0.34.14/libs/autofile/group.go github.com/tendermint/tendermint/libs/autofile.(*Group).readGroupInfo 363 
file.close 0xc0032d1da0 github.com/tendermint/tendermint@v0.34.14/libs/autofile/group.go github.com/tendermint/tendermint/libs/autofile.(*Group).readGroupInfo 410 
1:52AM INF Timed out dur=4974.865 height=378 module=consensus round=0 server=node step=1
moul commented 1 year ago

@peter7891 can you add this to the upcoming performance improvement framework, please?