caddyserver / certmagic

Automatic HTTPS for any Go program: fully-managed TLS certificate issuance and renewal
https://pkg.go.dev/github.com/caddyserver/certmagic?tab=doc
Apache License 2.0
4.89k stars 278 forks source link

Do calls to storage Load need to be Lock protected? #296

Open elee1766 opened 2 weeks ago

elee1766 commented 2 weeks ago

by contract, do calls to load need to be protected by the storage lock?

from what I see, the lock is used only to protect for simultaneous writes. it's not used to lock/protect the reader if a writer is currently writing.

is the caller supposed to take the Lock before reading the certificate? if not, are storage Load implementations supposed to protect for concurrent read/write access?

elee1766 commented 2 weeks ago

Filestore uses os.WriteFile

// WriteFile writes data to the named file, creating it if necessary. 

// If the file does not exist, WriteFile creates it with permissions perm (before umask); 

// otherwise WriteFile truncates it before writing, without changing permissions. 

// Since WriteFile requires multiple system calls to complete, a failure mid-operation 

// can leave the file in a partially written state. 

func WriteFile(name string, data []byte, perm FileMode) error { 

    f, err := OpenFile(name, O_WRONLY|O_CREATE|O_TRUNC, perm) 

    if err != nil { 

        return err 

    } 

    _, err = f.Write(data) 

    if err1 := f.Close(); err1 != nil && err == nil { 

        err = err1 

    } 

    return err 

}

if reader reads while writefile is running. it could read after open file, which truncates the file, so it will read empty.

so I think there is a race and protection is needed.

within a process, read will force page cache flush if read is run after write before close, however between multiple processes. the file is empty from the opening of the file to fsync on close.

within the process, an rwmutex is needed likely. however between multiple processes, it may be prudent to require a global lock on read. if so, it may be correct to switch to a locking system with rw support in the long term (advisory lock, etc)

elee1766 commented 1 week ago

https://github.com/containerd/containerd/blob/main/pkg%2Fatomicfile%2Ffile.go maybe can do what containerd does

mholt commented 1 week ago

by contract, do calls to load need to be protected by the storage lock?

No; this synchronization should be handled primitively, natively/internally, within the storage mechanism. The user shouldn't have to lock for every other storage operation.

from what I see, the lock is used only to protect for simultaneous writes. it's not used to lock/protect the reader if a writer is currently writing.

Locking is for user-scope purposes, like renewing certificates or something. So it can protect reads and writes, but what it's really for is to sync a certain operation or task that may involve multiple reads or writes.

if reader reads while writefile is running. it could read after open file, which truncates the file, so it will read empty.

It sounds like you know a lot more about the primitives of file systems than I do :sweat_smile: Especially after our conversation in slack. We can implement improvements to the FileStorage if needed! Is it a matter of calling Sync() in more places?

elee1766 commented 1 week ago

by contract, do calls to load need to be protected by the storage lock?

No; this synchronization should be handled primitively, natively/internally, within the storage mechanism. The user shouldn't have to lock for every other storage operation.

from what I see, the lock is used only to protect for simultaneous writes. it's not used to lock/protect the reader if a writer is currently writing.

Locking is for user-scope purposes, like renewing certificates or something. So it can protect reads and writes, but what it's really for is to sync a certain operation or task that may involve multiple reads or writes.

if reader reads while writefile is running. it could read after open file, which truncates the file, so it will read empty.

It sounds like you know a lot more about the primitives of file systems than I do 😅 Especially after our conversation in slack. We can implement improvements to the FileStorage if needed! Is it a matter of calling Sync() in more places?

i think the easiest solution is to do what containerd is doing.

instead of writing over a file, we should write a new file and attempt to copy over it.

i will try to replicate the race condition, create a fix, and show that the race is resolved.

the comment in their code describes:

Package atomicfile provides a mechanism (on Unix-like platforms) to present a consistent view of a file to separate
processes even while the file is being written.  This is accomplished by writing a temporary file, syncing to disk, and
renaming over the destination file name.

Partial/inconsistent reads can occur due to:
 1. A process attempting to read the file while it is being written to (both in the case of a new file with a
    short/incomplete write or in the case of an existing, updated file where new bytes may be written at the beginning
    but old bytes may still be present after).
 2. Concurrent goroutines leading to multiple active writers of the same file.

the lock existing lock can deal with #2, we just need a mechanism to deal with #1

mholt commented 1 day ago

That would be a great help, thank you! :raised_hands: