Open mikelambert opened 4 days ago
The following does reproduce the issue instantly:
from threading import Thread
from botocore.utils import JSONFileCache

cache = JSONFileCache()

def f():
    for i in range(100000):
        cache["key"] = 10
        cache["key"]

threads = []
for i in range(2):
    thread = Thread(target=f, name=f"Thread-{i}")
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()
Adding a lock in each of __getitem__, __delitem__, and __setitem__, as you suggested, does appear to solve the issue.
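For illustration, here is a minimal sketch of what that in-process locking could look like. It is not the botocore patch: LockedCache is a hypothetical wrapper name, and a plain dict stands in for the real cache so the snippet runs on its own. It only serializes threads within one process, so it would not help the multi-process case.

```python
import threading


class LockedCache:
    """Hypothetical wrapper that serializes access to a dict-like cache."""

    def __init__(self, cache):
        self._cache = cache
        self._lock = threading.Lock()

    def __getitem__(self, key):
        # Hold the lock so a read cannot interleave with a write.
        with self._lock:
            return self._cache[key]

    def __setitem__(self, key, value):
        with self._lock:
            self._cache[key] = value

    def __delitem__(self, key):
        with self._lock:
            del self._cache[key]


# A plain dict stands in for the real cache object (an assumption for this sketch).
cache = LockedCache({})
cache["key"] = 10
```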
Thanks for reaching out. The error you referenced was also reported here: https://github.com/boto/botocore/issues/3106. I was advised that an error message like this could help clarify the behavior here: https://github.com/boto/botocore/pull/3183/files. Does deleting the cache file fix this for you?
Yeah, wrapping this in a retry in our app code does work (and is what I did in parallel with filing this bug). Apologies for not realizing it was previously reported. And thank you Laurent for the multi-threaded repro! (Our problematic setup is multi-process, which is why I didn't propose in-process locks.)
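A rough sketch of the kind of app-side retry described here, for anyone needing a stopgap. The helper name read_with_retry and its parameters are hypothetical, and which exception actually reaches the caller may differ by botocore version, so the retry_on tuple is an assumption to adjust.

```python
import json
import time


def read_with_retry(read, attempts=3, delay=0.05,
                    retry_on=(json.JSONDecodeError,)):
    """Call read() and retry a few times if it raises one of retry_on.

    Hypothetical helper: the exception types are an assumption based on
    the error reported in this issue.
    """
    for attempt in range(attempts):
        try:
            return read()
        except retry_on:
            # Re-raise once the last attempt has failed.
            if attempt == attempts - 1:
                raise
            # Give a concurrent writer time to finish its write.
            time.sleep(delay)
```

Usage would be something like read_with_retry(lambda: cache["key"]). This hides the race rather than fixing it, so it is only a workaround.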
Describe the bug
I am getting a rare crash when using botocore to load files.
It's a
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
stemming from JSONFileCache.__getitem__, which seems to imply the JSON file is empty (nothing was found at char 0). Someone helped point out that it might be a race condition in JSONFileCache.__setitem__, which appears to truncate the file and then write the new contents.
We have multiple concurrent processes starting on a box that are each using botocore, so maybe this is just a race condition if one of them happens to look at the file post-truncate, pre-write? I'm not sure whether a flock, a write-then-rename, or something else would be the proper solution here.
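To make the flock idea concrete, here is a minimal sketch under some assumptions: it is Unix-only (it uses the standard fcntl module), the function names locked_write and locked_read are hypothetical, and it only works if every reader and writer cooperates, since flock locks are advisory. It is not how botocore does things today.

```python
import fcntl
import json
import os


def locked_write(path, data):
    """Hypothetical writer: truncate and write only while holding an exclusive lock."""
    # Open without truncating, so the file is not emptied before the lock is held.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    with os.fdopen(fd, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        f.seek(0)
        f.truncate()
        json.dump(data, f)
        f.flush()
        # The lock is released when the file is closed.


def locked_read(path):
    """Hypothetical reader: a shared lock blocks while a writer holds the exclusive lock."""
    with open(path) as f:
        fcntl.flock(f, fcntl.LOCK_SH)
        return json.load(f)
```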
Expected Behavior
It shouldn't crash
Current Behavior
Reproduction Steps
Hard to repro, and haven't tried myself. I assume thousands of processes recreating the botocore cached credentials file would do it.
Possible Solution
Perhaps a flock or a write-to-temp-file-then-rename-to-destination-file-address, instead of truncate-then-write?
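A small sketch of the write-to-temp-then-rename variant, in case it helps the discussion. The function name atomic_write_json is hypothetical and this is not botocore code. It relies on os.replace being atomic when the temp file and the destination are on the same filesystem, which is why the temp file is created in the destination directory.

```python
import json
import os
import tempfile


def atomic_write_json(path, data):
    """Hypothetical helper: readers see either the old file or the new one, never an empty one."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates the file readable and writable only by the current user.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())
        # Swap the fully written temp file into place in one step.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Unlike the flock approach, this needs no cooperation from readers, though two concurrent writers would still race on which version ends up last.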
Additional Information/Context
No response
SDK version used
1.34.42
Environment details (OS name and version, etc.)
AWS instance running x86_64 GNU/Linux