The problem is that the file-loader uses the key of the initial file as the path of the file in its local cache. This can lead to the following sequence when multiple workers are using the same file:
Worker A requires file a
The file loader downloads the file and stores a copy in ${cache_dir}/a. It returns an Arc to the worker; the file is deleted when the last Arc is dropped.
Worker A shuts down and drops the last Arc for file a. This effectively removes the entry from the cache, because weak.upgrade now fails. The file itself is not yet deleted.
Worker B requires file a
The file loader checks the cache and doesn't find the entry. It creates a new cache entry and starts downloading the file again to ${cache_dir}/a.
Drop for the entry owned by Worker A runs, and the underlying file ${cache_dir}/a is deleted while Worker B is still using it.
Boom
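To make the sequence above concrete, here is a minimal single-threaded sketch of the hazard. It is not the actual file-loader code: `CachedFile`, `download`, and `shared_path_survives` are hypothetical names, the "download" is just a local write, and the thread timing is collapsed into an explicit drop order so the outcome is deterministic.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for a cache entry: it removes its file when dropped.
struct CachedFile {
    path: PathBuf,
}

impl CachedFile {
    /// Stand-in for a real download: writes `contents` to `path`.
    fn download(path: &Path, contents: &str) -> io::Result<CachedFile> {
        fs::write(path, contents)?;
        Ok(CachedFile { path: path.to_path_buf() })
    }
}

impl Drop for CachedFile {
    fn drop(&mut self) {
        // Best-effort cleanup; the file may already be gone.
        let _ = fs::remove_file(&self.path);
    }
}

/// Replays the interleaving with both entries sharing `${cache_dir}/a`.
/// Returns whether Worker B's file still exists after Worker A's Drop ran.
fn shared_path_survives(cache_dir: &Path) -> io::Result<bool> {
    let path = cache_dir.join("a");

    // Worker A requires file a.
    let entry_a = CachedFile::download(&path, "payload")?;

    // Worker B misses the cache and downloads to the same path.
    let entry_b = CachedFile::download(&path, "payload")?;

    // The delayed Drop for Worker A's entry runs only now.
    drop(entry_a);

    // Worker B still holds its entry, but the file underneath is gone.
    Ok(entry_b.path.exists())
}
```

Calling `shared_path_survives` on an existing, writable directory returns `Ok(false)` in this sketch: Worker B's entry is still alive, but its file has been removed by Worker A's cleanup.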
I resolved this in this PR by adding a counter to the file_loader and giving each locally created file a unique id instead of using the key as its name. This way, in the scenario above, worker B just creates a new file while worker A deletes the old one.
I ran the integration_test suite 10 times locally to check that it works; hopefully the tests are no longer flaky.
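A rough sketch of the counter idea, under assumptions: `FileLoader`, `CachedFile`, `load`, and `next_id` are hypothetical names rather than the real file_loader API, the download is replaced by a local write, and the cache lock is held for the whole load purely to keep the example short.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::PathBuf;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex, Weak};

/// Hypothetical cache entry: it removes its own file when dropped.
struct CachedFile {
    path: PathBuf,
}

impl Drop for CachedFile {
    fn drop(&mut self) {
        // Best-effort cleanup of this entry's file only.
        let _ = fs::remove_file(&self.path);
    }
}

/// Hypothetical loader that names local files by a counter, not by key.
struct FileLoader {
    cache_dir: PathBuf,
    next_id: AtomicU64,
    entries: Mutex<HashMap<String, Weak<CachedFile>>>,
}

impl FileLoader {
    fn new(cache_dir: PathBuf) -> Self {
        FileLoader {
            cache_dir,
            next_id: AtomicU64::new(0),
            entries: Mutex::new(HashMap::new()),
        }
    }

    /// Returns the live entry for `key`, or creates a fresh one.
    fn load(&self, key: &str) -> io::Result<Arc<CachedFile>> {
        let mut entries = self.entries.lock().unwrap();

        // Reuse the entry if some worker still holds a strong reference.
        if let Some(live) = entries.get(key).and_then(|weak| weak.upgrade()) {
            return Ok(live);
        }

        // Every new local file gets its own id, so a late Drop of an
        // older entry for the same key cannot delete this file.
        let id = self.next_id.fetch_add(1, Ordering::Relaxed);
        let path = self.cache_dir.join(id.to_string());
        fs::write(&path, key)?; // stand-in for the real download

        let entry = Arc::new(CachedFile { path });
        entries.insert(key.to_string(), Arc::downgrade(&entry));
        Ok(entry)
    }
}
```

With this layout, two entries created for the same key at different times never share a path, so the order in which their Drop implementations run no longer matters.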
For context: I saw the test failure on master and looked into the cause, which turned out to be the race described above.