Open GuichardVictor opened 5 months ago
Since you are talking about "jobs", do you mean you have threads running?
Agreed, an asyncio lock would be reasonable. The only await is at the call for s3creator, but load_credentials() happens earlier. I wonder if there's a clean asyncio way to say: "I have to await this, but I don't really want to yield control". Otherwise, we need to pass this new Lock object around.
In my experiment, only multiple asyncio futures were running (using the batch size argument of fsspec's async file system). I added an asyncio.Lock and it seemed to fix the problem. That's why I opened the issue.
You are welcome to propose that change as a PR
Sure, I will open a PR this weekend.
Most functions call _s3_call, which calls set_session. The goal of that function is to "connect" to AWS. In the context of multiple jobs (or batch_size > 1), each job will race to create the session, despite the check for whether the session was already created.
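To illustrate why the "already created" check does not prevent the race, here is a minimal sketch. It does not use s3fs code: FakeFS, its set_session method, and the creations counter are hypothetical stand-ins, and asyncio.sleep(0) stands in for whatever await happens during session creation. The point is only that any await between the check and the assignment lets other tasks pass the same check.

```python
import asyncio


class FakeFS:
    """Hypothetical stand-in for the filesystem object; not s3fs code."""

    def __init__(self):
        self.session = None
        self.creations = 0  # counts how many times the "session" is built

    async def set_session(self):
        # Check-then-create without a lock.
        if self.session is not None:
            return self.session
        # Any await here yields to the event loop, so other tasks can
        # run the check above while self.session is still None.
        await asyncio.sleep(0)
        self.creations += 1
        self.session = object()
        return self.session


async def race_demo(n_jobs=5):
    fs = FakeFS()
    # Several concurrent callers, similar in spirit to batch_size > 1.
    await asyncio.gather(*(fs.set_session() for _ in range(n_jobs)))
    return fs.creations


unlocked_creations = asyncio.run(race_demo())
print("session created", unlocked_creations, "times")
```

In this sketch the session is built more than once, even though every caller performed the check first.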
When using process authentication, assume role, and source_profile in the AWS config, loading the credentials fails with InfiniteLoopConfigError, which can be replicated with the following example:
where the profile depends on another profile.
One way to fix this is to use asyncio.Lock when creating the session, to ensure that only one job creates the session. I'm open to contributing if this is something we want to fix.
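A rough sketch of what the lock-based approach could look like, under stated assumptions: LockedFS, _session_lock, and the creations counter are hypothetical names for illustration, not the actual s3fs implementation, and asyncio.sleep(0) again stands in for the awaited session setup. The session is re-checked after acquiring the lock so that tasks which were waiting do not create it a second time.

```python
import asyncio


class LockedFS:
    """Hypothetical stand-in showing session creation guarded by a lock."""

    def __init__(self):
        self.session = None
        self.creations = 0
        # Assumption: one lock per filesystem instance.
        self._session_lock = asyncio.Lock()

    async def set_session(self):
        # Fast path: no locking once the session exists.
        if self.session is not None:
            return self.session
        async with self._session_lock:
            # Re-check inside the lock: another task may have created
            # the session while this one was waiting to acquire it.
            if self.session is None:
                await asyncio.sleep(0)  # stands in for the real awaited setup
                self.creations += 1
                self.session = object()
        return self.session


async def locked_demo(n_jobs=5):
    fs = LockedFS()
    sessions = await asyncio.gather(*(fs.set_session() for _ in range(n_jobs)))
    return fs.creations, sessions


creations, sessions = asyncio.run(locked_demo())
print("session created", creations, "time(s)")
```

With the lock in place, only the first task builds the session and the others reuse it. Keeping the lock on the instance avoids passing a Lock object between functions, though whether that fits the real code structure is an open question.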