fsspec / filesystem_spec

A specification that python filesystems should adhere to.
BSD 3-Clause "New" or "Revised" License
1k stars 354 forks source link

SMB: errors ignored when connecting #1571

Open frafra opened 5 months ago

frafra commented 5 months ago

The system tries to connect 5 times in a row, ignoring all errors. Since the for loop has no else branch, the function does not raise any error when all tentative failed.

https://github.com/fsspec/filesystem_spec/blob/05e7d80ab2affaa01505ff2602b0097c38ad7688/fsspec/implementations/smb.py#L123-L136

martindurant commented 5 months ago

You are quite right! Would you like to implement a fix?

frafra commented 5 months ago

I would suggest dropping the multiple tentative: if the connection is unstable, it should be up to the user to rely on some automatic retry mechanism such as tenacity or backoff. What is your opinion on that?

martindurant commented 5 months ago

I don't know what the user's workflow might be or how they would apply those tools, but in our own CI pipelines, SMB is unstable for some reason, and this loop makes tests fail less often.

frafra commented 5 months ago

Since it is a problem with how the CI is configured, I would suggest applying such workaround to the CI only,

martindurant commented 5 months ago

Plenty of other implementations have retry loops for connection errors, why do you think it's a problem here?

frafra commented 5 months ago

Well, I am not familiar with SMB, but generally a failing connection hints at some underlying issues. If SMB is intrinsically unstable, it would be nice to have a reference to in the code (why wait 0.1 seconds? why trying just 5 times?). Automatic retry can be a feature whenever it is needed, but the developer should be in control of that; otherwise one could end up with a connection which usually takes 5 tentative, and sometimes 6, so it would apparently fail randomly without even having a warning logged by the library.

Is SMB supposed to randomly fail? Is it a documented behavior? Or is it a common misconfiguration or glitch that the developer/user should be made aware of?

GalLadislav commented 2 months ago

Hi, i would add to this discussion with my problem. In my environment there are set login restrictions that lock my account after 3 or 5 invalid logins. One day everything worked good, but another for some reason i started seeing: smbprotocol.exceptions.SMBAuthenticationError: Failed to authenticate with server: SpnegoError (1): SpnegoError (7): Major (458752): No credentials were supplied, or the credentials were unavailable or inaccessible, Minor (2529639053): No Kerberos credentials available (default cache: FILE:/tmp/krb5cc_1000), Context: Processing security token, Context: Unable to negotiate common mechanism This gave me a headacke because SMBFIleSystem had credentials set. A whole day it took me to find out that SMBFileSystem._connect() not connected, thus leading to the exception above. After passing credentials directly to a called method (listdir) it raised another exception with error 0xC0000234 (STATUS_ACCOUNT_LOCKED_OUT).

I somewhere during my development put wrong password and this for loop did not properly raised an exception and instead repeated the login 5 times, thus locking my account.

I would agree with @frafra that the retry should by implemented on users/dev side. Another option is to properly handle excepted errors and repeating only on those that indicate connection instability or add init parameter that sets the retry limit.

martindurant commented 2 months ago

I would be very happy to see a method in the retry block, which can recognise/differentiate between situations that are permanent (bad creds) versus ones that should be retried.

In the meantime, passing a parameter for the maximum number of retries to attempt sounds very reasonable. Would you be willing to implement this?

GalLadislav commented 2 months ago

Yes, i can do both. I will issue a PR where we can discuss the actual implementation.

edit: I can see there is already option to set number of retries implemented since this issue, so ill focus on exception handling.