drakkan / sftpgo

Full-featured and highly configurable SFTP, HTTP/S, FTP/S and WebDAV server - S3, Google Cloud Storage, Azure Blob
https://sftpgo.com
GNU Affero General Public License v3.0

[Bug]: generic error: operation error S3: ListObjectsV2, exceeded maximum number of attempts, 3, https response error StatusCode: 0 #1765

Closed — rc-networks closed this issue 2 months ago

rc-networks commented 2 months ago


Bug description

Let me clarify the points above first. I am not sure whether this is a bug or a configuration issue. I have already searched for similar reports, and none of the proposed solutions work so far.

We are currently evaluating this app to see whether it is a viable replacement for our existing NAS Web UI.

This was tested against both AWS S3 and Ceph. Both fail with the same error; see the Relevant log output section below.

Both backends were tested with and without the path-style addressing option. Both work with the AWS CLI, MinIO Client, and curl run from a different pod in the same Kubernetes namespace, which should rule out a general networking issue.
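
For reference, the same-namespace checks described above look roughly like this (the endpoint and bucket names are taken from the Ceph log entries in this report; the debug image and credential wiring are illustrative, adjust to your environment):

```shell
# Run a throwaway pod in the same namespace as SFTPGo and list the bucket
# (credentials are expected in AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY)
kubectl run s3-check --rm -it --restart=Never --image=amazon/aws-cli -- \
  s3 ls s3://mc-ldap-bucket --endpoint-url https://radosgw.company.internal.domain

# Bare TLS reachability check with curl from another pod in the namespace
curl -sv --max-time 10 -o /dev/null https://radosgw.company.internal.domain
```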

Now for the current build: it is deployed via the Helm chart that was found here. The TLS certificates are signed, added via Kubernetes secrets, and mounted under /etc/ssl/certs/ as shown below:

sftpgo@sftpgo-test-75d7d6b7bb-2x86c:~$ ls -al /etc/ssl/certs/
total 0
drwxrwsrwt 3 root sftpgo 100 Sep 25 09:04 .
drwxr-xr-x 4 root root    53 Jul 15  2023 ..
drwxr-sr-x 2 root sftpgo  60 Sep 25 09:04 ..2024_09_25_09_04_28.1439769053
lrwxrwxrwx 1 root sftpgo  32 Sep 25 09:04 ..data -> ..2024_09_25_09_04_28.1439769053
lrwxrwxrwx 1 root sftpgo  22 Sep 25 09:04 ssl-certificate -> ..data/ssl-certificate

Any idea what might be causing this issue?

Steps to reproduce

  1. Install via Helm using the chart from the docs that was found here
  2. Apply the Kubernetes secrets and mount them at /etc/ssl/certs/
  3. (Assuming you already have an ingress or a gateway configured) sign in via the web UI and configure the folders and users
  4. Log in as a user, test the connection, and check the logs
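
A minimal sketch of step 2, assuming the secret key name matches the `ssl-certificate` entry visible in the directory listing above (the secret name, namespace, and local file path are illustrative):

```shell
# Create the TLS secret; the key name matches the ssl-certificate symlink
# seen under /etc/ssl/certs/ (namespace and local file path are illustrative)
kubectl create secret generic sftpgo-tls \
  --namespace sftpgo \
  --from-file=ssl-certificate=./server.crt

# The chart's values must then mount this secret at /etc/ssl/certs/
```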

Expected behavior

The server should be able to connect to the configured S3 service (both Ceph S3 and AWS S3 in this case) and list the bucket contents.

SFTPGo version

SFTPGo 2.5.4-cc381443-2023-07-15T08:00:33Z +metrics +azblob +gcs +s3 +bolt +mysql +pgsql +sqlite +unixcrypt +portable

Data provider

CEPH S3 and AWS S3

Installation method

Other

Configuration

Configs are the defaults from the Helm chart in the docs that was found here

Relevant log output

CEPH S3 endpoint:

{"level":"debug","time":"2024-09-26T09:30:00.429","sender":"HTTP","connection_id":"HTTP_crqiivjc827uhfbfuip0","message":"error listing directory: operation error S3: ListObjectsV2, exceeded maximum number of attempts, 3, https response error StatusCode: 0, RequestID: , HostID: , request send failed, Get \"https://radosgw.company.internal.domain/mc-ldap-bucket?delimiter=%2F&list-type=2&prefix=\": dial tcp 172.16.48.11:443: i/o timeout"}
{"level":"error","time":"2024-09-26T09:30:00.429","sender":"HTTP","connection_id":"HTTP_crqiivjc827uhfbfuip0","message":"generic error: operation error S3: ListObjectsV2, exceeded maximum number of attempts, 3, https response error StatusCode: 0, RequestID: , HostID: , request send failed, Get \"https://radosgw.radosgw.company.internal.domain/mc-ldap-bucket?delimiter=%2F&list-type=2&prefix=\": dial tcp 172.16.48.11:443: i/o timeout"}
{"level":"debug","time":"2024-09-26T09:30:00.430","sender":"HTTP","connection_id":"HTTP_crqiivjc827uhfbfuip0","message":"connection removed, local address \"172.25.38.48:8080\", remote address \"172.25.39.106:43691\" close fs error: <nil>, num open connections: 0"}

AWS S3 endpoint:

{"level":"debug","time":"2024-09-26T09:50:33.768","sender":"HTTP","connection_id":"HTTP_crqisk3c827uhfbfujf0","message":"error listing directory: operation error S3: ListObjectsV2, exceeded maximum number of attempts, 3, https response error StatusCode: 0, RequestID: , HostID: , request send failed, Get \"https://<omitted_for_security_reasons>.s3.ap-northeast-1.amazonaws.com/?delimiter=%2F&list-type=2&prefix=\": dial tcp 3.5.156.179:443: i/o timeout"}
{"level":"error","time":"2024-09-26T09:50:33.768","sender":"HTTP","connection_id":"HTTP_crqisk3c827uhfbfujf0","message":"generic error: operation error S3: ListObjectsV2, exceeded maximum number of attempts, 3, https response error StatusCode: 0, RequestID: , HostID: , request send failed, Get \"https://<omitted_for_security_reasons>.s3.ap-northeast-1.amazonaws.com/?delimiter=%2F&list-type=2&prefix=\": dial tcp 3.5.156.179:443: i/o timeout"}
{"level":"debug","time":"2024-09-26T09:50:33.768","sender":"HTTP","connection_id":"HTTP_crqisk3c827uhfbfujf0","message":"connection removed, local address \"172.25.38.48:8080\", remote address \"172.25.39.106:38919\" close fs error: <nil>, num open connections: 0"}


What are you using SFTPGo for?

Enterprise

Additional info

_No response_
drakkan commented 2 months ago

Hello, I appreciate your honest response to the question "What are you using SFTPGo for?", so I'll try to provide some help.

In both cases the error is an I/O timeout: `dial tcp 3.5.156.179:443: i/o timeout`. It looks like your pods are unable to connect to the configured storage backends, so I suggest checking network connectivity between the pods and AWS/Ceph. Getting started with a VM is generally easier than with K8s: you can test on a single VM and migrate to Kubernetes later if SFTPGo meets your requirements.
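
One concrete way to check this is to test connectivity from inside the SFTPGo pod itself, not from a sibling pod (pod name taken from the listing in the report; this assumes the image ships a shell and busybox wget, which the alpine variant does):

```shell
# Try reaching the Ceph endpoint from inside the SFTPGo pod;
# an i/o timeout here confirms a pod-level network/egress problem
kubectl exec -it sftpgo-test-75d7d6b7bb-2x86c -- \
  sh -c 'wget -q -S -O /dev/null -T 10 https://radosgw.company.internal.domain || echo unreachable'
```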

rc-networks commented 2 months ago

Thanks for the recommendation to test on a VM. I installed on an Ubuntu machine, noticed that version 2.6.2 was installed (I tested anyway), and it worked. I then updated the current Kubernetes deployment to "v2.6.2-alpine" and it worked without issues.
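
For anyone hitting the same problem, the in-place image bump looks roughly like this (the deployment and container names are assumptions based on the pod name above; verify them with `kubectl get deploy`):

```shell
# Point the existing deployment at the newer image and watch the rollout
# (deployment/container names are assumptions, not taken from the chart)
kubectl set image deployment/sftpgo-test sftpgo=drakkan/sftpgo:v2.6.2-alpine
kubectl rollout status deployment/sftpgo-test
```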

I should have checked the latest version before posting.

I'll have the end users test this out and see how it goes. Thanks again!