drakkan / sftpgo

Full-featured and highly configurable SFTP, HTTP/S, FTP/S and WebDAV server - S3, Google Cloud Storage, Azure Blob
https://sftpgo.com
GNU Affero General Public License v3.0
9.29k stars 726 forks source link

Duplicate Upload Actions for same file #995

Closed JoeAtThru closed 1 year ago

JoeAtThru commented 2 years ago

Hi!

I'm trying to investigate an issue where we are seeing two 'upload' actions get dispatched for the same file. The payload of the request contains identical data - including the same session Id. image image

For this file, we only got 1 'pre-upload' action. The payload in both upload actions indicate that the file was successfully uploaded with the correct file size.

I seem to notice this issue more when the system is under load via JMeter. We are using the default DB (SQLite) and I have seen errors related to SQLite 'context deadline exceeded'. We have considered moving to either the in-memory or Bolt DB. I'm not sure if the duplicate hooks getting dispatched could be related to the DB being under extreme load - perhaps triggering a retry of the upload hook.

Let me know what I can do to help :)

Thanks!

Edit: System Info image Running on Azure "Premium SSD" image

drakkan commented 2 years ago

Hi!

Hi

I'm trying to investigate an issue where we are seeing two 'upload' actions get dispatched for the same file. The payload of the request contains identical data - including the same session Id. image image

For this file, we only got 1 'pre-upload' action. The payload in both upload actions indicate that the file was successfully uploaded with the correct file size.

I seem to notice this issue more when the system is under load via JMeter. We are using the default DB (SQLite) and I have seen errors related to SQLite 'context deadline exceeded'. We have considered moving to either the in-memory or Bolt DB.

before blindly changing data provider and hoping for the best, I suggest figuring out which database queries fail. For example if quota update queries fail you can configure delayed quota updates and so reduce the database load

I'm not sure if the duplicate hooks getting dispatched could be related to the DB being under extreme load - perhaps triggering a retry of the upload hook.

You have to read your logs to understand the reason. SFTPGo has a configurable number of retries. I guess your hook receiver does not reply in time, under heavy load, and this causes a retry. Take a look at your logs for more details.

In SFTPGo v2.3.1 there is no maximum limit on the number of concurrent notifications. If you upgrade to v2.3.5 maximum 150 concurrent notifications are allowed, this may help

Let me know what I can do to help :)

Thanks!

Edit: System Info image Running on Azure "Premium SSD" image

JoeAtThru commented 2 years ago

Thanks for quick reply!

We certainly have some slow responses that I did not see previously and I think we will increase the http timeout to try and address those.

Here are some of the SQLite queries that are timing out. Sorry for the flood, just trying to be complete.

1) User groups mapping error preparing database query "SELECT g.name,ug.group_type,ug.user_id FROM groups g INNER JOIN users_groups_mapping ug ON g.id = ug.group_id WHERE\n\t\tug.user_id IN (427) ORDER BY ug.user_id": context deadline exceeded

2) Users folders mapping error preparing database query "SELECT f.id,f.name,f.path,f.used_quota_size,f.used_quota_files,f.last_quota_update,fm.virtual_path,\n\t\tfm.quota_size,fm.quota_files,fm.user_id,f.filesystem,f.description FROM folders f INNER JOIN users_folders_mapping fm ON f.id = fm.folder_id WHERE\n\t\tfm.user_id IN (422) ORDER BY fm.user_id": context deadline exceeded

3) Last login error preparing database query "UPDATE users SET last_login = ? WHERE username = ?": context deadline exceeded

4) User metrics? error preparing database query "SELECT id,username,password,public_keys,home_dir,uid,gid,max_sessions,quota_size,quota_files,permissions,used_quota_size,used_quota_files,last_quota_update,upload_bandwidth,download_bandwidth,expiration_date,last_login,status,filters,filesystem,additional_info,description,email,created_at,updated_at,upload_data_transfer,download_data_transfer,total_data_transfer,used_upload_data_transfer,used_download_data_transfer FROM users WHERE username = ?": context deadline exceeded

5) check availability error: context deadline exceeded

6) unable to get recently updated users: context deadline exceeded

drakkan commented 2 years ago

Hi,

thanks for the additional info, so basically the database is busy and many queries fail. Do you see frequent logs like quota updated for user XXX ....? If so try to set delayed_quota_update to 120 and check if there are improvements

JoeAtThru commented 2 years ago

After discussing with our team, I think since we are not using the quota, for now we will make "track_quota": 0. Also, in our use case we are not utilizing many of the built in DB functionality (user mapping, last login, etc) I think we will move to in-memory provider. Do you have any recommendations choosing between bolt vs in-memory?

drakkan commented 2 years ago

If you don't need to persist users and their properties (because you are using external auth for example), the memory provider should be ok. Bolt persists users and other objects to a file, so, performance should be similar to SQLite. Let me know if you find any issues, I think not so many users are using the memory provider.

drakkan commented 2 years ago

@JoeAtThru were you able to better understand/fix the issue?