eyra / mono

Next platform repo
https://eyra.co
GNU Affero General Public License v3.0

Downloading of data from storage is relatively slow #909

Closed: TjerkNan closed this issue 2 months ago

TjerkNan commented 2 months ago

Describe the bug: Downloading a large number of small files (250+ files of 10 KB each) proceeds at only ~170 KB/s. As a result, downloading a dataset that is small in terms of storage usage can still take several minutes or even longer.

To Reproduce Steps to reproduce the behavior:

  1. Store 250+ files in the appropriate S3 location of the project
  2. Log in as creator and download the dataset
  3. Observe the download speed

Expected behavior: Although there are a lot of files, the total storage volume is low, so with a gigabit connection this should download in seconds.
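As a rough sanity check on the numbers above, here is a minimal Python sketch that compares the transfer time at the observed rate with the theoretical time on a gigabit link. It assumes exactly 250 files of 10 KB (the lower bound from the report), decimal units, and a link that could sustain its full 1 Gbit/s; the function name `transfer_seconds` is made up for this illustration.

```python
def transfer_seconds(total_bytes: float, bytes_per_second: float) -> float:
    """Time needed to move total_bytes at a sustained throughput."""
    return total_bytes / bytes_per_second


FILES = 250                # lower bound from the report ("250+")
FILE_SIZE = 10 * 1000      # 10 KB per file, decimal units (assumption)
total = FILES * FILE_SIZE  # 2.5 MB in total

observed = transfer_seconds(total, 170 * 1000)  # ~170 KB/s as reported
gigabit = transfer_seconds(total, 1e9 / 8)      # 1 Gbit/s = 125 MB/s, ideal case

print(f"observed: {observed:.1f} s, gigabit ideal: {gigabit * 1000:.0f} ms")
# prints "observed: 14.7 s, gigabit ideal: 20 ms"
```

Both figures scale linearly with the number of files, so a dataset several times larger at the same rate would already take minutes rather than seconds.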

Screenshots https://github.com/eyra/mono/assets/88683839/85b777cd-e927-4be4-86f9-80d431b41790


mellelieuwes commented 2 months ago

@TjerkNan I have no influence on the download speed. To me this looks more like an environment issue than a software issue.

When I run Next locally against the dev S3, it is lightning fast with 20+ files.

Do you only see this with many files, and not with large files?

mellelieuwes commented 2 months ago

I found this issue in the Packmatic lib. It looks like the same problem.

https://github.com/evadne/packmatic/issues/13

I will dig into it.

mellelieuwes commented 2 months ago

@TjerkNan Using connection pooling makes the download twice as fast, but it is still slow. The cause is the overhead of communicating with S3 for every single file. For now, the only thing we can do is make this less frustrating for the user by changing the UI to state clearly that the download needs a little patience. In the future we might switch to download links that are prepared in the background. For now we can keep it as is and see how it holds up in the first pilots.
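To illustrate why per-file overhead dominates with small files, here is a minimal Python sketch of a simple model in which every file pays a fixed overhead before its bytes are transferred. This is an assumed back-of-the-envelope model, not a measurement of how Packmatic or S3 actually behave; the function names and the 125 MB/s link speed are hypothetical choices for the example.

```python
def effective_throughput(file_size: float, overhead_s: float, bandwidth: float) -> float:
    """Bytes per second when each file costs a fixed overhead plus its transfer time."""
    return file_size / (overhead_s + file_size / bandwidth)


def implied_overhead(file_size: float, observed_throughput: float, bandwidth: float) -> float:
    """Per-file overhead (seconds) that would explain an observed throughput."""
    return file_size / observed_throughput - file_size / bandwidth


FILE_SIZE = 10_000   # 10 KB files, as in the original report
OBSERVED = 170_000   # ~170 KB/s, as in the original report
BANDWIDTH = 125e6    # 1 Gbit/s link, assumed to be fully available

overhead = implied_overhead(FILE_SIZE, OBSERVED, BANDWIDTH)
print(f"implied per-file overhead: {overhead * 1000:.0f} ms")

# Under this model, halving the overhead roughly doubles the throughput
# for small files, because the transfer time itself is negligible.
halved = effective_throughput(FILE_SIZE, overhead / 2, BANDWIDTH)
print(f"throughput with half the overhead: {halved / 1000:.0f} KB/s")
```

Under these assumptions the reported 170 KB/s corresponds to roughly 59 ms spent per file, almost all of it overhead rather than data transfer, which is why the file count matters more than the total size.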

TjerkNan commented 2 months ago

@emielvdveen That course of action sounds totally fine to me. As long as people know what to expect, they are fine.

TjerkNan commented 2 months ago

Speed is now around 3+ MB/s, and downloading 1000 files of 200 KB each is totally fine; it just takes a few minutes.