StatCan / aaw

Documentation for the Advanced Analytics Workspace Platform
https://statcan.github.io/aaw/

Mounted minio directory failing when copying many files #236

Closed: ca-scribner closed this 2 years ago

ca-scribner commented 4 years ago

While running speed tests that compare copying with minio against an attached disk, I hit an Input/output error during the copy action when working with hundreds of files.

The speed test copied n files in the following ways:

I found that minio -> local and minio -> minio work well with small numbers of files (n < ~100) but fail most of the time when n > 200, with the copy action raising an Input/output error (e.g., the source file for the copy action is not available).

After the process fails, I can see:

To reproduce, you can use the code here
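A minimal sketch of a comparable speed test, assuming the minio bucket is exposed as an ordinary mounted directory. This is not the original test code; the function names, file sizes, and paths are hypothetical. It writes n small files, copies them one by one with shutil.copy2, times the run, and records each OSError instead of aborting. Python surfaces an "Input/output error" as an OSError with errno EIO.

```python
from __future__ import annotations

import errno
import os
import shutil
import time
from pathlib import Path


def make_files(directory: Path, n: int, size_bytes: int = 1024) -> list[Path]:
    """Create n files of size_bytes random bytes in directory (hypothetical helper)."""
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(n):
        path = directory / f"file_{i:05d}.bin"
        path.write_bytes(os.urandom(size_bytes))
        paths.append(path)
    return paths


def timed_copy(sources: list[Path], dest_dir: Path) -> dict:
    """Copy each source into dest_dir, timing the run and collecting failures.

    OSError is caught per file, so a single failure (for example EIO, the
    "Input/output error" seen on the mount) does not stop the whole run.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    failures = []
    start = time.perf_counter()
    for src in sources:
        try:
            shutil.copy2(src, dest_dir / src.name)
        except OSError as exc:
            failures.append((src.name, exc.errno, exc.strerror))
    elapsed = time.perf_counter() - start
    return {
        "copied": len(sources) - len(failures),
        "failed": failures,
        "eio_count": sum(1 for _, code, _ in failures if code == errno.EIO),
        "seconds": elapsed,
    }
```

For example, timed_copy(make_files(Path("/tmp/speedtest/src"), 200), Path("<minio mount>/speedtest")) copies 200 files of 1 KiB into the mounted bucket. The mount point depends on the notebook server. A non-zero eio_count corresponds to the Input/output error described above.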

ca-scribner commented 4 years ago

Update: this just happened to me when copying a few large files too. It seems to happen intermittently, and it's just easier to trigger with many small files?

ca-scribner commented 3 years ago

Assigned to saffa for more investigation. Might need someone else to support/fix.

saffaalvi commented 3 years ago

local -> minio

minio -> local

minio -> minio

After an error appears in R-Studio, the whole program slows down all processes (I had to delete the notebook server and create a new one).

saffaalvi commented 3 years ago

From the above testing, it looks like the issue is not with minio itself but with the mounting. For now, the best way to copy files between local and minio is the mc cp <file> <destination> command.
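The mc cp workaround can also be scripted. A minimal sketch, assuming the MinIO client mc is installed and on PATH and that an alias for the tenant has already been configured. The alias and bucket names used below are hypothetical. The helper only builds and runs mc cp SOURCE TARGET, adding --recursive (the mc cp flag for copying a directory tree) when requested.

```python
import shutil
import subprocess


def mc_cp_command(source: str, destination: str, recursive: bool = False) -> list:
    """Build the argument list for `mc cp SOURCE TARGET`."""
    cmd = ["mc", "cp"]
    if recursive:
        cmd.append("--recursive")  # copy a whole directory tree
    cmd += [source, destination]
    return cmd


def mc_cp(source: str, destination: str, recursive: bool = False) -> subprocess.CompletedProcess:
    """Run mc cp; raises CalledProcessError if mc exits with a non-zero status."""
    if shutil.which("mc") is None:
        raise FileNotFoundError("mc (MinIO client) not found on PATH")
    return subprocess.run(
        mc_cp_command(source, destination, recursive),
        check=True,
        capture_output=True,
        text=True,
    )
```

For example, mc_cp("results/", "myalias/mybucket/results/", recursive=True) copies a local directory into a bucket, where myalias and mybucket are placeholders for the configured alias and bucket.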

saffaalvi commented 3 years ago

USING MINIMAL-TENANT-1

local -> minio *Note: when you click paste, the rest of the notebook server becomes unresponsive until the paste completes or an error occurs.

JupyterLab-CPU:

R-Studio:

minio -> local

JupyterLab-CPU:

R-Studio:

sylus commented 3 years ago

@saffaalvi was this done against the new tenants?

sylus commented 3 years ago

Can the tests also be performed with mc? Copy/pasting in the UI means the browser is now coordinating things, but I need to look into this more ^_^

saffaalvi commented 3 years ago

@sylus The testing isn't complete yet, but I noticed the mc cp behaviour was pretty similar to last time: it copied over successfully and quickly. This was done with minimal-tenant-1; should I be trying it with standard-tenant-1?

saffaalvi commented 3 years ago

USING STANDARD-TENANT-1

local -> minio *Note: when you click paste, the rest of the notebook server becomes unresponsive until the paste completes or an error occurs.

minio -> local

minio -> minio

saffaalvi commented 3 years ago

Summary of standard-tenant-1 results compared to testing with minimal-tenant-1 from Nov. 19, 2020:

local -> minio:

minio -> local:

minio -> minio

Results do seem a little inconsistent: I would try the same process a few times and get different results, as recorded above. @ca-scribner noticed this too when trying to use mc cp with the 1GB file.

ca-scribner commented 3 years ago

To clarify, the JupyterLab-CPU/rstudio entries are for copy/pasting in the respective file browser, and mc cp is for the terminal command?

On Fri, Dec 18, 2020 at 15:01, Saffa Alvi wrote:

Summary of standard-tenant-1 results compared to testing from Nov. 19, 2020:

local -> minio:

  • JupyterLab-CPU: Handled the 150 MB file better than last time since there was no error, but it still took over 2 minutes to copy. Still unable to handle the 1 GB file; the server still crashed with an error.
  • R-Studio: Last time, the breaking point was any file larger than 101 MB, but this time it was able to copy over a 150 MB file. Still unable to handle the 1 GB file; the server still crashed with an error.
  • Using mc cp: Both notebook servers behaved like last time, but were maybe a little faster.

minio -> local:

  • JupyterLab-CPU: Handled the 1 GB file better than last time, but it still took a long time to copy over (48 seconds).
  • R-Studio: Same results as last time.
  • Using mc cp: For some reason, copying over the 1 GB file now took WAY longer, whereas last time it only took 5-8 seconds. In both notebook servers, the file still hadn’t copied over after 20 minutes.

minio -> minio

  • JupyterLab-CPU: The error for the 1 GB file appeared much later than last time, but the file still could not be copied over.
  • R-Studio: Same response as last time.
  • Using mc cp: Quicker than last time.

Results do seem a little inconsistent: I would try the same process a few times and get different results, as recorded above. @ca-scribner noticed this too when trying to use mc cp with the 1GB file.


saffaalvi commented 3 years ago

@ca-scribner Yes, mc cp also has JupyterLab-CPU/R-Studio entries below it to show which notebook server the terminal command was run in, along with the results.

sylus commented 3 years ago

Thanks @saffaalvi, the information you provided is super helpful!

sylus commented 3 years ago

Work proceeding over at https://github.com/StatCan/daaas/issues/348