Joystream / joystream

Joystream Monorepo
http://www.joystream.org
GNU General Public License v3.0
1.42k stars 115 forks source link

[Colossus] Handle temp file cleanup on failed upload #4773

Closed mnaamani closed 1 year ago

mnaamani commented 1 year ago

It is observed that when storage node handling a file upload behind a reverse proxy with a max request_body size configured lower than the asset size, the syncing node doesn't appear to handle the network disconnection correctly and is not cleaning up temporary file.

This resulted in situation where several upload attempts exploded the size of the temp download folder or multiple nodes, until the remote nodes were reconfigured to increase the limit.

The fix for storage node should be able to handle the network disconnection correctly.

We should also investigate if a similar case might occur during sync operating if the reverse proxy applies any limit on the response size.

mnaamani commented 1 year ago

Notes: We can confirm the issue is with file uploads because of the names of the files in temp folder. They contain no hyphens, whereas the temp files during sync contain hyphens.

I'm able to replicate the failure on local test:

We observe error in joystream-cli correctly getting an HTTP 413 error from the caddy proxy

Chosen storage node endpoint: http://172.21.0.1:8335/api/v1
Uploading object 0 (/root/joystream/tests/network-tests/data/joystream-testing/48c87bde-fad2-4dbf-a0cf-d8d71d1bf7c0/fb77ad42-d3a6-4df8-a76d-58cde4e41d96.bmp)
Chosen storage node endpoint: http://172.21.0.1:8335/api/v1
Uploading object 1 (/root/joystream/tests/network-tests/data/joystream-testing/48c87bde-fad2-4dbf-a0cf-d8d71d1bf7c0/63079b2c-b6cc-41f8-ac21-65b039c1ad56.bmp)
Uploading /root/joystream/tests/network-tests/data/joystream-testing/48c87bde-fad2-4dbf-a0cf-d8d71d1bf7c0/fb77ad42-d3a6-4df8-a76d-58cde4e41d96.bmp | =====----------------------------------- | 384/2813
Uploading /root/joystream/tests/network-tests/data/joystream-testing/48c87bde-fad2-4dbf-a0cf-d8d71d1bf7c0/63079b2c-b6cc-41f8-ac21-65b039c1ad56.bmp | ======================================== | 264/264 
 ›   Warning: Upload of object 0 failed: Unexpected error when trying to upload
 ›    a file: Request failed with status code 413
 ›   Warning: Upload of object 1 failed: Unexpected error when trying to upload
 ›    a file: Request failed with status code 413
 ›   Warning: Some assets were not uploaded successfully. Try reuploading them 
 ›   with content:reuploadAssets!
Rejected content ids successfully saved to: /root/joystream/tests/network-tests/data/joystream-testing/48c87bde-fad2-4dbf-a0cf-d8d71d1bf7c0/9220eb34-1e7b-4699-bd0e-f64aaab3badd__rejectedContent.json!

And we find two files in the temp folder location.

We don't see any logs in docker service which suggests the error is the fileUploader middleware.

bedeho commented 1 year ago

Great job sir!

mnaamani commented 1 year ago

I've been focusing on the testing how express and multer middleware function and there is clearly something lacking. There is some error handling but it seems to only be for issues writing to disk. https://github.com/expressjs/multer/blob/master/storage/disk.js

The real problem is that there is no handling for cases when the expressjs Request object might emit event to indicate the underlying HTTP connection timed out or in the particular case we are testing for a TCP connection reset perhaps.

Given the problem is in the multer package https://github.com/expressjs/multer/blob/master/lib/make-middleware.js I will try to see if there is a way to work around.

mnaamani commented 1 year ago

Strangle the events emitted by the request object are not documented, but I found that there is indeed an 'aborted' event that can be leveraged. Played around with monkey patching the builtin disk implementation, but eventually it was too ugly so decided to copy the implementation (with modification for typescript).

mnaamani commented 1 year ago

Looks like this has been an issue on the repo for quite some time, only just found it: https://github.com/expressjs/multer/issues/1116#issuecomment-1175041608 The solution is pretty much what I ended up doing.. https://github.com/expressjs/multer/issues/259#issuecomment-858091529