feathersjs-ecosystem / feathers-blob

Feathers service for blob storage, like S3.
http://feathersjs.com
MIT License

Large file upload to S3 errors with TimeOut from S3 #67

Closed: apmcodes closed this issue 4 years ago

apmcodes commented 4 years ago

Steps to reproduce

While uploading large files (say 10 MB), the upload fails after a few minutes with the error below from aws-sdk. Smaller files upload just fine.

RequestTimeout: Your socket connection to the server was not read from or written to within the timeout period. 

Tried to investigate, but only found this from aws-sdk; it talks about streaming and Content-Length, and I am not sure whether the feathers-blob service is setting the length. RequestTimeout: Your socket connection ...

The same issue occurs when using both datauri and buffer.

File

file { buffer:
   <Buffer 22 45 6d 61 69 6c 20 41 64 64 72 65 73 73 22 2c 22 45 6d 61 69 6c 20 46 6f 72 6d 61 74 22 2c 22 43 6f 6e 66 69 72 6d 65 64 22 2c 22 53 75 62 73 63 72 ... >,
  id: 'se32n6rifii5btlxmkefo3c3rgjfbesdjqwg.csv',
  fileName: 'list3.csv',
  fileId: 'se32n6rifii5btlxmkefo3c3rgjfbesdjqwg.csv',
  mimeType: 'text/csv',
  contentType: 'text/csv',
  encoding: '7bit',
  size: 12293411 }

Following the file upload docs:

uploads.service.js

const hooks = require('./uploads.hooks');
const AWS = require('aws-sdk');
const S3blobStorage = require('s3-blob-store');
const BlobService = require('feathers-blob');
const mime = require('mime-types');

const fileFilter = (req, file, cb) => {
    const fileType = mime.lookup(file.originalname);
    // Reject anything that is not a CSV; without the return, cb would be called twice
    if (fileType !== 'text/csv') return cb(null, false);
    cb(null, true);
}

const limits = {
    fileSize: 12582912,
    files: 1,
}

const multer = require('multer');
const multipartMiddleware = multer({ limits: limits, fileFilter: fileFilter });
const uploader = multipartMiddleware.single('upfile');

module.exports = function (app) {

    const cfgS3 = app.get('s3');
    const s3 = new AWS.S3({
        endpoint: cfgS3.url,
        accessKeyId: cfgS3.accessKeyId,
        secretAccessKey: cfgS3.secretAccessKey,
    });

    const blobStorage = S3blobStorage({
        client: s3,
        bucket: cfgS3.bucket
    });

    // Initialize our service with any options it requires
    app.use('/uploads',
        uploader,
        // another middleware, to transfer the received file to feathers
        function (req, res, next) {
            req.feathers.file = req.file;
            next();
        },
        BlobService({ Model: blobStorage }));

    // Get our initialized service so that we can register hooks
    const service = app.service('uploads');

    service.hooks(hooks);
};
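
For reference, a minimal client-side request against this endpoint could look like the sketch below. The field name 'upfile' matches the multer configuration above, while the URL and the commented-out auth header are assumptions for illustration.

```js
// Browser-side sketch: POST a file as multipart/form-data to the /uploads service above.
// The field name 'upfile' must match multer's .single('upfile'); the URL is an assumption.
const fileInput = document.querySelector('input[type="file"]');

const form = new FormData();
form.append('upfile', fileInput.files[0]); // the File selected by the user

fetch('http://localhost:3030/uploads', {
  method: 'POST',
  body: form
  // headers: { Authorization: `Bearer ${accessToken}` } // only if authentication is enabled
})
  .then(res => res.json())
  .then(result => console.log('Stored as', result.id))
  .catch(err => console.error('Upload failed', err));
```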

System configuration

node v10.13.0

    "@feathersjs/express": "^4.3.10",
    "@feathersjs/feathers": "^4.3.10",
    "@feathersjs/socketio": "^4.3.10",
    "aws-sdk": "^2.568.0",
claustres commented 4 years ago

Pretty strange, I just added this test https://github.com/feathersjs-ecosystem/feathers-blob/blob/master/test/s3.test.js#L94 with a 20 MB file and it seems to run fine both on Travis and on my PC. Are you behind a proxy or something like that?

apmcodes commented 4 years ago

Thanks @claustres

Yes, it's strange that the error occurred a few times; I will test more and report back.

apmcodes commented 4 years ago

Tested in production and a 12 MB upload worked fine, so in dev it seems to be a slow/flaky connectivity issue.

Just one question though: is the error hook the right place to catch errors that happen in the BlobService? Also, is there a way to know if an upload is stuck, and can we set a timeout?

claustres commented 4 years ago

Usually error hooks are a good place to catch errors in the following use cases:

Otherwise you can also simply wrap your service call in a try/catch block (a sketch of both approaches follows).
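
For illustration, a minimal sketch of both, assuming the uploads service from earlier in this thread and the hook file layout of a generated app:

```js
// In uploads.hooks.js: an error hook that runs when a create call fails,
// e.g. when the S3 RequestTimeout bubbles up from the BlobService.
module.exports = {
  error: {
    create: [
      context => {
        console.error('Upload failed:', context.error.message);
        return context;
      }
    ]
  }
};

// Or, at the call site, wrap the service call in a try/catch:
// try {
//   await app.service('uploads').create(data);
// } catch (error) {
//   console.error('Upload failed:', error.message);
// }
```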

In Feathers there is no timeout on the backend, but you can configure one on the client if you'd like, e.g. https://docs.feathersjs.com/api/client/socketio.html#socketio-socket-options. I don't know whether a timeout is configurable in the AWS SDK.
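
(For reference, aws-sdk v2 does accept an httpOptions.timeout on the S3 client. A sketch of both knobs, with illustrative values:)

```js
// Client side: raise the Feathers Socket.IO call timeout (value is illustrative).
const io = require('socket.io-client');
const feathers = require('@feathersjs/feathers');
const socketio = require('@feathersjs/socketio-client');

const socket = io('http://localhost:3030');
const client = feathers();
client.configure(socketio(socket, { timeout: 120000 })); // default is 5000 ms

// Server side: the aws-sdk v2 S3 client accepts socket timeouts via httpOptions.
const AWS = require('aws-sdk');
const s3 = new AWS.S3({
  httpOptions: {
    connectTimeout: 5000, // time allowed to establish the connection
    timeout: 300000       // idle socket timeout in ms
  }
});
```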

apmcodes commented 4 years ago

Thank you very much for providing the various options in detail.

eikaramba commented 4 years ago

Not sure if I can help out, but I recently had a similar problem uploading big files over Socket.IO in Feathers. Basically you need to adjust the socket timeout on the client as the documentation describes. However, I also needed to do this on the server: .configure(socketio({ pingTimeout: 1200000 })) (sketched below). Otherwise, after 20-50 s the connection would be closed, because the heartbeat of the socket is somehow not acknowledged by the client while it is busy uploading, and thus the server closes the connection (as it thinks the client is lost). Maybe it helps.
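
A minimal sketch of where that server-side option goes in a generated app, using the value mentioned above:

```js
// src/app.js of a generated Feathers app (sketch).
const feathers = require('@feathersjs/feathers');
const express = require('@feathersjs/express');
const socketio = require('@feathersjs/socketio');

const app = express(feathers());

// Raise pingTimeout so a client that is busy streaming a large upload and
// misses a heartbeat is not disconnected prematurely.
app.configure(socketio({ pingTimeout: 1200000 }));
```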

eikaramba commented 4 years ago

However, uploading big files with Feathers is currently still a pain, especially if you use Socket.IO. I still haven't figured out a way to ideally send a buffer instead of a data URI, then use caching to keep memory low on the server, and also provide a progress indication via Socket.IO.

There is https://medium.com/@Mewsse/file-upload-with-socket-io-9d2d1229494, but I don't know yet how to connect that with the existing Feathers infrastructure. Well, an issue for another time.

claustres commented 4 years ago

Nothing prevents you from mixing sockets and standard HTTP with Feathers. You can then use middleware like https://github.com/expressjs/multer to perform multipart uploads for large files.

eikaramba commented 4 years ago

yes i know the "problem" is that it is not that easy to make a multipart upload togehter with feathers-vuex(which is using socket.io). Especially if one wants to use binary mode for better efficiency. Would be nice if feathers-blob would support buffer or streams instead of only datauris.

Don't get me wrong in another project i am using multer successfully, but it is just using plain old rest client with no fancy things like reactive stores (which are currently abstracting away all the upload mechanics for me)

claustres commented 4 years ago

You can already send data as a buffer: https://github.com/feathersjs-ecosystem/feathers-blob/blob/master/test/index.test.js#L57 (see the sketch below). It will be streamed over the socket anyway; the big problem is simply that you need to read it completely into memory first on the client side. Creating chunks on the client would require specific tooling like https://www.dropzonejs.com/#config-chunking.
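
A minimal sketch of that, assuming the uploads service from earlier in this thread (the content and content type are illustrative):

```js
// Create a blob from a Buffer instead of a data URI, following the linked test.
const { Buffer } = require('buffer');

const buffer = Buffer.from('Email Address,Email Format,Confirmed\n', 'utf8');

app.service('uploads')
  .create({ buffer, contentType: 'text/csv' })
  .then(result => console.log('Stored as', result.id))
  .catch(err => console.error('Upload failed', err));
```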

Yes, with multer you need to use REST, so reactivity is lost. But maybe reactivity on the blob service is not really a good idea. Usually a file is attached to some resource like a post or whatever. That object stores the link to the file and can be managed using a reactive service. This allows you to be reactive even before the upload has finished, if required (see the sketch below).
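
A hedged sketch of that pattern; the 'posts' service and its attachmentId/attachmentStatus fields are hypothetical:

```js
// Upload the blob through the (non-reactive) uploads service, while a
// hypothetical 'posts' service keeps the link to the file and stays reactive.
async function attachFileToPost(app, postId, fileData) {
  // Mark the post first so the UI can react before the upload finishes.
  await app.service('posts').patch(postId, { attachmentStatus: 'uploading' });

  // Upload the file; only the returned id is stored on the post.
  const { id: fileId } = await app.service('uploads').create(fileData);

  return app.service('posts').patch(postId, {
    attachmentId: fileId,
    attachmentStatus: 'ready'
  });
}
```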

eikaramba commented 4 years ago

Cool, I wasn't aware that buffers already work; that means I don't need to transform to a data URI.

I have now switched to REST and use multipart. The whole "let's send some big files over a socket" thing is too much work and hassle. :)