Closed kdembler closed 6 months ago
I had test this and looked in the logs.
Affected storage providers: all
Geo location: all
Client type: all
Observed when this issue happens the reverse proxy receives a reset from the peer i.e. Colossus and respond to the client with 408.
{"level":"error","ts":1708069037.667749,"logger":"http.handlers.reverse_proxy","msg":"reading from backend","error":"read tcp 172.21.0.7:33060->172.21.0.15:3333: read: connection reset by peer"}
{"level":"error","ts":1708069037.6678162,"logger":"http.handlers.reverse_proxy","msg":"aborting with incomplete response"," #error":"read tcp 172.21.0.7:33060->172.21.0.15:3333: read: connection reset by peer"}
Below is samples of the error
This error is the majority, sometimes a company by other errors (included the StorageObject affected)
dataObjectId: 2100687
Internal Server Error: Upload aborted
at error (/joystream/node_modules/express-openapi-validator/dist/middlewares/openapi.multipart.js:123:20)
at /joystream/node_modules/express-openapi-validator/dist/middlewares/openapi.multipart.js:16:26
at Immediate._onImmediate (/joystream/node_modules/multer/lib/make-middleware.js:53:37)
at process.processImmediate (node:internal/timers:478:21)
dataObjectId: 1321165
uncaughtException: Upload aborted
Internal Server Error: Upload aborted
at error (/joystream/node_modules/express-openapi-validator/dist/middlewares/openapi.multipart.js:123:20)
at /joystream/node_modules/express-openapi-validator/dist/middlewares/openapi.multipart.js:16:26
at Immediate._onImmediate (/joystream/node_modules/multer/lib/make-middleware.js:53:37)
at process.processImmediate (node:internal/timers:478:21)
Below a companied by another error (not on the same log)
dataObjectId: 2112736
uncaughtException: Upload aborted
Internal Server Error: Upload aborted
at error (/joystream/node_modules/express-openapi-validator/dist/middlewares/openapi.multipart.js:123:20)
at /joystream/node_modules/express-openapi-validator/dist/middlewares/openapi.multipart.js:16:26
at Immediate._onImmediate (/joystream/node_modules/multer/lib/make-middleware.js:53:37)
at process.processImmediate (node:internal/timers:478:21)
Error: Unexpected end of JSON input
at new ApolloError (/joystream/node_modules/@apollo/client/errors/errors.cjs:34:28)
at /joystream/node_modules/@apollo/client/core/core.cjs:1654:19
at both (/joystream/node_modules/@apollo/client/utilities/utilities.cjs:986:53)
at /joystream/node_modules/@apollo/client/utilities/utilities.cjs:979:72
at new Promise (<anonymous>)
at Object.then (/joystream/node_modules/@apollo/client/utilities/utilities.cjs:979:24)
at Object.error (/joystream/node_modules/@apollo/client/utilities/utilities.cjs:987:49)
at notifySubscription (/joystream/node_modules/zen-observable/lib/Observable.js:140:18)
at onNotify (/joystream/node_modules/zen-observable/lib/Observable.js:179:3)
at SubscriptionObserver.error (/joystream/node_modules/zen-observable/lib/Observable.js:240:7)
dataObjectId: 1713397
uncaughtException: Timeout
Error
at new module.exports.TimeoutError (/joystream/node_modules/promise-timeout/index.js:44:16)
at module.exports.timeout (/joystream/node_modules/promise-timeout/index.js:20:15)
at extrinsicWrapper (/joystream/storage-node/lib/services/runtime/extrinsics.js:207:45)
at acceptPendingDataObjects (/joystream/storage-node/lib/services/runtime/extrinsics.js:115:18)
at uploadFile (/joystream/storage-node/lib/services/webApi/controllers/filesApi.js:99:57)
dataObjectId: 2112361
Error: Server is not accepting uploads into this bucket
at validateUploadFileParams (/joystream/storage-node/lib/services/webApi/app.js:199:15)
at app.use.OpenApiValidator.middleware.apiSpec (/joystream/storage-node/lib/services/webApi/app.js:59:13)
at Layer.handle [as handle_request] (/joystream/node_modules/express/lib/router/layer.js:95:5)
at trim_prefix (/joystream/node_modules/express/lib/router/index.js:317:13)
at /joystream/node_modules/express/lib/router/index.js:284:7
at Function.process_params (/joystream/node_modules/express/lib/router/index.js:335:12)
at next (/joystream/node_modules/express/lib/router/index.js:275:10)
at /joystream/storage-node/lib/services/webApi/app.js:54:9
at Layer.handle [as handle_request] (/joystream/node_modules/express/lib/router/layer.js:95:5)
at trim_prefix (/joystream/node_modules/express/lib/router/index.js:317:13)
dataObjectId: 2112588
Internal Server Error: Upload aborted
at error (/joystream/node_modules/express-openapi-validator/dist/middlewares/openapi.multipart.js:123:20)
at /joystream/node_modules/express-openapi-validator/dist/middlewares/openapi.multipart.js:16:26
at Immediate._onImmediate (/joystream/node_modules/multer/lib/make-middleware.js:53:37)
at process.processImmediate (node:internal/timers:478:21)
Error: File multihash doesn't match the data object's ipfsContentId for data object ID = 2112588
at verifyDataObjectInfo (/joystream/storage-node/lib/services/webApi/controllers/filesApi.js:225:15)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async uploadFile (/joystream/storage-node/lib/services/webApi/controllers/filesApi.js:109:9)
Other errors seem
Error: Unexpected end of JSON input
at new ApolloError (/joystream/node_modules/@apollo/client/errors/errors.cjs:34:28)
at /joystream/node_modules/@apollo/client/core/core.cjs:1654:19
at both (/joystream/node_modules/@apollo/client/utilities/utilities.cjs:986:53)
at /joystream/node_modules/@apollo/client/utilities/utilities.cjs:979:72
at new Promise (<anonymous>)
at Object.then (/joystream/node_modules/@apollo/client/utilities/utilities.cjs:979:24)
at Object.error (/joystream/node_modules/@apollo/client/utilities/utilities.cjs:987:49)
at notifySubscription (/joystream/node_modules/zen-observable/lib/Observable.js:140:18)
at onNotify (/joystream/node_modules/zen-observable/lib/Observable.js:179:3)
at SubscriptionObserver.error (/joystream/node_modules/zen-observable/lib/Observable.js:240:7)
dataObjectId: 2011371 (Only this)
uncaughtException: File size doesn't match the data object's size for data object ID = 2011371
Error: File size doesn't match the data object's size for data object ID = 2011371
at verifyDataObjectInfo (/joystream/storage-node/lib/services/webApi/controllers/filesApi.js:221:15)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async uploadFile (/joystream/storage-node/lib/services/webApi/controllers/filesApi.js:109:9)
dataObjectId: 2005104
uncaughtException: Data object 2005104 already exists
Error: Data object 2005104 already exists
at validateUploadFileParams (/joystream/storage-node/lib/services/webApi/app.js:216:15)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Client sends a multipart/form-data request: This is a type of HTTP request that's used for sending binary data in the message body. It's commonly used for uploading files.
Request reaches the server: The server receives the request and passes it to the Express.js middleware pipeline.
Request hits the OpenApiValidator.middleware: This middleware is configured to validate incoming requests against an OpenAPI specification. The OpenAPI specification is a standard, language-agnostic interface to RESTful APIs which allows both humans and computers to discover and understand the capabilities of the service without access to source code, documentation, or through network traffic inspection.
Middleware validates the request: The middleware checks if the request matches the paths, methods, and parameters defined in the OpenAPI specification. If the request doesn't match the specification, the middleware will return an error response.
Middleware checks if the request includes a file upload: If the request method, path, and parameters are valid, the middleware checks if the request includes a file upload. This is determined by the Content-Type header of the request, which should be multipart/form-data for file uploads.
Middleware handles the file upload: If the request includes a file upload, the middleware handles the file according to the fileUploader configuration. This configuration uses the diskStorage._handleFile
method from Multer
(via a standard configuration), which means the file will be stored on disk. The destination property specifies the directory where the file should be stored.
Middleware applies file upload limits: The limits property is used to control the size and number of files that can be uploaded. In your case, you're limiting it to one file with a maximum size specified by config.maxFileSize.
Middleware passes control to the next middleware function: After handling the file upload, the middleware calls next(), which passes control to the next middleware function in the stack. This could be another validation middleware, a route handler, or an error handling middleware.
It first checks if the request has been aborted before the file handling process starts. If so, it returns an error with error Upload aborted early
It then calls the getDestination
method to determine the directory where the uploaded file should be stored. If there's an error, it's passed to the callback function.
If the destination is successfully retrieved, it calls the getFilename
method to generate a unique filename for the uploaded file. If there's an error or the filename is blank, it's passed to the callback function.
If the filename is successfully generated, it creates a write stream to the destination path and pipes the file's read stream into it.
It sets up error and finish event handlers for the write stream. If there's an error writing the file, it deletes the file and passes the error to the callback function. If the write stream finishes successfully, it calls the callback function with details about the uploaded file.
It also sets up an 'aborted' event handler for the request. If the request is aborted (more on this below), it closes the write stream, deletes the file, and passes an error to the callback function. This also seems to be the most common error as reported above with error message Upload aborted
difficulty in reproducing the case in between () braces
Sorry for the confusion above, the issue i.e. error code 408 (at least) is not related to any bugs in colossus.
Doing a quick search on GPT
Default Request Timeout Express itself does not have a built-in request timeout mechanism for closing requests that take too long to complete. By default, Node.js's HTTP server (upon which Express is built) does not enforce a request timeout. This means that if not explicitly set, a request could theoretically hang indefinitely if the client doesn't close the connection and the server doesn't end the response. However, for Node.js versions 13.0.0 and later, the default timeout of the underlying HTTP server is set to 5 minutes (300 seconds). This is a change from earlier versions where there was no default timeout. It's important to note that this timeout doesn't kill the request but rather closes the socket if no activity is detected in that timeframe, which could affect long-polling requests or similar patterns.
I have tried uploading a small video that usually succeeds without throttling enabled. I have set the throttling to 3G slow
and the upload fails after 5.1 min
The error on the server side (colossus 4.0) is this:
024-03-07 17:19:48:1948 error: POST /api/v1/files: Error 200: Upload aborted
The experience is consistent: same error appears and timing appears on 3 different attempt (all with slightly different timeout intervals but all of them above 5 minutes)
One possible server-side solution to this is apparently to have a custom timeout settings along the following line:
// Set a timeout limit for all requests
app.use(timeout('600s')); // Adjust '10 mins' as needed
But I am not aware of possible implications for this. @zeeshanakram3 @mnaamani Feel free to provide feedback
I hade a private chat with @yasiryagi on Discord and he suggests the following approach:
408 is thrown by caddy because of the server closing the connection. So can we expose colossus port and test on colossus directly. It need to issue the metadata and make sure your channel is assign to the testing server only
I have tested it locally and it fails however on local gleev deployment it restarts from where it has left immediately without 408 response
Update On the second attempt it actually fails after about 5 mins with a 408 response (as you can seet on the time fields is 10 mins)
Setting Axios request timeout on atlas doesn't appear to work
Setting on storage-server/server.ts
const server = app.listen(port, () => logger.info(`Listening on http://localhost:${port}`))
server.setTimeout(900000)
doesn't work either
So just did some investigation, NodeJS HTTP server indeed abort requests with default timeout of 300 secs.
Now the reason why server.setTimeout(900000)
did not work was because it will only happen if the idle request exceeded this timeout. If the request was not idle then server.requestTimeout
value would checked.
Fixed in #5094, great job on this investigation @ignazio-bovo and thanks for the fix @zeeshanakram3
This is investigation of upload failures we've been getting reports about lately.
From user reports, I have heard that they had uploads just abort mid-flight. Users have reported seeing CORS issues in the browser logs.
From my testing, I have had a case when the upload failed mid-flight and I got 408 response from the reverse proxy. This would also result in a CORS error as mentioned above.