expressjs / multer

Node.js middleware for handling `multipart/form-data`.
MIT License

Mitigating memory issues related to in memory storage #127

Closed LinusU closed 8 years ago

LinusU commented 9 years ago

I might actually have an idea that could mitigate the problem with storing files in memory.

The problem currently is that the program needs to allocate twice the space for each file. This is because chunks come in from the browser, each one emitting a data event. We then store an array of all of these chunks. When this is done, the array takes up roughly as much memory as the file.

But then we call Buffer.concat(array) to join all of the chunks, which allocates a new buffer that takes up roughly the size of the file again.
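
The doubling described above can be sketched with core Node.js only. This is an illustration, not multer's actual code: `collectChunks` is a hypothetical helper, and the byte counts ignore per-object overhead.

```javascript
// Sketch: join collected chunks with Buffer.concat and report how many
// bytes are alive at the same time. `collectChunks` is hypothetical.
function collectChunks(chunks) {
  // The array keeps every chunk alive: roughly one file's worth of bytes.
  const chunkBytes = chunks.reduce((sum, chunk) => sum + chunk.length, 0);

  // Buffer.concat allocates a new Buffer and copies each chunk into it,
  // so both copies exist until the chunk array is garbage collected.
  const joined = Buffer.concat(chunks);

  return { joined, chunkBytes, peakBytes: chunkBytes + joined.length };
}

// Three fake "data" event payloads standing in for an upload.
const chunks = [Buffer.alloc(1024, 1), Buffer.alloc(1024, 2), Buffer.alloc(512, 3)];
const result = collectChunks(chunks);
console.log(result.chunkBytes, result.joined.length, result.peakBytes);
// prints: 2560 2560 5120
```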

Now, I'm not 100% sure that this is the only problem. There might be places where buffers actually get retained longer, but this is at least one problem we have.

I believe that the in-memory store should only be used when you expect small files, small enough to be allocated straight in memory (that is, with all the parts laid out contiguously, as a Buffer is).

What we could do is allocate one Buffer for the entire body, with the size of the initial Content-Length header. We would then copy into that one buffer and, whenever we have a complete part, just slice it off.

Buffer#slice returns a new buffer that points to the same actual bytes, so this would avoid allocating all files twice.
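
A minimal sketch of this idea, assuming the total body size is known up front. `BodyBuffer` and its methods are hypothetical names, not part of multer or busboy. It uses `subarray`, which returns the same kind of memory-sharing view as `Buffer#slice` and is the spelling newer Node.js versions recommend.

```javascript
// Sketch: one preallocated Buffer for the whole body; parts are views into it.
// `BodyBuffer` is a hypothetical helper for illustration only.
class BodyBuffer {
  constructor(contentLength) {
    this.buffer = Buffer.alloc(contentLength);
    this.offset = 0;
  }

  // Copy an incoming chunk into the next free position.
  write(chunk) {
    if (this.offset + chunk.length > this.buffer.length) {
      throw new RangeError('body is larger than the announced length');
    }
    chunk.copy(this.buffer, this.offset);
    this.offset += chunk.length;
  }

  // Return a view on bytes [start, end); no bytes are copied.
  part(start, end) {
    return this.buffer.subarray(start, end);
  }
}

const body = new BodyBuffer(10);
body.write(Buffer.from('hello'));
body.write(Buffer.from('world'));

const filePart = body.part(5, 10);
console.log(filePart.toString()); // prints: world
console.log(filePart.buffer === body.buffer.buffer); // prints: true (same underlying memory)
```

Because the view shares memory with the body buffer, the whole body stays alive for as long as any one part is referenced.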

This change would require quite a lot of reengineering, and I don't even think it's feasible to build it on top of busboy. I don't think this is high priority, but it's worth keeping in mind.

src200 commented 9 years ago

Memory leak when uploading a very large file (~30GB)

LinusU commented 9 years ago

@csharathreddy What do you mean by memory leak? Are you seeing ~30GB still being allocated in memory long after the request is finished?

Or do you mean that the module is using ~60GB of memory while uploading a ~30GB file?

src200 commented 9 years ago

60GB? My VM has about 8GB of RAM. It uses all of the RAM while uploading a video.

LinusU commented 9 years ago

This issue is regarding the memory storage. You can't upload 30GB of video to the memory on the server and not expect it to take up memory.

If you are having this problem with disk storage you'll have to open another issue.

src200 commented 9 years ago

OK, it takes 99% of memory, but it should release it after the request is finished, right?

LinusU commented 9 years ago

If you are using the memory storage, it will use twice the size of the file you are uploading. There are no guarantees about when this will be released, as that is entirely up to the garbage collector.

I would recommend not using the memory storage when dealing with files this large.

src200 commented 9 years ago

Is it possible to upload a file without using RAM at all?

LinusU commented 9 years ago

Yes, using the file storage. It will then store them on the disk instead.

multer({ dest: './uploads' })

This will store uploads in the folder uploads.

src200 commented 9 years ago

Oh. Actually, I am uploading files to disk only, but while uploading, RAM usage is at its maximum, such that the next file in the upload queue gets aborted. What should happen is this: when a file upload starts, a request is sent to the server, which writes whatever chunks of data arrive in the request to RAM and then to disk. The previous chunks in RAM should then be flushed so that it can accept new chunks and write those to disk. This is not happening in my case.
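
The behaviour expected here, where each chunk is written to disk and released before more data is accepted, is what Node.js stream backpressure provides. The following is a minimal sketch using only core modules. It is not multer's disk storage implementation, and `saveToDisk`, the in-memory source and the temp file name are all hypothetical stand-ins.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { Readable, pipeline } = require('stream');

// Sketch: stream a readable source to a file chunk by chunk.
// `saveToDisk` is a hypothetical helper for illustration only.
function saveToDisk(source, destination, callback) {
  // pipeline() forwards chunks to the file stream and pauses the source
  // while the write buffer is full, so memory use stays bounded.
  pipeline(source, fs.createWriteStream(destination), (err) => {
    if (err) return callback(err);
    // Report the final file size once everything has been flushed.
    fs.stat(destination, (statErr, stats) => callback(statErr, stats && stats.size));
  });
}

// Stand-in for an incoming upload: two chunks held in memory.
const source = Readable.from([Buffer.from('chunk one, '), Buffer.from('chunk two')]);
const target = path.join(os.tmpdir(), `upload-sketch-${process.pid}.bin`);

saveToDisk(source, target, (err, size) => {
  if (err) throw err;
  console.log(`wrote ${size} bytes`);
  fs.unlinkSync(target); // clean up the demo file
});
```

In a real server the source would be the request (or a file stream from a multipart parser) rather than an in-memory list.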

LinusU commented 8 years ago

I have yet to see any browser that sends a Content-Length header on the part so I don't think that my initial idea will be viable, closing...

brown2rl commented 8 years ago

Is there a way to flush the memory buffer after the file has been saved elsewhere? Thanks!

LinusU commented 8 years ago

@brown2rl V8 uses a garbage collector, so there isn't really any explicit way to tell it to free memory. It should happen automatically, though...

llafuente commented 8 years ago

@brown2rl if you really are forced to do so... you can (not recommended, it's mostly for debugging purposes)

node --expose-gc xxx.js

This will provide

global.gc(); // run gc now...
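
A small debugging sketch along those lines. `forceGc` is a hypothetical helper. It only calls `global.gc` when the process was started with `--expose-gc`, and it reports `external`, the field of `process.memoryUsage()` that covers memory held outside the V8 heap, including Buffer contents.

```javascript
// Sketch: trigger a collection if --expose-gc was passed, and report
// memory before and after. `forceGc` is hypothetical, for debugging only.
function forceGc() {
  if (typeof global.gc !== 'function') {
    return null; // process was not started with --expose-gc
  }
  const before = process.memoryUsage();
  global.gc();
  const after = process.memoryUsage();
  return {
    externalBefore: before.external,
    externalAfter: after.external,
    rssBefore: before.rss,
    rssAfter: after.rss,
  };
}

const report = forceGc();
console.log(report === null ? 'start node with --expose-gc to enable global.gc' : report);
```

Memory is only reclaimed for buffers that nothing references any more, so a forced collection does not help while the uploaded file is still held somewhere.
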

brown2rl commented 8 years ago

Thanks guys! Any CollectionFS integrations in the pipeline?

LinusU commented 8 years ago

How would that look? Isn't that already a fully integrated solution that has its own storage handlers for S3, GridFS, etc.?

brown2rl commented 8 years ago

Yeah, however that's a use case for large files. It would be interesting if multer were included for smaller files.

brown2rl commented 8 years ago

Thanks!
