expressjs / serve-static

Serve static files
MIT License
1.38k stars 227 forks source link

optimization - static assets check if file exists, but use a static map of fs in memory instead? #107

Closed ORESoftware closed 4 years ago

ORESoftware commented 6 years ago

I opened the following issue in the Express repo, but it might be better if the issue was opened in this repo instead:

https://github.com/expressjs/express/issues/3651

I looked through the 'serve-static' main file and didn't see anything obvious that maps the fs, but I could have missed something.

dougwilson commented 6 years ago

Yea, that doesn't exist currently. A lot of people (including myself) use this module to serve a directory that changes at runtime. For example user uploads. I could see this being an option in some way. Do you want to start a pull request with your vision on it?

ORESoftware commented 6 years ago

yes would like to do that, it won't break anyone if we use an option that needs to be true.

are you better with a second options argument? like this:

app.use(express.static(path, opts));

so that might become:

app.use(express.static(path, {staticFS: true}));

or something like that. if the user declares their FS to be static, then 'serve-static' would create a map of their local FS and pull that into memory on server startup.

of course, this staticFS thing might be obviated if users are simply encouraged to not intersect their static assets path with their API routes.

if you can think of a better name than staticFS lmk

dougwilson commented 6 years ago

Yea, if it's optional then it should probably be in the same options argument we already take.

ORESoftware commented 6 years ago

Let me experiment with our server and see - I think a warning on intersecting paths would be a first good step if that works for you.

right now, straight up, all of our API routes were hitting our static assets middleware first, I didn't know that until like 18 months in lol.

wesleytodd commented 6 years ago

I think the added complexity is not worth the gain. Especially because you should always prefix your static directory with a prefix, app.use('/static'...), which completely avoids the issue.

That being said, I think an option to cache the files in memory would be a good addition. It could be implemented with a LRU limited by memory size in a pretty simple way.

ORESoftware commented 6 years ago

@wesleytodd that's a good point, caching files in memory might be a good idea

however, once you cache the files in memory, you are effectively doing what I am looking for to a certain extent. A flag would tell you that if it's not in the cache, it doesn't exist, so just call next() without trying to read the filesystem.

wesleytodd commented 6 years ago

Exactly, with that feature you would get what you need with the added benefit of it being a generally useful feature in other contexts.

ORESoftware commented 6 years ago

The thing that I am confused about ... I thought HTTP cached files automatically somehow. Like if the file didn't change, Express would know about it. Or the browser would cache certain files. I don't really understand how that works that well, and because of that, I can't say how useful/important a feature caching files using Express middleware would be.

ORESoftware commented 6 years ago

Anyway, so I am thinking of implementing caching using a flag (cache: true) or something. And use another flag to tell the middleware that if it's not in the cache then it doesn't exist (that might be onlyCache:true?).

However, if onlyCache: true is set, there would be a race condition while the middleware first pulls everything into the cache. I think flipping a single boolean could fix that but something to think about.

ORESoftware commented 6 years ago

@wesleytodd what I was saying was that etag might take care of some caching...but I don't see anything etag related in this package.

wesleytodd commented 6 years ago

etag is a client side thing, so the discussion around caching there is different. The idea of a server side cache would prevent multiple requests from different users from all reading the same file from the file system.

If I were you, I would make the cache option take a cache implementation so that people could override the behavior. But then if it was explicitly true it would use the built in one. Make sense?

For your "cache miss" scenario, I would not try to get complicated and just have the cache miss try to hit the file system on every request. If someone needs a complicated solution, which might be what you need, they could pass their own cache implementation which supports more robust options.

ORESoftware commented 6 years ago

Ok so the client generates the etag and the server looks at it - but I don't see anywhere in the serve-static code that handles etags.

wesleytodd commented 6 years ago

Other way around. The client will cache based on the etag which is generated by the server. I cannot remember if serve-static does this, it has been a while since I have looked. But either way that behavior is unrelated to the idea of doing a server side cache.

ORESoftware commented 6 years ago

right, I was reading this: https://evanhahn.com/express-dot-static-deep-dive/

etag's are generated by the server. In ExpressJS, looks like res.send() and other methods are what take care of sending a 304 or not.

I see what you are saying - the server still has to read the file from the filesystem to compare bytes before sending to the client, even if it's using etags. a serverside cache would allow the server to avoid re-reading the file from the filesystem each time.

ORESoftware commented 6 years ago

So the next question is - if we cache files as a string in memory, how do we still incorporate etag functionality? that's the only question I have right now.

ORESoftware commented 6 years ago

O/w what will happen is we do res.write('string representing file in memory'); , then etag functionality will not exist.

ORESoftware commented 6 years ago

Where does req.fresh come from? https://github.com/expressjs/express/blob/master/lib/response.js#L206

wesleytodd commented 6 years ago

https://github.com/expressjs/express/blob/b97faff6e2aa4d34d79485fe4331cb0eec13ad57/lib/request.js#L463-L480

ORESoftware commented 6 years ago

Got it - here is what I have right now, just a start. I am not sure if it's best to use different middleware or the serve-static middleware. Right now I am just testing it on our server at work.

https://github.com/ORESoftware/express.fs.cache

ORESoftware commented 6 years ago

@wesleytodd, do you know how serve-static handles this problem? https://github.com/nodejs/help/issues/1267#issuecomment-389077863

if we start streaming the file, how can we modify the pre-written headers after the fact if there is a later error reading/streaming the file.

wesleytodd commented 6 years ago

I am pretty sure it does not do anything toward that issue. I dont think you can send status in a trailer. Also I think most stream errors would result in a content length mismatch, so the error would be apparent.

ORESoftware commented 4 years ago

Any updates on this? I wrote an article on the subject just now:

https://medium.com/@the1mills/node-js-performance-the-most-common-piece-of-missing-middleware-for-servers-in-production-d4aca26b4da2

please add comments to the article if there are inaccuracies. I am finally working on this again.

dougwilson commented 4 years ago

Sorry I didn't participate in the conversation past the first part. The main sentiment here is that it is outside the scope of this module. There is also a lot of incorrect information listed above, for example this module does not read all the bytes from the disk to generate an etag. There is just too much going on in this thread that has diverged from the subject it was opened under to create a cohesive response.

Ultimately, per the subject of this issue, no, it is outside the scope of this module to.implement such a thing.

ORESoftware commented 4 years ago

well correct me if I am wrong, but if the file changes (checksum/sha1sum changes) then the browser will need a new copy of the file, (and the etag would change too, I would think).

dougwilson commented 4 years ago

The etag is using the same method nginx and apache uses, which is not based off the checksum of the file's contents.

ORESoftware commented 4 years ago

thanks, you're right about etags I assume. here is a description of the etag generation for apache https://stackoverflow.com/a/15848955/1223975

but, what I am talking about is simple and doesn't have to do with etags perse - say your app has 5 million clients. if you change a front-end/static file - all 5 million clients need a fresh copy of the file. If you load the file into memory, you will only have to read the file from disk once - when the server boots. savvy?

dougwilson commented 4 years ago

Yes, we understand your request and simply are of the opinion that is not a goal of this module to solve. Node.js itself should never be used for file IO if at all possible as they are not async operations and will overrun the thread pool. This module is kept simple for use cases well below your example. At that scale you should be using Apache, NGINX, or similar. If you do really want to use Node.js, the answer here is just write a module that does what you want to do...