dharmafly / noodle

A node server and module which allows for cross-domain page scraping on web documents with JSONP or POST.
https://noodle.dharmafly.com/

Analyse limitations of the cache and set defaults accordingly. #81

Closed premasagar closed 8 years ago

premasagar commented 11 years ago

This needs investigation. How long before the server crumbles?

AaronAcerboni commented 11 years ago

process.memoryUsage().heapTotal gives the total size of the V8 heap allocated to the process (heapUsed is the part of it actually in use). Perhaps a configuration option could set a maximum memory usage. Something could then monitor the current usage and cull from the cache* if it nears the limit.

*either everything or until a lower memory usage is achieved
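A minimal sketch of the monitoring idea, assuming the cache is a plain in-memory Map and that heapUsed is the figure to watch (rss or heapTotal could be swapped in). The names cache, isOverLimit, cullCache and checkMemory are hypothetical and not taken from noodle's code; this shows only the "cull everything" variant.

```javascript
// Hypothetical in-memory cache; noodle's real cache structure may differ.
const cache = new Map();

// True when the V8 heap currently in use has reached the configured ceiling.
function isOverLimit(maxHeapBytes, usage = process.memoryUsage()) {
  return usage.heapUsed >= maxHeapBytes;
}

// Simplest strategy: drop every entry and report how many were removed.
function cullCache() {
  const removed = cache.size;
  cache.clear();
  return removed;
}

// Intended to be called periodically; returns the number of entries culled.
function checkMemory(maxHeapBytes) {
  return isOverLimit(maxHeapBytes) ? cullCache() : 0;
}
```

This could be driven by a timer, for example setInterval(() => checkMemory(400 * 1024 * 1024), 5000), where both the 400 MB ceiling and the 5 second interval are arbitrary placeholder values.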

premasagar commented 11 years ago

That sounds like a good approach. Presumably we can't determine the total memory available, can we?

Premasagar Rose, Dharmafly http://dharmafly.com dharmafly.com / 07941 192398 premasagar.com twitter.com/premasagar L4RP.com asyncjs.com

AaronAcerboni commented 11 years ago

You can determine the amount of free memory with os.freemem()

http://nodejs.org/api/os.html#os_os_freemem
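As a quick illustration, os.freemem() can be paired with os.totalmem() to express free memory as a share of the machine's total. The helper name systemMemory is made up for this sketch.

```javascript
const os = require('os');

// Reports system-wide memory (not just this process) in bytes,
// plus the free share as a percentage of the total.
function systemMemory() {
  const free = os.freemem();
  const total = os.totalmem();
  return { free, total, freePercent: (free / total) * 100 };
}

console.log(systemMemory());
```

Note that these figures describe the whole machine, so they can change for reasons unrelated to the cache.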

Note that, by default, the V8 engine is limited to 512 MB on 32-bit machines and 1 GB on 64-bit machines.

https://github.com/joyent/node/wiki/FAQ#what-is-the-memory-limit-on-a-node-process

Currently, by default v8 has a memory limit of 512mb on 32-bit systems, and 1gb on 64-bit systems. The limit can be raised by setting --max-old-space-size to a maximum of ~1gb (32-bit) and ~1.7gb (64-bit), but it is recommended that you split your single process into several workers if you are hitting memory limits.
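Rather than relying on the quoted defaults, a running process can report the heap limit it was actually started with. This is a sketch that assumes a Node version shipping the built-in v8 module (which postdates this discussion); heapLimitMb is a hypothetical helper name.

```javascript
const v8 = require('v8');

// heap_size_limit is the maximum heap size V8 will allow for this process,
// in bytes; it reflects any --max-old-space-size flag passed at startup.
function heapLimitMb() {
  return v8.getHeapStatistics().heap_size_limit / (1024 * 1024);
}

console.log('V8 heap limit (MB):', Math.round(heapLimitMb()));
```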

premasagar commented 11 years ago

Sounds useful. So should the config take a max amount of memory in MB, or a % of available space?
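One possibility is to accept both forms. The sketch below is only an illustration: parseMemoryLimit is a hypothetical helper, and treating a percentage as a share of total system memory (rather than free memory or the V8 heap limit) is an assumption that would need to be decided.

```javascript
const os = require('os');

// Converts a config value into a byte count.
// Accepts megabytes as a number or numeric string (256, "256")
// or a percentage string ("80%") of totalBytes.
function parseMemoryLimit(value, totalBytes = os.totalmem()) {
  const match = /^(\d+(?:\.\d+)?)(%?)$/.exec(String(value).trim());
  if (!match) {
    throw new Error('Invalid memory limit: ' + value);
  }
  const amount = Number(match[1]);
  if (match[2] === '%') {
    if (amount > 100) {
      throw new Error('Percentage cannot exceed 100: ' + value);
    }
    return Math.floor((totalBytes * amount) / 100);
  }
  return Math.floor(amount * 1024 * 1024);
}
```

For example, parseMemoryLimit(256) treats the value as megabytes, while parseMemoryLimit("80%") resolves against the total reported by os.totalmem().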

AaronAcerboni commented 11 years ago

For the sake of simplicity, perhaps the only option in the config should be one that says when to start culling from the cache, e.g.

"cacheMemoryLimit": "80%"

The algorithm which removes from the cache could pop off the oldest entry and recheck to see if the memory is at an acceptable level. If it is not then it can pop off the next oldest and recheck.
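A rough sketch of that loop, assuming the cache is a Map (which iterates in insertion order, so its first key is the oldest entry). evictUntilBelow and the getUsage parameter are hypothetical names; getUsage is injectable so the loop can be exercised without depending on real heap readings.

```javascript
// Removes the oldest entries one at a time, rechecking usage after each
// removal, until usage is at or below limitBytes or the cache is empty.
// Returns the number of entries evicted.
function evictUntilBelow(
  cache,
  limitBytes,
  getUsage = () => process.memoryUsage().heapUsed
) {
  let evicted = 0;
  while (cache.size > 0 && getUsage() > limitBytes) {
    const oldestKey = cache.keys().next().value; // first inserted key
    cache.delete(oldestKey);
    evicted += 1;
  }
  return evicted;
}
```

With the default getUsage, the heap figure may not fall straight after each deletion, in which case the loop would keep evicting until the cache is empty.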

I say this because maybe developers would be best off using the command line argument --max_executable_size instead of setting a config option?

premasagar commented 11 years ago

Both a config option and a command-line alternative for it make sense.

An assumption you're making, which needs to be checked, is that the available memory will change immediately after purging an item from the cache. You might need to explicitly call the garbage collector, to ensure that memory is released from the purged item.
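One way to check this could look like the sketch below. global.gc is only defined when node is started with the --expose-gc flag, so the code falls back gracefully without it; collectIfPossible and heapUsedAfterPurge are hypothetical names.

```javascript
// Runs the garbage collector if node was started with --expose-gc.
// Returns true when a collection was actually requested.
function collectIfPossible() {
  if (typeof global.gc === 'function') {
    global.gc();
    return true;
  }
  return false;
}

// Deletes one entry and reports heapUsed before and after, so the effect
// of purging (with or without a forced collection) can be compared.
function heapUsedAfterPurge(cache, key) {
  const before = process.memoryUsage().heapUsed;
  cache.delete(key);
  const collected = collectIfPossible();
  const after = process.memoryUsage().heapUsed;
  return { before, after, collected };
}
```

Running the same script once as "node script.js" and once as "node --expose-gc script.js" would show whether the explicit collection makes a measurable difference.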

AaronAcerboni commented 11 years ago

Is it worth investigating moving the cache to the file system instead?
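To make the idea concrete, a file-backed cache could be as small as the sketch below. It is an assumption-laden illustration rather than a proposal for noodle's actual design: createFileCache is a hypothetical name, values must be JSON-serialisable, keys are hashed to get safe file names, and synchronous fs calls are used only to keep the example short.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

// Stores each cache entry as a JSON file inside dir.
function createFileCache(dir) {
  fs.mkdirSync(dir, { recursive: true });

  // Hash the key (e.g. a URL) so it is always a valid file name.
  const fileFor = (key) =>
    path.join(dir, crypto.createHash('sha1').update(key).digest('hex') + '.json');

  return {
    set(key, value) {
      fs.writeFileSync(fileFor(key), JSON.stringify(value));
    },
    // Returns undefined when the key has never been stored.
    get(key) {
      try {
        return JSON.parse(fs.readFileSync(fileFor(key), 'utf8'));
      } catch (err) {
        if (err.code === 'ENOENT') return undefined;
        throw err;
      }
    },
  };
}
```

The trade-off is disk I/O on every lookup in exchange for keeping cached pages out of the V8 heap.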

AaronAcerboni commented 11 years ago

Someone on Stack Overflow recommends http://memcached.org/

(http://stackoverflow.com/questions/15685988/how-do-i-manage-memory-in-node-js)

Memcached runs as a separate server that you talk to over the network. It seems nice and there is a decent Node wrapper for it, but this would make deploying more difficult.