assaf / node-replay

When API testing slows you down: record and replay HTTP responses like a boss
http://documentup.com/assaf/node-replay
MIT License

Improvement/Question: Lower memory requirements for large fixtures folders of single host #150

Closed · bisko closed this 4 years ago

bisko commented 6 years ago

In one of the projects I'm working on, I'm using Node Replay to allow for offline testing of an API (what better way to make it extremely fast). In our setup the API is accessible through a single host, i.e. api.domain.com. Having to test a lot of cases means that we run the scripts over many different scenarios.

If I'm understanding the current process of saving fixtures and then using them correctly, it should be as follows:

Capturing requests

Using requests cache

Now this is OK as long as the host folders are small. When the host folder grows to several hundred megabytes in size (ours is ~10+ GB), loading all the requests from the folder causes Node.js to run out of memory.

The fix proposed in this PR slightly changes how requests are saved and loaded.

What's new?

Instead of using a randomly generated file name, I updated the code to use a reproducible name based on the request's hash (see the getFileUidFromRequest method). This way each request is saved in its own file. When loading requests, instead of loading all the saved requests for a host, the library loads just the fixture for the request currently being made, again based on the request's hash.

This way the library only loads the requests it actually needs from the filesystem, instead of everything it can find in the host folder. A minimal sketch of the idea is shown below.
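For illustration, here is a minimal sketch of that idea. The function name getFixturePath and the fields being hashed (method, URL, body) are my own assumptions; the PR's actual getFileUidFromRequest may combine different fields.

```js
const crypto = require('crypto');
const path = require('path');

// Derive a reproducible fixture path from the request itself, so both
// capture and replay can compute it directly, without scanning the
// whole host folder.
function getFixturePath(fixturesDir, request) {
  const hash = crypto
    .createHash('sha256')
    .update(request.method)
    .update(request.url)
    .update(request.body || '')
    .digest('hex');
  // One file per unique request, e.g. fixtures/api.domain.com/3f5a9c...
  return path.join(fixturesDir, hash);
}
```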

Drawbacks

This method does introduce some drawbacks. A couple of them:

Comments, questions and suggestions are very welcome, so we can find a middle ground where this issue is resolved for both small and huge caches. Thanks! :)

assaf commented 6 years ago

It’s modular: instead of changing how the default catalog works, make a new catalog that uses request hashes, and then users can decide which catalog they want to use (or even both) based on their specific use case.
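A rough sketch of that suggestion, with hypothetical names only; this is not node-replay's actual catalog API. A hash-based catalog could sit alongside the default one and reuse the getFixturePath helper from the sketch above to look up a single fixture on demand:

```js
const fs = require('fs');

// Hypothetical catalog shape, for illustration only.
class HashCatalog {
  constructor(fixturesDir) {
    this.fixturesDir = fixturesDir;
  }

  // Load only the fixture recorded for this request, if any, instead of
  // reading every fixture for the host into memory.
  find(request) {
    const file = getFixturePath(this.fixturesDir, request);
    return fs.existsSync(file) ? fs.readFileSync(file, 'utf8') : null;
  }
}
```

Users with small fixture sets could keep the default catalog and its partial matching, while users with very large single-host folders could opt into the hash-based lookup.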

Many APIs include a signature/timestamp/nonce in the request, so you can’t necessarily tell what the URL would be and hash it. The default catalog can do partial matching based on regular expressions, but for that it needs to load all request matches into memory, which means it’s only good for fixtures on the order of MBs.
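To make that concrete, a tiny example (the URL and parameters are made up): two logically identical calls hash differently as soon as the request carries a varying parameter, which is exactly where regex-based partial matching is still needed.

```js
const crypto = require('crypto');

const sha = (url) => crypto.createHash('sha256').update(url).digest('hex');

// Semantically the same request, but the timestamp and nonce differ on
// every call, so an exact hash never matches the recorded fixture.
console.log(sha('https://api.domain.com/items?id=42&ts=1500000000&nonce=abc'));
console.log(sha('https://api.domain.com/items?id=42&ts=1500000123&nonce=xyz'));
// => two different digests
```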