Munter / netlify-plugin-hashfiles

Netlify build plugin to get optimal file caching with hashed file names and immutable cache headers
BSD 3-Clause "New" or "Revised" License
32 stars, 2 forks

Process crashes with out-of-memory #54

Open ehmicky opened 4 years ago

ehmicky commented 4 years ago

A build using this plugin reported the following error during the plugin's onPostBuild step:

┌──────────────────────────────────────────────────────┐
│ 6. onPostBuild command from netlify-plugin-hashfiles │
└──────────────────────────────────────────────────────┘

 ✔ 0.001 secs: logEvents
 ⚠ WARN: inline JavaScript in public/index.html - Parse error in inline JavaScript in public/index.html
Unexpected token (1:31)
Including assets:
public/index.html
 ✔ 1.760 secs: loadAssets
 ⚠ WARN: ENOENT: no such file or directory, open 'public/sw.js'
<--- Last few GCs --->
[1833:0x2c00030] 709775 ms: Scavenge 1348.7 (1424.1) -> 1348.3 (1424.6) MB, 5.7 / 0.0 ms (average mu = 0.153, current mu = 0.035) allocation failure
[1833:0x2c00030] 711657 ms: Mark-sweep 1349.0 (1424.6) -> 1348.6 (1425.1) MB, 1880.9 / 0.0 ms (average mu = 0.082, current mu = 0.005) allocation failure scavenge might not succeed
<--- JS stacktrace --->
==== JS stack trace =========================================
0: ExitFrame [pc: 0x10e468bdbe1d]
1: StubFrame [pc: 0x10e468bdd3a6]
Security context: 0x16dd6e71e6c1 <JSObject>
2: _createSourceMapForInlineScriptOrStylesheet [0x3c3123ca2cb9] [/opt/build/repo/node_modules/assetgraph/lib/assets/Html.js:~276] [pc=0x10e46a11ed95](this=0x34b18c4a3341 <EventEmitter map = 0x2eae284a2c29>,element=0x09cece5691c9 <Object map = 0x125d72795e41>)
3: findOutgoingRelationsInParseTree [0x3c3123ca2cf1] [...
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
1: 0x8fb090 node::Abort() [/opt/buildhome/.nvm/versions/node/v10.20.1/bin/node]
2: 0x8fb0dc [/opt/buildhome/.nvm/versions/node/v10.20.1/bin/node]
3: 0xb031ce v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/opt/buildhome/.nvm/versions/node/v10.20.1/bin/node]
4: 0xb03404 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/opt/buildhome/.nvm/versions/node/v10.20.1/bin/node]
5: 0xef7462 [/opt/buildhome/.nvm/versions/node/v10.20.1/bin/node]
6: 0xef7568 v8::internal::Heap::CheckIneffectiveMarkCompact(unsigned long, double) [/opt/buildhome/.nvm/versions/node/v10.20.1/bin/node]
7: 0xf03642 v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [/opt/buildhome/.nvm/versions/node/v10.20.1/bin/node]
8: 0xf03f74 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/opt/buildhome/.nvm/versions/node/v10.20.1/bin/node]
9: 0xf06be1 v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationSpace, v8::internal::AllocationAlignment) [/opt/buildhome/.nvm/versions/node/v10.20.1/bin/node]
10: 0xed0064 v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationSpace) [/opt/buildhome/.nvm/versions/node/v10.20.1/bin/node]
11: 0x11701ee v8::internal::Runtime_AllocateInNewSpace(int, v8::internal::Object**, v8::internal::Isolate*) [/opt/buildhome/.nvm/versions/node/v10.20.1/bin/node]
12: 0x10e468bdbe1d

I unfortunately don't have more information. Do you know what might be happening?

Munter commented 4 years ago

Yeah, hashfiles is pretty naive in the way it loads files: https://github.com/Munter/netlify-plugin-hashfiles/blob/master/lib/index.js#L25-L30

This basically loads everything you link to, except JS fetches, and it all goes into memory: source maps, images, videos, etc. That runs up the memory bill pretty quickly. In the link checker we do very specific work to unload assets as soon as possible to keep memory consumption low. We can do that because the check is fairly linear and we don't need to keep much information around to verify that assets exist. This workload is different, though.

For this kind of work I need to explore the entire internal dependency graph, find the leaf nodes and hash them, then traverse up the graph to every asset that references a hashed file, rewrite the href pointing to it, and repeat that all the way up the tree. The naive approach is to keep everything in memory, which has obvious limitations.
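The bottom-up order described here amounts to a post-order depth-first traversal: a file's hash can only be computed once every file it references has its final (hashed) name, because rewriting those hrefs changes its content. Below is a minimal sketch on a hypothetical in-memory toy graph (`{ [path]: { content, deps } }`), not Assetgraph's API. It assumes the graph is acyclic, that every referenced path is present, and that references appear verbatim in the content (plain string replacement). For brevity it renames every file, although entry pages such as `index.html` would normally keep their names. Note that it is exactly the "keep everything in memory" approach.

```javascript
'use strict';
const crypto = require('crypto');

// Toy in-memory graph (hypothetical shape, not Assetgraph's API):
//   { [path]: { content: string, deps: string[] } }
// Assumes an acyclic graph and that references appear verbatim in `content`.
function hashGraphBottomUp(assets) {
  const hashedPath = new Map(); // original path -> hashed path

  const shortMd5 = (text) =>
    crypto.createHash('md5').update(text).digest('hex').slice(0, 10);

  function withHash(path, hash) {
    const dot = path.lastIndexOf('.');
    return dot === -1
      ? `${path}.${hash}`
      : `${path.slice(0, dot)}.${hash}${path.slice(dot)}`;
  }

  function visit(path, onStack) {
    if (hashedPath.has(path)) return hashedPath.get(path);
    if (onStack.has(path)) throw new Error(`Cycle detected at ${path}`);
    onStack.add(path);
    const asset = assets[path];
    // Post-order: finalize every child first so its hashed name is known...
    for (const dep of asset.deps) {
      const newDep = visit(dep, onStack);
      // ...then rewrite this asset's references to point at the hashed name.
      asset.content = asset.content.split(dep).join(newDep);
    }
    onStack.delete(path);
    // Only now is the content final, so the hash covers the rewritten hrefs.
    const newPath = withHash(path, shortMd5(asset.content));
    hashedPath.set(path, newPath);
    return newPath;
  }

  for (const path of Object.keys(assets)) visit(path, new Set());
  return hashedPath;
}
```

With `index.html -> style.css -> bg.png`, `bg.png` is hashed first, `style.css` is rewritten and then hashed, and `index.html` is rewritten last, so a change to the image propagates new names all the way up.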

@papandreou, do you think it's feasible to write a more custom graph traversal that explores the graph differently to keep memory consumption low? I'm assuming we'd still need to populate everything to find all relations, and that we'd have to opt out of pretty much every convenience Assetgraph gives us for updating file paths if we unload parent assets as part of a custom traversal.

papandreou commented 4 years ago

Hmm, yeah, for one thing I guess it isn't really necessary to keep videos, images, fonts, etc. in memory, as they can't have outgoing relations. It should be enough to compute their hash (by accessing their md5Hex getter), then unload them.
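A toy model of the hash-then-unload idea. `ToyAsset`, `isLoaded`, `unload()` and `hashLeaves` are hypothetical names, not Assetgraph's Asset class, and the memoizing getter is an assumption about how a cached `_md5Hex` behaves. Each leaf file is read once, its MD5 is cached, and the body is dropped so only the hex string stays in memory.

```javascript
'use strict';
const fs = require('fs');
const crypto = require('crypto');

// Hypothetical toy model of a lazily loaded asset (NOT Assetgraph's Asset class).
class ToyAsset {
  constructor(fsPath) {
    this.fsPath = fsPath;
    this._rawSrc = undefined; // file body, loaded on demand
    this._md5Hex = undefined; // cached hash; survives unload()
  }

  get isLoaded() {
    return this._rawSrc !== undefined;
  }

  load() {
    if (!this.isLoaded) this._rawSrc = fs.readFileSync(this.fsPath);
    return this;
  }

  // Memoized: the body is only needed the first time the hash is computed.
  get md5Hex() {
    if (this._md5Hex === undefined) {
      this.load();
      this._md5Hex = crypto.createHash('md5').update(this._rawSrc).digest('hex');
    }
    return this._md5Hex;
  }

  unload() {
    this._rawSrc = undefined; // drop the Buffer so it can be garbage collected
    return this;
  }
}

// For leaf assets (images, fonts, videos): hash, then immediately unload.
function hashLeaves(assets) {
  const hashes = new Map();
  for (const asset of assets) {
    hashes.set(asset.fsPath, asset.md5Hex); // loads the body only if needed
    asset.unload(); // keep just the hash, free the body
  }
  return hashes;
}
```

Under this model, assigning `_md5Hex` before the getter is ever read means the body is never loaded at all, which is what the streaming variant relies on.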

To use less memory, you could even keep them unloaded, compute their hash via streaming, then set asset._md5Hex yourself:

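```javascript
const fs = require('fs');
const crypto = require('crypto');
const { fileUrlToFsPath } = require('urltools');
// Hypothetical wrapper: stream the file from disk, hash it, seed the cached md5Hex.
// (`for await` must run inside an async function in CommonJS modules.)
async function setMd5HexByStreaming(asset) {
  const readStream = fs.createReadStream(fileUrlToFsPath(asset.url));
  const hash = crypto.createHash('md5');
  for await (const chunk of readStream) {
    hash.update(chunk);
  }
  asset._md5Hex = hash.digest('hex');
}
```
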
Munter commented 4 years ago

I was also considering the concept of assets that can't have any outgoing relations because no relation types are plugged in for them. That's interesting information to have for custom population and traversal. It might be worth adding as a convenience getter in core, so it's queryable.
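One way such a getter might be modeled, as a sketch only: a registry mapping each asset type to a relation-finding function, where a type with nothing registered is by definition a leaf. `RelationParserRegistry`, `canHaveOutgoingRelations`, `partition` and the type strings are all hypothetical and are not part of Assetgraph core.

```javascript
'use strict';

// Hypothetical registry (not Assetgraph's API): maps an asset type to the function
// that finds its outgoing relations. Types with no parser plugged in are leaves.
class RelationParserRegistry {
  constructor() {
    this.parsers = new Map();
  }

  register(type, findOutgoingRelations) {
    this.parsers.set(type, findOutgoingRelations);
    return this;
  }

  // The proposed convenience check: false means no relation parser is plugged in,
  // so the asset can be hashed (e.g. via streaming) without keeping it in memory.
  canHaveOutgoingRelations(type) {
    return this.parsers.has(type);
  }

  // Split assets so a custom traversal only loads and parses the containers.
  partition(assets) {
    const containers = [];
    const leaves = [];
    for (const asset of assets) {
      (this.canHaveOutgoingRelations(asset.type) ? containers : leaves).push(asset);
    }
    return { containers, leaves };
  }
}
```

A custom population could then load and parse only `containers`, while `leaves` (PNGs, fonts, videos) are hashed and never held in memory.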

I've been considering how I could create a custom population for this plugin, and I think I might do that when I can find the time. It would mean discarding the current assetgraph transform, but I guess that's fine.

papandreou commented 4 years ago

I think the current assetgraph transform would work fine with the trick I suggested above (maybe with a bit of tweaking regarding unloaded assets), but if you want to optimize further, you're right. Custom population isn't that terrible nowadays 🌥️